encodeBUORChID BU ORChID Boston University ORChID 2007 (OH Radical Cleavage Intensity Database) Pilot ENCODE Chromatin Structure Description This track displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the ENCODE regions. Because the hydroxyl radical cleavage intensity is proportional to the solvent-accessible surface area of the deoxyribose hydrogen atoms (Balasubramanian et al., 1998), this track represents a structural profile of the DNA in the ENCODE regions. Please visit the ORChID web site maintained by the Tullius group for access to experimental hydroxyl radical cleavage data, and to a server that can be used to predict the cleavage pattern for any input sequence. Display Conventions and Configuration This track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link. Methods Hydroxyl radical cleavage intensity predictions were performed using an in-house sliding tetramer window (STW) algorithm. This algorithm draws data from the ·OH Radical Cleavage Intensity Database (ORChID), which contains more than 150 experimentally determined cleavage patterns. The Pearson correlation coefficient between the predicted and experimentally determined cleavage intensities is 0.88. For more details on the hydroxyl radical cleavage method, see Greenbaum et al. (2007) in the References. Verification The STW algorithm was cross-validated by removing each test sequence from the training set in turn and predicting its cleavage pattern from the remaining sequences. The mean correlation coefficient between predicted and experimental cleavage patterns in this study was 0.88.
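The leave-one-out procedure described under Verification can be sketched in Python as follows. This is only an illustration: the STW algorithm itself is not specified in this description, so `train_tetramer_means` and `predict` are hypothetical stand-ins (a lookup table of mean cleavage values per tetramer), and the input format (a list of sequence/intensity pairs) is an assumption.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


def train_tetramer_means(examples):
    """Hypothetical stand-in model (NOT the actual STW algorithm):
    mean observed cleavage at the second base of each tetramer."""
    sums, counts = defaultdict(float), defaultdict(int)
    for seq, signal in examples:
        for i in range(len(seq) - 3):
            tetramer = seq[i:i + 4]
            sums[tetramer] += signal[i + 1]
            counts[tetramer] += 1
    return {t: sums[t] / counts[t] for t in sums}


def predict(model, seq, default):
    """Predict one value per tetramer window; unseen tetramers get `default`."""
    return [model.get(seq[i:i + 4], default) for i in range(len(seq) - 3)]


def leave_one_out(examples):
    """Hold out each sequence in turn, train on the rest, and return the
    mean Pearson correlation between predicted and observed values."""
    scores = []
    for k, (seq, signal) in enumerate(examples):
        training = examples[:k] + examples[k + 1:]
        model = train_tetramer_means(training)
        pooled = [v for _, s in training for v in s]
        default = sum(pooled) / len(pooled)
        predicted = predict(model, seq, default)
        observed = signal[1:len(seq) - 2]  # positions aligned with the windows
        scores.append(pearson(predicted, observed))
    return sum(scores) / len(scores)
```

Calling `leave_one_out(examples)` on a list of `(sequence, intensities)` pairs returns a single mean correlation, which is the kind of summary figure quoted above.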
Credits These data were generated through the combined effort of Bo Pang at MIT, Jason Greenbaum at The La Jolla Institute for Allergy and Immunology and Steve Parker, Eric Bishop and Tom Tullius of Boston University. References Balasubramanian B, Pogozelski WK, and Tullius TD DNA strand breaking by the hydroxyl radical is governed by the accessible surface areas of the hydrogen atoms of the DNA backbone. Proc. Natl. Acad. Sci. USA 95(17), 9738-9743 (1998). Price MA, and Tullius TD Using the Hydroxyl Radical to Probe DNA Structure. Meth. Enzymol. 212, 194-219 (1992). Tullius TD. Probing DNA Structure with Hydroxyl Radicals. In Current Protocols in Nucleic Acid Chemistry, (eds. Beaucage, S.L., Bergstrom, D.E., Glick, G.D. and Jones, R.A.) (Wiley, 2001), pp. 6.7.1-6.7.8. Greenbaum JA, Pang B, and Tullius TD Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 17(6), 947-953 (2007). cons44way Conservation Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics Description This track shows multiple alignments of 44 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. 
As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites). Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1. Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data, and both were run with the same parameters for each species set (vertebrates, placental mammals, and primates). Thus, in regions in which only primates appear in the alignment, all three sets of scores will be the same, but in regions in which additional species are available, the mammalian and/or vertebrate scores may differ from the primate scores. The alternative plots help to identify sequences that are under different evolutionary pressures in, say, primates and non-primates, or mammals and non-mammals. The species aligned for this track include the reptile, amphibian, bird, and fish clades, as well as marsupial, monotreme (platypus), and placental mammals. 
Compared to the previous 28-vertebrate alignment, this track includes 16 new species and 8 species with updated sequence assemblies (Table 1). The new species consist of two high-coverage (5-8.5X) assemblies (orangutan, zebra finch) and low-coverage draft assemblies of gorilla, marmoset, tarsier, mouse lemur, kangaroo rat, squirrel, pika, megabat, microbat, dolphin, alpaca, sloth, rock hyrax and lamprey. The mouse, cow, guinea pig, horse, elephant, zebrafish, and medaka assemblies have been updated from those used in the previous 28-species alignment. UCSC has repeatmasked and aligned the low-coverage genome assemblies, and provides the sequence for download; however, we do not construct genome browsers for them. Missing sequence in the low-coverage assemblies is highlighted in the track display by regions of yellow when zoomed out and Ns displayed at base level (see Gap Annotation, below).
Organism | Species | Release date | UCSC version | Alignment type
Human | Homo sapiens | Mar. 2006 | hg18 | reference species
Alpaca | Vicugna pacos | Jul. 2008 | vicPac1* | Reciprocal Best
Armadillo | Dasypus novemcinctus | Jul. 2008 | dasNov2* | Reciprocal Best
Bushbaby | Otolemur garnettii | Dec. 2006 | otoGar1* | Reciprocal Best
Cat | Felis catus | Mar. 2006 | felCat3 | Reciprocal Best
Chicken | Gallus gallus | May 2006 | galGal3 | Syntenic Net
Chimp | Pan troglodytes | Mar. 2006 | panTro2 | Syntenic Net
Cow | Bos taurus | Oct. 2007 | bosTau4 | Syntenic Net
Dog | Canis lupus familiaris | May 2005 | canFam2 | Syntenic Net
Dolphin | Tursiops truncatus | Feb. 2008 | turTru1* | Reciprocal Best
Elephant | Loxodonta africana | Jul. 2008 | loxAfr2* | Reciprocal Best
Fugu | Takifugu rubripes | Oct. 2004 | fr2 | MAF Net
Gorilla | Gorilla gorilla gorilla | Oct. 2008 | gorGor1* | Reciprocal Best
Guinea Pig | Cavia porcellus | Feb. 2008 | cavPor3 | Syntenic Net
Hedgehog | Erinaceus europaeus | June 2006 | eriEur1* | Reciprocal Best
Horse | Equus caballus | Sep. 2007 | equCab2 | Syntenic Net
Kangaroo rat | Dipodomys ordii | Jul. 2008 | dipOrd1* | Reciprocal Best
Lamprey | Petromyzon marinus | Mar. 2007 | petMar1 | MAF Net
Lizard | Anolis carolinensis | Feb. 2007 | anoCar1 | Reciprocal Best
Marmoset | Callithrix jacchus | June 2007 | calJac1 | Reciprocal Best
Medaka | Oryzias latipes | Oct. 2005 | oryLat2 | MAF Net
Megabat | Pteropus vampyrus | Jul. 2008 | pteVam1* | Reciprocal Best
Little brown bat | Myotis lucifugus | Mar. 2006 | myoLuc1* | Reciprocal Best
Mouse | Mus musculus | July 2007 | mm9 | Syntenic Net
Mouse lemur | Microcebus murinus | Jun. 2003 | micMur1* | Reciprocal Best
Opossum | Monodelphis domestica | Jan. 2006 | monDom4 | Syntenic Net
Orangutan | Pongo pygmaeus abelii | July 2007 | ponAbe2 | Syntenic Net
Pika | Ochotona princeps | Jul. 2008 | ochPri2* | Reciprocal Best
Platypus | Ornithorhynchus anatinus | Mar. 2007 | ornAna1 | Reciprocal Best
Rabbit | Oryctolagus cuniculus | May 2005 | oryCun1* | Reciprocal Best
Rat | Rattus norvegicus | Nov. 2004 | rn4 | Syntenic Net
Rhesus | Macaca mulatta | Jan. 2006 | rheMac2 | Syntenic Net
Rock hyrax | Procavia capensis | Jul. 2008 | proCap1* | Reciprocal Best
Shrew | Sorex araneus | June 2006 | sorAra1* | Reciprocal Best
Sloth | Choloepus hoffmanni | Jul. 2008 | choHof1* | Reciprocal Best
Squirrel | Spermophilus tridecemlineatus | Feb. 2008 | speTri1* | Reciprocal Best
Stickleback | Gasterosteus aculeatus | Feb. 2006 | gasAcu1 | MAF Net
Tarsier | Tarsius syrichta | Aug. 2008 | tarSyr1* | Reciprocal Best
Tenrec | Echinops telfairi | July 2005 | echTel1* | Reciprocal Best
Tetraodon | Tetraodon nigroviridis | Feb. 2004 | tetNig1 | MAF Net
TreeShrew | Tupaia belangeri | Dec. 2006 | tupBel1* | Reciprocal Best
X. tropicalis | Xenopus tropicalis | Aug. 2005 | xenTro2 | MAF Net
Zebra finch | Taeniopygia guttata | Jul. 2008 | taeGut1 | Syntenic Net
Zebrafish | Danio rerio | July 2007 | danRer5 | MAF Net
Table 1. Genome assemblies included in the 44-way Conservation track. * Data download only, browser not available.
Downloads for data in this track are available: Multiz alignments (MAF format) and phylogenetic trees; PhyloP conservation (WIG format); PhastCons conservation (WIG format).
Display Conventions and Configuration The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously.
In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults). By default, the following 11 species are included in the pairwise display: rhesus, mouse, dog, horse, armadillo, opossum, platypus, lizard, chicken, X. tropicalis (frog), and stickleback. Note that excluding species from the pairwise display does not alter the conservation score display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: No bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: Aligning species has one or more unalignable bases in the gap region.
Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: Aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: Represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: Enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. 
Then, select one of the following modes:
No codon translation: The gene annotation is not used; the bases are displayed without translation.
Use default species reading frames for translation: The annotations from the genome displayed in the Default species to establish reading frame pull-down menu are used to translate all the aligned species present in the alignment.
Use reading frames for species if available, otherwise no translation: Codon translation is performed only for those species where the region is annotated as protein coding.
Use reading frames for species if available, otherwise use default species: Codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation.
Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen (Table 2). Species listed in the row labeled "No annotation" do not have species-specific reading frames for gene translation.
Gene Track | Species
Known Genes | human, mouse
Ensembl Genes | alpaca, bush baby, cat, chicken, chimp, cow, dog, dolphin, frog, fugu, gorilla, guinea pig, hedgehog, horse, kangaroo rat, medaka, megabat, microbat, mouse lemur, opossum, orangutan, pika, platypus, rabbit, rat, rhesus, rock hyrax, shrew, squirrel, stickleback, tarsier, tenrec, tetraodon, tree shrew, zebrafish
mRNAs | lamprey, lizard, marmoset, zebra finch
No annotation | armadillo, elephant, sloth
Table 2. Gene tracks used for codon translation.
Methods Pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree.
The scoring matrix and parameters for pairwise alignment and chaining were tuned for each species based on phylogenetic distance from the reference. High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net. For more information about the chaining and netting process and parameters for each species, see the description pages for the Chain and Net tracks. An additional filtering step was introduced in the generation of the 44-way conservation track to reduce the number of paralogs and pseudogenes from the high-quality assemblies and the suspect alignments from the low-quality assemblies: the pairwise alignments of high-quality mammalian sequences (placental and marsupial) were filtered based on synteny; those for 2X mammalian genomes were filtered to retain only alignments of best quality in both the target and query ("reciprocal best"). The resulting best-in-genome pairwise alignments were progressively aligned using multiz/autoMZ, following the tree topology diagrammed above, to produce multiple alignments. The multiple alignments were post-processed to add annotations indicating alignment gaps, genomic breaks, and base quality of the component sequences. The annotated multiple alignments, in MAF format, are available for bulk download. An alignment summary table containing an entry for each alignment block in each species was generated to improve track display performance at large scales. Framing tables were constructed to enable visualization of codons in the multiple alignment display. Phylogenetic Tree Model Both phastCons and phyloP are phylogenetic methods that rely on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. 
The vertebrate tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 44way alignment (msa_view). The 4d sites were derived from the RefSeq (Reviewed+Coding) gene set, filtered to select single-coverage long transcripts. The placental mammal tree model and primate tree model were extracted from the vertebrate model. PhastCons Conservation The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size; therefore, short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. 2005. The phastCons parameters were tuned to produce 5% conserved elements in the genome for the vertebrate conservation measurement. This parameter set (expected-length=45, target-coverage=.3, rho=.31) was then used to generate the placental mammal and primate conservation scoring. 
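To make the "posterior probability of the conserved state" concrete, the following Python sketch runs the forward-backward algorithm on a two-state HMM. It is a simplified illustration, not phastCons itself: the per-column likelihoods are supplied directly as toy numbers rather than computed from a phylogenetic model, the rho parameter (which affects those likelihoods) is not modeled, and the mapping from expected-length and target-coverage to transition probabilities (mu = 1/expected-length, nu = mu * coverage / (1 - coverage)) is an assumption about the usual phastCons parametrization. The function name is illustrative.

```python
def conserved_posteriors(lik_conserved, lik_nonconserved,
                         expected_length=45.0, target_coverage=0.3):
    """Posterior probability of the conserved state at each column of a
    two-state HMM, computed with the scaled forward-backward algorithm.

    lik_conserved[i] and lik_nonconserved[i] are the (positive) likelihoods
    of alignment column i under each state; here they are toy inputs.
    """
    # Assumed parametrization: leave the conserved state with probability mu,
    # enter it with probability nu, so that the stationary fraction of
    # conserved sites equals target_coverage.
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    trans = [[1.0 - mu, mu],      # from conserved (state 0)
             [nu, 1.0 - nu]]      # from non-conserved (state 1)
    start = [target_coverage, 1.0 - target_coverage]
    emit = list(zip(lik_conserved, lik_nonconserved))
    n = len(emit)
    states = (0, 1)

    # Forward pass, rescaled at each column to avoid numerical underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            row = [start[s] * emit[0][s] for s in states]
        else:
            prev = fwd[-1]
            row = [emit[t][s] * sum(prev[r] * trans[r][s] for r in states)
                   for s in states]
        total = sum(row)
        fwd.append([v / total for v in row])

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in states)
               for s in states]
        total = sum(row)
        bwd[t] = [v / total for v in row]

    # Combine and normalize per column; scaling constants cancel out.
    posteriors = []
    for t in range(n):
        cons = fwd[t][0] * bwd[t][0]
        non = fwd[t][1] * bwd[t][1]
        posteriors.append(cons / (cons + non))
    return posteriors
```

Because the score at each column depends on its neighbors through the transition probabilities, a run of columns that each mildly favor the conserved state can reach a high posterior without any fixed-size window.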
PhyloP Conservation The phyloP program supports several different methods for computing p-values of conservation or acceleration, for individual nucleotides or larger elements ( http://compgen.cshl.edu/phast/). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., the --subtree option was not used). The scores were computed by performing a likelihood ratio test at each alignment column (--method LRT), and scores for both conservation and acceleration were produced (--mode CONACC). Conserved Elements The conserved elements were predicted by running phastCons with the --viterbi option. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created using the following programs: Alignment tools: blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC Conservation scoring: phastCons, phyloP, phyloFit, tree_doctor, msa_view and other programs in PHAST by Adam Siepel at Cold Spring Harbor Laboratory (original development done at the Haussler lab at UCSC). 
MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC Tree image generator: phyloPng by Galt Barber, UCSC Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle display), and Brian Raney (gap annotation and codon framing) at UCSC The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Phylo-HMMs, phastCons, and phyloP: Pollard KS, Hubisz MJ, Siepel A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 2009 Oct 26. [Epub ahead of print] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. 
cons44wayViewalign Multiz Alignments Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics multiz44way Multiz Align Multiz Alignments of 44 Vertebrates Comparative Genomics cons44wayViewphastcons Element Conservation (phastCons) Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phastCons44way Vertebrate Cons Vertebrate Conservation by PhastCons Comparative Genomics phastCons44wayPlacental Mammal Cons Placental Mammal Conservation by PhastCons Comparative Genomics phastCons44wayPrimates Primate Cons Primate Conservation by PhastCons Comparative Genomics cons44wayViewelements Conserved Elements Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phastConsElements44way Vertebrate El Vertebrate Conserved Elements Comparative Genomics phastConsElements44wayPlacental Mammal El Placental Mammal Conserved Elements Comparative Genomics phastConsElements44wayPrimates Primate El Primate Conserved Elements Comparative Genomics cons44wayViewphyloP Basewise Conservation (phyloP) Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phyloP44wayAll Vertebrate Cons Vertebrate Basewise Conservation by PhyloP Comparative Genomics phyloP44wayPlacMammal Mammal Cons Placental Mammal Basewise Conservation by PhyloP Comparative Genomics phyloP44wayPrimate Primate Cons Primate Basewise Conservation by PhyloP Comparative Genomics cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. 
This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater length greater than 200 bp ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. 
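The summary statistics and acceptance criteria described above can be written as a short Python sketch. This is not the cpg_lh program: the function names and the layout of the returned dictionary are illustrative assumptions, and the search for maximally scoring segments is omitted, so the input is taken to be a candidate segment that has already been identified.

```python
def cpg_island_stats(seq):
    """Compute the per-segment statistics described in the text for a
    candidate DNA segment given as a string of A/C/G/T characters."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    n = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    num_cpg = seq.count("CG")
    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_fraction": (num_c + num_g) / n,
        # Percentage CpG: CpG bases (twice the CpG count) over the length.
        "percent_cpg": 100.0 * 2 * num_cpg / n,
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
        "obs_exp": (num_cpg * n / (num_c * num_g)) if num_c and num_g else 0.0,
    }


def passes_criteria(stats):
    """Apply the three criteria from the text: GC content of 50% or greater,
    length greater than 200 bp, and observed/expected CpG ratio above 0.6."""
    return (stats["gc_fraction"] >= 0.5
            and stats["length"] > 200
            and stats["obs_exp"] > 0.6)
```

For example, `passes_criteria(cpg_island_stats(segment))` returns a boolean for a single candidate segment; the three thresholds are kept in one place so they are easy to compare against the criteria listed above.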
The calculation of the track data is performed by the following command sequence: twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \ | cpg_lh /dev/stdin 2> cpg_lh.err \ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \ | sort -k1,1 -k2,2n > cpgIsland.bed The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command. Data access CpG islands and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation mrna Human mRNAs Human mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between human mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display.
For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods GenBank human mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 
GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. Note that this track was created using a version of RepeatMasker from Nov. 2005 along with Repbase Update 9.11. In hg18, there is also a RepMask 3.2.7 track which was created in 2009 using a newer version of RepeatMasker and Repbase Update. All of the hg18 tracks are based upon this original track and not upon the newer RepMask 3.2.7 track. 
Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs; Long interspersed nuclear elements (LINE); Long terminal repeat elements (LTR), which include retroposons; DNA repeat elements (DNA); Simple repeats (micro-satellites); Low complexity repeats; Satellite repeats; RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA); Other repeats, which include class RC (Rolling Circle); Unknown. The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed. Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev.
1999 Dec;9(6):657-63. Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. knownGene UCSC Genes UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics) Genes and Gene Predictions Description The UCSC Genes track shows gene predictions based on data from RefSeq, GenBank, CCDS and UniProt. This is a moderately conservative set of predictions, requiring the support of one GenBank RNA sequence plus at least one additional line of evidence. The RefSeq RNAs are an exception to this, requiring no additional evidence. The track includes both protein-coding and putative non-coding transcripts. Some of these non-coding transcripts may actually code for protein, but the evidence for the associated protein is weak at best. Compared to RefSeq, this gene set has generally about 10% more protein-coding genes, approximately five times as many putative non-coding genes, and about twice as many splice variants. Display Conventions and Configuration This track in general follows the display conventions for gene prediction tracks. The exons for putative noncoding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. The following color key is used: Black -- feature has a corresponding entry in the Protein Data Bank (PDB); Dark blue -- transcript has been reviewed or validated by either the RefSeq, SwissProt or CCDS staff; Medium blue -- other RefSeq transcripts; Light blue -- non-RefSeq transcripts. This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature.
Methods The UCSC Genes are built using a multi-step pipeline: RefSeq and GenBank RNAs are aligned to the genome with BLAT, keeping only the best alignments for each RNA and discarding alignments of less than 98% identity. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge. Gene models from the Consensus CDS project (CCDS) are also added to the graph. A similar splicing graph is created in the mouse, based on mouse RNA and ESTs. If the mouse graph has an edge that is orthologous to an edge in the human graph, that is added to the evidence for the human edge. If an edge in the splicing graph is supported by two or more human ESTs, it is added as evidence for the edge. If there is an Exoniphy prediction for an exon, that is added as evidence. The graph is traversed to generate all unique transcripts. The traversal is guided by the initial RNAs to avoid a combinatorial explosion in alternative splicing. All refSeq transcripts are output. For other multi-exon transcripts to be output, an edge supported by at least one additional line of evidence beyond the RNA is required. Single-exon genes require either two RNAs or two additional lines of evidence beyond the single RNA. Protein predictions are generated. For non-RefSeq transcripts we use the txCdsPredict program to determine if the transcript is protein-coding and if so, the locations of the start and stop codons. The program weighs as positive evidence the length of the protein, the presence of a Kozak consensus sequence at the start codon, and the length of the orthologous predicted protein in other species. As negative evidence it considers nonsense-mediated decay and start codons in any frame upstream of the predicted start codon. 
For RefSeq transcripts the RefSeq protein prediction is used directly instead of this procedure. For CCDS proteins the CCDS protein is used directly. The corresponding UniProt protein is found, if any. The transcript is assigned a permanent "uc" accession. Credits The UCSC Genes track was produced at UCSC using a computational pipeline developed by Jim Kent, Chuck Sugnet and Mark Diekhans. It is based on data from NCBI RefSeq, UniProt (including TrEMBL and TrEMBL-NEW), CCDS, and GenBank. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them. Data Use Restrictions The UniProt data have the following terms of use, UniProt copyright(c) 2002 - 2004 UniProt consortium: For non-commercial use, all databases and documents in the UniProt FTP directory may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. For commercial use, all databases and documents in the UniProt FTP directory except the files ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. More information for commercial users can be found at the UniProt License & disclaimer page. From January 1, 2005, all databases and documents in the UniProt FTP directory may be copied and redistributed freely by all entities, without advance permission, provided that this copyright statement is reproduced with each copy. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32:D23-6. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46. Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 
2002 Apr;12(4):656-64. wgEncodeRegTxn Transcription ENCODE Transcription Levels Assayed by RNA-seq on 6 Cell Lines Regulation Description This track shows transcription levels for several cell types as assayed by high throughput sequencing of polyadenylated RNA (RNA-seq). Additional views of this dataset and additional documentation on the methods used for this track are available at the ENCODE Caltech RNA-seq page. The Raw Signal view derived from the paired 75-mer reads is shown here. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Wold Lab at Cal Tech, part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeReg ENCODE Regulation ENCODE Integrated Regulation Regulation Description These tracks contain information relevant to the regulation of transcription from the ENCODE project. The Transcription track shows transcription levels assayed by sequencing of polyadenylated RNA from a variety of cell types. 
The Enhancer H3K4Me1 and Enhancer H3K27Ac tracks show where modification of histone proteins is suggestive of enhancer and, to a lesser extent, promoter activity. These histone modifications, particularly H3K4Me1, are quite broad. The actual enhancers are typically just a small portion of the area marked by these histone modifications. The Promoter H3K4Me3 track shows a histone mark associated with promoters. The DNase Clusters track shows regions where the chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types. Regulatory regions, in general, tend to be DNase sensitive, and promoters are particularly DNase sensitive. The Txn Factor ChIP track shows DNA regions where transcription factors, proteins responsible for modulating gene transcription, bind as assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA (ChIP-seq). These tracks complement each other and together can shed much light on regulatory DNA. The histone marks are informative at a high level, but they have a resolution of just ~200 bases and do not provide much in the way of functional detail. The DNase hypersensitive assay is higher in resolution at the DNA level and can be done on a large number of cell types since it's just a single assay. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, but provides little information beyond that. The transcription factor ChIP assay has a high resolution at the DNA level, and, due to the very specific nature of the transcription factors, is often informative with respect to functional detail. However, since each transcription factor must be assayed separately, the information is only available for a limited number of transcription factors on a limited number of cell lines. 
Though each assay has its strengths and weaknesses, the fact that all of these assays are relatively independent of each other gives increased confidence when multiple tracks are suggesting a regulatory function for a region. For additional information please click on the hyperlinks for the individual tracks above. Also note that additional histone marks and transcription information is available in other ENCODE tracks. This integrative Super-track just shows a selection of the most informative data of most general interest. Display conventions By default, the transcription and histone mark displays use a transparent overlay method of displaying data from a number of cell lines in a single track. Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. The DNase and Transcription Factor ChIP tracks contain information on so many cell lines that a color convention is inadequate. Instead, these tracks show gray boxes where the darkness of the box is proportional to the maximum value seen in any cell line in that region. Clicking on the item takes you to a details page where the values for each cell line assayed are displayed. Credits The data in this super-track comes from the ENCODE grants led by Bradley Bernstein (Broad Institute), Richard Myers (HudsonAlpha Institute), Michael Snyder (Stanford) and John Stamatoyannopoulos (University of Washington). Specific labs and contributors for these datasets are listed in the Credits section of the individual tracks in this super-track. The integrative view was developed by Jim Kent at UCSC. 
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeRegTxnNhek NHEK Transcription of NHEK cells from ENCODE Regulation wgEncodeRegTxnK562 K562 Transcription of K562 cells from ENCODE Regulation wgEncodeRegTxnHuvec HUVEC Transcription of HUVEC cells from ENCODE Regulation wgEncodeRegTxnHepg2 HepG2 Transcription of HepG2 cells from ENCODE Regulation wgEncodeRegTxnH1hesc H1 ES Transcription of H1 ES cells from ENCODE Regulation wgEncodeRegTxnGm12878 Gm12878 Transcription of Gm12878 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1 Layered H3K4Me1 ENCODE Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on 8 Cell Lines Regulation Description Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me1 histone mark across the genome as determined by a ChIP-seq assay. The H3K4me1 histone mark is the mono-methylation of lysine 4 of the H3 histone protein, and it is associated with enhancers and with DNA regions downstream of transcription starts. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. 
Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkEnhH3k4me1Nhlf NHLF Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on NHLF cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Nhek NHEK Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on NHEK cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1K562 K562 Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on K562 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Huvec HUVEC Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Hsmm HSMM Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HSMM cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Hmec HMEC Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1H1hesc H1 ES Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on H1 ES cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Gm12878 Gm12878 Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on Gm12878 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27ac Enhanced H3K27Ac ENCODE Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on 8 Cell Lines Regulation Description Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K27Ac histone mark across the genome as determined by a ChIP-seq assay. The H3K27Ac histone mark is the acetylation of lysine 27 of the H3 histone protein, and it is thought to enhance transcription possibly by blocking the spread of the repressive histone mark H3K27Me3. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. 
Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkEnhH3k27acNhlf NHLF Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on NHLF cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acNhek NHEK Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on NHEK cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acK562 K562 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on K562 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHuvec HUVEC Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHsmm HSMM Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HSMM cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHmec HMEC Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHepg2 HepG2 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HepG2 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acGm12878 Gm12878 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on Gm12878 cells from ENCODE Regulation wgEncodeRegMarkPromoter Layered H3K4Me3 ENCODE Promoter-Associated Histone Mark (H3K4Me3) on 9 Cell Lines Regulation Description Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me3 histone mark across the genome as determined by a ChIP-seq assay. The H3K4Me3 histone mark is the tri-methylation of lysine 4 of the H3 histone protein, and it is associated with promoters that are active or poised to be activated. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. 
Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkPromoterNhlf NHLF Promoter-Associated Histone Mark (H3K4Me3) on NHLF cells from ENCODE Regulation wgEncodeRegMarkPromoterNhek NHEK Promoter-Associated Histone Mark (H3K4Me3) on NHEK cells from ENCODE Regulation wgEncodeRegMarkPromoterK562 K562 Promoter-Associated Histone Mark (H3K4Me3) on K562 cells from ENCODE Regulation wgEncodeRegMarkPromoterHuvec HUVEC Promoter-Associated Histone Mark (H3K4Me3) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkPromoterHsmm HSMM Promoter-Associated Histone Mark (H3K4Me3) on HSMM cells from ENCODE Regulation wgEncodeRegMarkPromoterHmec HMEC Promoter-Associated Histone Mark (H3K4Me3) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me3Hepg2 HepG2 Promoter-Associated Histone Mark (H3K4Me3) on HepG2 cells from ENCODE Regulation wgEncodeRegMarkPromoterH1hesc H1 ES Promoter-Associated Histone Mark (H3K4Me3) on H1 cells from ENCODE Regulation wgEncodeRegMarkPromoterGm12878 Gm12878 Promoter-Associated Histone Mark (H3K4Me3) on Gm12878 cells from ENCODE Regulation wgEncodeRegDnaseClustered DNase Clusters Cluster 2010-10-22 Kent UCSC wgEncodeRegDnaseClustered Element Clusters by Integrative Analysis Kent Kent - UC Santa Cruz ENCODE Digital DNaseI Hypersensitivity Clusters Regulation Description This track shows DNase hypersensitive areas assayed in a large collection of cell types. Regulatory regions in general, and promoters in particular, tend to be DNase sensitive. Additional views of this dataset and additional documentation on the methods used for this track are available at the UW DNaseI HS page. The Peaks view in that page is the basis for the clusters shown here, which combine data from the peaks of the different cell lines in that page. Display conventions A gray box indicates the extent of the hypersensitive region. The darkness is proportional to the maximum signal strength observed in any cell line. The number to the left of the box shows how many cell lines are hypersensitive in the region. 
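The display conventions above imply that each cluster carries a maximum signal and a count of cell lines. The following Python sketch shows one way per-cell-line peaks could be combined into such clusters. It is an assumption for illustration, not the method used to build this track: the Peak type, the cluster_peaks function and the overlap-merging rule are all hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int      # 0-based, half-open interval [start, end)
    end: int
    cell_line: str
    signal: float


def cluster_peaks(peaks):
    """Merge overlapping peaks; keep the max signal and the set of cell lines."""
    clusters = []
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == p.chrom and p.start < last["end"]:
            # Overlaps the current cluster: extend it and update the summary values
            last["end"] = max(last["end"], p.end)
            last["max_signal"] = max(last["max_signal"], p.signal)
            last["cell_lines"].add(p.cell_line)
        else:
            clusters.append({
                "chrom": p.chrom,
                "start": p.start,
                "end": p.end,
                "max_signal": p.signal,          # would drive box darkness
                "cell_lines": {p.cell_line},     # its size is the label count
            })
    return clusters


if __name__ == "__main__":
    demo = [
        Peak("chr1", 100, 200, "K562", 5.0),
        Peak("chr1", 150, 250, "HepG2", 8.0),
        Peak("chr1", 300, 400, "K562", 2.0),
    ]
    for c in cluster_peaks(demo):
        print(c["chrom"], c["start"], c["end"], c["max_signal"], len(c["cell_lines"]))
```

In this sketch the darkness of a box would be derived from max_signal and the number shown beside it from the size of cell_lines.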
Credits This track shows data from the UW ENCODE group. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. The full data release policy for ENCODE is available here. wgEncodeRegTfbsClustered Txn Factor ChIP Cluster 2010-10-22 Kent UCSC wgEncodeRegTfbsClustered Element Clusters by Integrative Analysis Kent Kent - UC Santa Cruz ENCODE Transcription Factor ChIP-seq Regulation Description This track shows regions where transcription factors, proteins responsible for modulating gene transcription, bind to DNA as assayed by ChIP-seq (chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA). Additional views of this dataset and additional documentation on the methods used for this track are available at the Yale TFBS Track page. Some data in this track are from the HAIB TFBS Track, which has been dropped from hg18. The Peaks views in those pages are the basis for the clusters shown here, which combine data from the peaks from the different cell lines and different transcription factors in those pages. Display Conventions and Configuration A gray box encompasses the peaks of transcription factor occupancy. The darkness of the box is proportional to the maximum signal strength observed in any cell line. The name to the left of the box is the transcription factor. The letters to the right represent the cell lines where a signal is detected. The darkness of the letter is proportional to the signal strength in the cell line. Click on an item in the track to see the cell lines spelled out. Credits This track shows data from the Myers Lab at the HudsonAlpha Institute for Biotechnology and by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University; Peggy Farnham at UC Davis; and Kevin Struhl at Harvard. 
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. The full data release policy for ENCODE is available here. multiz17way 17-Way Cons Vertebrate Multiz Alignment & Conservation (17 Species) Comparative Genomics Description This track shows a measure of evolutionary conservation in 17 vertebrates, including mammalian, amphibian, bird, and fish species, based on a phylogenetic hidden Markov model, phastCons (Siepel et al., 2005). Multiz alignments of the following assemblies were used to generate this track: human (Mar. 2006 (NCBI36/hg18), hg18) chimp (Nov 2003, panTro1) macaque (Jan 2006, rheMac2) mouse (Feb 2006, mm8) rat (Nov 2004, rn4) rabbit (May 2005, oryCun1) dog (May 2005, canFam2) cow (Mar 2005, bosTau2) armadillo (May 2005, dasNov1) elephant (May 2005, loxAfr1) tenrec (Jul 2005, echTel1) opossum (Jan 2006, monDom4) chicken (Feb 2004, galGal2) frog (Oct 2004, xenTro1) zebrafish (May 2005, danRer3) Tetraodon (Feb 2004, tetNig1) Fugu (Aug 2002, fr1) Display Conventions and Configuration In full and pack display modes, conservation scores are displayed as a "wiggle" (histogram), where the height reflects the size of the score. Pairwise alignments of each species to the human genome are displayed below as a grayscale density plot (in pack mode) or as a "wiggle" (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. The conservation wiggle can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. 
Checkboxes in the track configuration section allow excluding species from the pairwise display; however, this does not remove them from the conservation score display. To view detailed information about the alignments at a specific position, zoom in the display to 30,000 or fewer bases, then click on the alignment. Gap Annotation The "Display chains between alignments" configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. 
The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown; if not, and if the gap size is a multiple of 3, a "*" is displayed, otherwise "+" is shown. Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: the annotations from the genome displayed in the "Default species for translation" pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen:
Gene Track: Species
Known Genes: human, mouse, rat
RefSeq Genes: chicken
MGC Genes: X. tropicalis
Ensembl Genes: Fugu, chimp
mRNAs: rhesus, rabbit, dog, cow, zebrafish
not translated: armadillo, elephant, tenrec, opossum, Tetraodon
Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. The pairwise alignments were then multiply aligned using multiz, following the ordering of the species tree diagrammed above. The resulting multiple alignments were then assigned conservation scores by phastCons, using a tree model with branch lengths derived from the ENCODE project Multi-Species Sequence Analysis group, September 2005 tree model. This tree was generated from TBA alignments over 23 vertebrate species and is based on 4D sites. The phastCons parameters were tuned to produce 5% conserved elements in the genome: expected-length=14, target-coverage=.008, rho=.28. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly-conserved regions and long moderately conserved regions can both obtain high scores.
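The posterior computation described above can be illustrated with a minimal sketch of the forward-backward algorithm on a generic two-state HMM. This is not the phastCons implementation: the per-column likelihoods are taken as plain input numbers (in phastCons they come from phylogenetic models), and the function name, the transition parameters mu and nu, and the mapping from expected-length and target-coverage to those parameters are assumptions made for illustration.

```python
def conserved_posteriors(lik_cons, lik_noncons, mu, nu):
    """Posterior probability of the conserved state at each column.

    lik_cons[i], lik_noncons[i]: likelihood of column i under the conserved
        and non-conserved states (assumed inputs for this sketch).
    mu: probability of leaving the conserved state at each step.
    nu: probability of entering the conserved state at each step.
    """
    n = len(lik_cons)
    if n == 0:
        return []
    # State 0 = conserved, state 1 = non-conserved.
    trans = ((1.0 - mu, mu), (nu, 1.0 - nu))
    start = (nu / (mu + nu), mu / (mu + nu))  # stationary distribution
    emit = list(zip(lik_cons, lik_noncons))

    # Forward pass, rescaled at every column to avoid underflow.
    fwd = []
    prev = None
    for i in range(n):
        if i == 0:
            f = [start[s] * emit[0][s] for s in (0, 1)]
        else:
            f = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        total = sum(f)
        prev = [x / total for x in f]
        fwd.append(prev)

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        total = sum(b)
        bwd[i] = [x / total for x in b]

    # Combine and normalise per column.
    posteriors = []
    for f, b in zip(fwd, bwd):
        cons = f[0] * b[0]
        noncons = f[1] * b[1]
        posteriors.append(cons / (cons + noncons))
    return posteriors


# Assumed parameter mapping (illustrative only): expected element length
# 1/mu = 14 and expected coverage nu / (mu + nu) = 0.008.
mu = 1.0 / 14
nu = mu * 0.008 / (1.0 - 0.008)
scores = conserved_posteriors([1, 1, 10, 10, 10, 1], [1, 1, 1, 1, 1, 1], mu, nu)
```

Because the transition probabilities couple neighboring columns, evidence at one column raises the posterior at adjacent columns as well, which is the autocorrelation behavior described in the text.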
More information about phastCons can be found in Siepel et al. (2005). PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. Conservation track display by Hiram Clawson ("wiggle" display), Brian Raney (gap annotation and codon framing) and Kate Rosenbloom, codon frame software by Mark Diekhans at UCSC. The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community. References Phylo-HMMs and phastCons Felsenstein, J. and Churchill, G.A. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13, 93-104 (1996). Siepel, A. and Haussler, D. Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York (2005). Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics 139, 993-1005 (1995). Chain/Net: Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003). 
Multiz: Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4), 708-15 (2004). Blastz: Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002). Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003). Phylogenetic Tree: Murphy, W.J., et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294(5550), 2348-51 (2001). encodeNhgriDukeDnaseHs Duke/NHGRI DNase Duke/NHGRI DNaseI Hypersensitivity Pilot ENCODE Chromatin Structure Description This track displays DNaseI-hypersensitive sites identified using two methods (DNase-chip and MPSS sequencing) in seven human cell types: primary unactivated and activated CD4+ T cells; GM06990 lymphoblastoid; HeLa S3 cervical carcinoma (Puck et al., 1956); HepG2 liver carcinoma; H9 human undifferentiated embryonic stem (ES) (Thomson et al., 1998); IMR90 human fibroblast; K562 myeloid leukemia-derived (Klein et al., 1976). DNaseI-hypersensitive sites are associated with all types of gene regulatory regions, including promoters, enhancers, silencers, insulators, and locus control regions. Display Conventions and Configuration The subtracks within this track are grouped into three sections: Raw subtracks display log2 ratio data averaged from three biological replicates and three DNase concentrations. Pval subtracks show significant regions that likely represent valid DNaseI-hypersensitive sites based on the raw data. The higher the score for the region, the more likely the site is to be hypersensitive. Regions have unique identifiers that are prefixed with the cell type.
For display purposes, the p value scores were mapped to integer scores in the range 0-1000. Regions are displayed in a range of light gray to black, based on score. MPSS subtracks show hypersensitive sites determined by massively parallel signature sequencing (MPSS). Each cluster has a unique identifier. The last digit of each identifier represents the number of sequences that map within that particular cluster. The sequence number is also reflected in the score, e.g. a cluster of two sequences scores 500, three sequences scores 750 and four or more sequences scores 1000. Sites are displayed in a range of light gray to black, based on score. The "Raw" and "Pval" subtracks are displayed by default. Use the checkboxes on the Track Settings page to change the subtracks displayed. Methods DNase-Chip DNaseI hypersensitive sites were isolated using a method called DNase-chip (Crawford et al., 2006). Briefly, DNaseI digested ends from intact chromatin were captured using three different DNase concentrations as well as three biological replicates. This material was amplified, labeled, and hybridized to NimbleGen ENCODE tiled microarrays. H9 human ES cells (Thomson et al., 1998) were cultured on a feeder layer of mitotically inactivated mouse embryo fibroblasts. For analysis, human ES cell colonies were separated away from the feeder layer and processed for DNaseI hypersensitive site mapping. Cultures were routinely inspected by immunohistochemistry, flow cytometry, and microarray to ensure that the human ES cells were in the undifferentiated state. For the DNase-chip experiments, the raw data were averaged from nine hybridizations per cell type. The Pval scores represent -log10 p values as determined by the ACME (Algorithm for Capturing Microarray Enrichment) program (Scacheri et al., 2006). Only regions that had p value < 0.001 were included. 
For display in the Genome Browser, the p value scores were mapped to integer scores in the range 0-1000 using the following formula: score = (pVal * 35) + 100, where pVal is the -log10 p value. The -log10 p values can be viewed using the Table Browser. MPSS Sequencing Primary human CD4+ T cells were activated by incubation with anti-CD3 and anti-CD28 antibodies for 24 hours. DNaseI-hypersensitive sites were cloned from the cells before and after activation, and sequenced using massively parallel signature sequencing (Brenner et al., 2000; Crawford et al., 2006). Only those clusters of multiple DNaseI library sequences that map within 500 bases of each other are displayed. Verification DNase-Chip A real-time PCR assay (McArthur et al., 2001; Crawford et al., 2004) was used to validate a randomly selected subset of DNase-chip regions. For the new DNase-chip, sensitivity was determined to be > 86% and specificity to be > 97%. Approximately 20-30% of regions detected in only a single DNase concentration are valid. 50-80% of regions detected in two out of three DNase concentrations are valid (the exact percentage depends on which two DNase concentrations had significant signal). 90% of regions detected in all three DNase concentrations are valid. This data set includes elements for all 44 ENCODE regions. MPSS Sequencing Real-time PCR was used to verify valid DNaseI-hypersensitive sites. Approximately 50% of clusters of two sequences are valid. These clusters are shown in light gray. 80% of clusters of three sequences are valid, indicated by dark gray. 100% of clusters of four or more sequences are valid, shown in black. This data set includes confirmed elements for 35 of the 44 ENCODE regions. It is estimated that these data identify 10-20% of all hypersensitive sites within CD4+ T cells. Further sequencing will be required to identify additional sites. MPSS data from the whole genome can be found in the Expression and Regulation track group (NHGRI DNaseI-HS track).
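The two display-score mappings described in the Methods and Display Conventions sections above can be sketched as follows. The formula and the MPSS score values come from the text; the function names, the rounding, and the clamping to the 0-1000 range are assumptions for illustration.

```python
def pval_to_score(neg_log10_p):
    """Map a -log10 p value to a display score: score = (pVal * 35) + 100.

    Rounding to an integer and clamping to 0-1000 are assumptions; the
    text only states that scores fall in that range.
    """
    score = neg_log10_p * 35 + 100
    return max(0, min(1000, int(round(score))))


def mpss_cluster_score(n_sequences):
    """Score an MPSS cluster by the number of sequences it contains.

    Two sequences score 500, three score 750, four or more score 1000.
    Clusters of fewer than two sequences are not displayed.
    """
    if n_sequences < 2:
        raise ValueError("only clusters of two or more sequences are displayed")
    return {2: 500, 3: 750}.get(n_sequences, 1000)


# The inclusion threshold p < 0.001 corresponds to -log10(p) > 3,
# so displayed Pval regions score above this value.
threshold_score = pval_to_score(3)
```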
Credits These data were produced at the Crawford Lab at Duke University, and at the Collins Lab at NHGRI. Thanks to Gregory E. Crawford and Francis S. Collins for supplying the information for this track. H9 cells were grown in collaboration with Ron McKay and Paul Tesar at the National Institute of Neurological Disorders and Stroke (NINDS)—an institute of the National Institutes of Health (NIH). References Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000 Jun;18(6):630-4. Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: A high resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nature Methods. 2006 Jul;3(7):503-9. Crawford GE, Holt IE, Mullikin JC, Tai D, Blakesley R, Bouffard G, Young A, Masiello C, Green ED, Wolfsberg TG et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci USA. 2004 Jan 27;101(4):992-7. Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31. (See also NHGRI's data site for the project.) Klein E, Ben-Bassat H, Neumann H, Ralph P, Zeuthen J, Polliack A, Vanky F. Properties of the K562 cell line, derived from a patient with chronic myeloid leukemia. Int J Cancer. 1976 Oct 15;18(4):421-31. McArthur M, Gerum S, Stamatoyannopoulos G. Quantification of DNaseI-sensitivity by real-time PCR: quantitative analysis of DNaseI-hypersensitivity of the mouse beta-globin LCR . J Mol Biol. 2001 Oct 12;313(1):27-34. Puck TT, Marcus PI, Cieciura SJ. 
Clonal growth of mammalian cells in vitro: growth characteristics of colonies from single HeLa cells with and without a "feeder" layer. J Exp Med. 1956 Feb 1;103(2):273-83. Scacheri PC, Crawford GE, Davis S. Statistics for ChIP-chip and DNase hypersensitivity experiments on NimbleGen arrays. Methods Enzymol. 2006;411:270-82. Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS, Jones JM. Embryonic stem cell lines derived from human blastocysts. Science. 1998 Nov 6;282(5391):1145-7. encodeNhgriDnaseHsMpssCd4Act DNase CD4-act MS Duke/NHGRI DNaseI Hypersensitive Sites (CD4+ T-Cells Activated, MPSS method) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsMpssCd4 DNase CD4 MS Duke/NHGRI DNaseI Hypersensitive Sites (CD4+ T-Cells, MPSS method) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalK562 DNase K562 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (K562) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalImr90 DNase IMR90 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (IMR90) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalH9 DNase H9 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (H9) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalHepG2 DNase HepG2 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (HepG2) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalHela DNase HeLa Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (HeLaS3) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalCd4 DNase CD4 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (CD4+) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalGm06990 DNase GM069 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (GM06990) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawK562 DNase K562 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (K562) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawImr90 DNase IMR90 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (IMR90) Pilot ENCODE Chromatin Structure
encodeNhgriDnaseHsChipRawH9 DNase H9 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (H9) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawHepG2 DNase HepG2 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (HepG2) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawHela DNase HeLa Raw Duke/NHGRI DNaseI Hypersensitivity Raw (HeLaS3) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawCd4 DNase CD4 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (CD4+) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawGm06990 DNase GM069 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (GM06990) Pilot ENCODE Chromatin Structure refSeqComposite NCBI RefSeq RefSeq genes from NCBI Genes and Gene Predictions Description The NCBI RefSeq Genes composite track shows human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise using NCBI aligned tables like RefSeq All or RefSeq Curated. See the Methods section for more details about how the different tracks were created. Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track is a composite track that contains differing data sets. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide. Note: Not all subtracks are available on all assemblies. The possible subtracks include: RefSeq aligned annotations and UCSC alignment of RefSeq annotations RefSeq All – all curated and predicted annotations provided by RefSeq.
RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, NP or YP. (NP and YP are used only for protein-coding genes on the mitochondrion; YP is used for human only.) RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR. RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks, as they do not have a product and therefore no RefSeq accession. More than 90% are pseudogenes, T-cell receptor or immunoglobulin segments. The few remaining entries are gene clusters (e.g. protocadherin). RefSeq Alignments – alignments of RefSeq RNAs to the human genome provided by the RefSeq group, following the display conventions for PSL tracks. RefSeq Diffs – alignment differences between the human reference genome(s) and RefSeq transcripts. (Track not currently available for every assembly.) UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track. RefSeq Select+MANE (subset) – Subset of RefSeq Curated, transcripts marked as RefSeq Select or MANE Select. A single Select transcript is chosen as representative for each protein-coding gene. This track includes transcripts categorized as MANE, which are further agreed upon as representative by both NCBI RefSeq and Ensembl/GENCODE, and have a 100% identical match to a transcript in the Ensembl annotation. See NCBI RefSeq Select. Note that we provide a separate track, MANE (hg38), which contains only the MANE transcripts. RefSeq HGMD (subset) – Subset of RefSeq Curated, transcripts annotated by the Human Gene Mutation Database. This track is only available on the human genomes hg19 and hg38. It is the most restricted RefSeq subset, targeting clinical diagnostics. 
The RefSeq All, RefSeq Curated, RefSeq Predicted, RefSeq HGMD, RefSeq Select/MANE and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq. Color Level of review Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information. Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff. Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted. The item labels and codon display properties for features within this track can be configured through the check-box controls at the top of the track description page. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list . Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name or OMIM identifier instead of the gene name, show all or a subset of these labels including the gene name, OMIM identifier and accession names, or turn off the label completely. Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. The RefSeq Diffs track contains five different types of inconsistency between the reference genome sequence and the RefSeq transcript sequences. 
The five types of differences are as follows: mismatch – aligned but mismatching bases, plus HGVS g. to show the genomic change required to match the transcript and HGVS c./n. to show the transcript change required to match the genome. short gap – genomic gaps that are too small to be introns (arbitrary cutoff of < 45 bp), most likely insertions/deletion variants or errors, with HGVS g. and c./n. showing differences. shift gap – shortGap items whose placement could be shifted left and/or right on the genome due to repetitive sequence, with HGVS c./n. position range of ambiguous region in transcript. Here, thin and thick lines are used -- the thin line shows the span of the repetitive sequence, and the thick line shows the rightmost shifted gap. double gap – genomic gaps that are long enough to be introns but that skip over transcript sequence (invisible in default setting), with HGVS c./n. deletion. skipped – sequence at the beginning or end of a transcript that is not aligned to the genome (invisible in default setting), with HGVS c./n. deletion HGVS Terminology (Human Genome Variation Society): g. = genomic sequence ; c. = coding DNA sequence ; n. = non-coding RNA reference sequence. When reporting HGVS with RefSeq sequences, to make sure that results from research articles can be mapped to the genome unambiguously, please specify the RefSeq annotation release displayed on the transcript's Genome Browser details page and also the RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309). Methods Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here. The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments. 
The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the human genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Data Access The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the REST API, Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. The previous track versions are available in the archives of our downloads server. You can also access any RefSeq table entries in JSON format through our JSON API. The data in the RefSeq Other and RefSeq Diffs tracks are organized in bigBed file format; more information about accessing the information in this bigBed file can be found below. The other subtracks are associated with database tables as follows: genePred format: RefSeq All - ncbiRefSeq RefSeq Curated - ncbiRefSeqCurated RefSeq Predicted - ncbiRefSeqPredicted RefSeq HGMD - ncbiRefSeqHgmd RefSeq Select+MANE - ncbiRefSeqSelect UCSC RefSeq - refGene PSL format: RefSeq Alignments - ncbiRefSeqPsl The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system here. The annotations in the RefSeqOther and RefSeqDiffs tracks are stored in bigBed files, which can be obtained from our downloads server here, ncbiRefSeqOther.bb and ncbiRefSeqDiffs.bb. 
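As a sketch of working with the genePred tables listed above, the following parses one tab-separated row from a table dump and drops the leading "bin" column. The column order follows the standard genePred layout; the function name and the example row (including its accession, coordinates and gene name) are hypothetical and only illustrate the shape of the data.

```python
# Core genePred columns, in order, after the leading "bin" column.
GENEPRED_COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
]


def parse_genepred_row(line, has_bin=True):
    """Parse one tab-separated genePred row into a dictionary.

    The "bin" column only speeds up Genome Browser display, so it is
    discarded here. Any extended columns after exonEnds are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]
    record = dict(zip(GENEPRED_COLUMNS, fields))
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        record[key] = int(record[key])
    # Exon coordinate lists are comma-separated with a trailing comma.
    for key in ("exonStarts", "exonEnds"):
        record[key] = [int(x) for x in record[key].split(",") if x]
    return record


# Hypothetical row, made up for illustration only.
example_row = (
    "585\tNM_000000.1\tchr1\t+\t1000\t5000\t1200\t4800\t2\t"
    "1000,3000,\t2000,5000,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
)
record = parse_genepred_row(example_row)
```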
Individual regions or the whole set of genome-wide annotations can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory linked below. For example, to extract only annotations in a given region, you could use the following command: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg18/ncbiRefSeq/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout You can download a GTF format version of the RefSeq All table from the GTF downloads directory. The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. The utility can be run from the command line like so: genePredToGtf hg18 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section. A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server here. Please refer to our mailing list archives for questions. Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server. Credits This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. 
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 refGene UCSC RefSeq UCSC annotations of RefSeq RNAs (NM_* and NR_*) Genes and Gene Predictions Description The RefSeq Genes track shows known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the human genome using BLAT. 
Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 snp130 SNPs (130) Simple Nucleotide Polymorphisms (dbSNP build 130) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 130, available from ftp.ncbi.nih.gov/snp. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
The configuration categories reflect the following definitions (not all categories apply to this assembly): Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name No Variation - no variation asserted for sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G} Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap - validated by HapMap project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
For filtering and coloring, function terms are grouped into more general categories: Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP terms: near-gene-3, near-gene-5; Sequence Ontology terms: downstream_gene_variant, upstream_gene_variant) Coding - Synonymous - no change in peptide for allele with respect to reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant) Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly (dbSNP terms: nonsense, missense, frameshift; Sequence Ontology terms: stop_gained, missense_variant, frameshift_variant) Untranslated - variation in transcript, but not in coding region interval (dbSNP terms: untranslated-3, untranslated-5; Sequence Ontology terms: 3_prime_UTR_variant, 5_prime_UTR_variant) Intron - variation in intron, but not in first two or last two bases of intron (dbSNP term: intron; Sequence Ontology term: intron_variant) Splice Site - variation in first two or last two bases of intron (dbSNP terms: splice-3, splice-5; Sequence Ontology terms: splice_acceptor_variant, splice_donor_variant) Note: these terms were not actually assigned to any variants in dbSNP build 130. Unknown - no known functional classification Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Average heterozygosity: Calculated by dbSNP as described here Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 3. 
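The 0.5 ceiling on average heterozygosity for bi-allelic single-base substitutions can be checked with a short calculation. The sketch below uses the textbook expected-heterozygosity formula H = 1 - sum(p_i^2); this is an assumption made for illustration and is not necessarily the exact estimator dbSNP applies. The function name is hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Textbook expected heterozygosity H = 1 - sum(p_i^2).

    Illustrative assumption only: dbSNP's own estimator may differ in detail.
    """
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Scan bi-allelic frequency pairs (p, 1 - p) in steps of 0.01.
biallelic_max = max(
    expected_heterozygosity([i / 100, 1 - i / 100]) for i in range(101)
)

# A tri-allelic site with equal frequencies can exceed the bi-allelic ceiling.
triallelic_equal = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])
```

Under this formula the bi-allelic maximum is reached at equal allele frequencies, so a reported value above 0.5 for a bi-allelic substitution suggests a problem with the underlying frequency data.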
You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene). Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Annotations UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found: The dbSNP reference allele is not the same as the UCSC reference allele, i.e. the bases in the mapped position range. Class is single, in-del, mnp or mixed and the UCSC reference allele does not match any observed allele. In NCBI's alignment of flanking sequences to the genome, part of the flanking sequence around the SNP does not align to the genome. Class is single, but the size of the mapped SNP is not one base. Class is named and indicates an insertion or deletion, but the size of the mapped SNP implies otherwise. Class is single and the format of observed alleles is unexpected. The length of the observed allele(s) is not available because it is too long. Multiple distinct insertion SNPs have been mapped to this location. At least one observed allele contains an ambiguous IUPAC base (e.g. R, Y, N). 
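The 'in-del' reclassification described under Insertions/Deletions above can be sketched as follows. This is a minimal illustration, not UCSC's actual code: the function name is hypothetical, and it assumes that an empty allele is written as "-" in the observed-allele list.

```python
def refine_indel_class(ref_allele, observed_alleles):
    """Refine dbSNP class 'in-del' to 'insertion' or 'deletion' when possible.

    Assumption for this sketch: "-" denotes an empty (zero-length) allele.
    """
    def allele_length(allele):
        return 0 if allele == "-" else len(allele)

    ref_len = allele_length(ref_allele)
    # Lengths of the observed alleles other than the reference allele.
    other_lens = [allele_length(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return "in-del"
    if all(ref_len < n for n in other_lens):
        return "insertion"  # reference is shorter than every other allele
    if all(ref_len > n for n in other_lens):
        return "deletion"   # reference is longer than every other allele
    return "in-del"         # mixed lengths: leave the dbSNP class unchanged
```

For example, a reference allele of "-" with observed alleles "-" and "AT" is refined to an insertion, while a reference of "A" with observed alleles "-", "A" and "AT" stays 'in-del' because the other alleles are both shorter and longer than the reference.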
Another condition, which does not necessarily imply any problem, is noted: Class is single and SNP is tri-allelic or quad-allelic. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b130_SNPContigLoc_36_3.bcp.gz and b130_SNPContigInfo_36_3.bcp.gz. b130_SNPMapInfo_36_3.bcp.gz provided the alignment weights. Functional classification was obtained from b130_SNPContigLocusId_36_3.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. 
Orthologous Alleles (human assemblies only) Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' chromEnd = chromStart + 1 align to just one location are not aligned to a chrN_random chrom are biallelic (not tri or quad allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. intronEst Spliced ESTs Human ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between human expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends).
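The canonical-intron test just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the track's actual pipeline: alignment blocks are given as 0-based, half-open (start, end) genomic coordinates on the forward strand, and the function and constant names are hypothetical.

```python
MIN_INTRON_LENGTH = 32  # minimum gap between alignment blocks, from the text


def has_canonical_intron(genome_seq, blocks):
    """Return True if any gap between consecutive blocks looks like a GT/AG intron.

    Assumptions for this sketch: `blocks` is a sorted list of 0-based,
    half-open (start, end) genomic intervals, and the alignment is on the
    forward strand, so the gap should begin with GT and end with AG.
    """
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        gap = genome_seq[left_end:right_start].upper()
        if (len(gap) >= MIN_INTRON_LENGTH
                and gap.startswith("GT")
                and gap.endswith("AG")):
            return True
    return False


# Toy genomic sequence: two 10-base exons around a 34-base GT...AG gap.
toy_genome = "A" * 10 + "GT" + "C" * 30 + "AG" + "A" * 10
toy_blocks = [(0, 10), (44, 54)]
```

An alignment to the reverse strand would need the reverse-complemented signal; that case is left out of the sketch for brevity.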
By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the human EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. 
If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. 
When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 cpgIslandExtUnmasked Unmasked CpG CpG Islands on All Sequence (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. 
The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command. Data access CpG islands and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator.
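The per-segment statistics and criteria given in the CpG island Methods above (GC content, length, and the observed/expected CpG ratio) can be sketched as follows. This is an illustration only and is not the cpg_lh program: it does not implement the +17/-1 segment scoring, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Compute the CpG statistics described in the Methods for one segment."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    num_c = seq.count("C")
    num_g = seq.count("G")
    num_cpg = seq.count("CG")  # CG dinucleotides cannot overlap each other
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0
    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_percent": 100.0 * (num_c + num_g) / n,
        "percent_cpg": 100.0 * 2 * num_cpg / n,  # twice the CpG count / length
        "obs_exp": obs_exp,
    }


def passes_island_criteria(seq):
    """Apply the three criteria from the text to a candidate segment."""
    s = cpg_stats(seq)
    return s["gc_percent"] >= 50 and s["length"] > 200 and s["obs_exp"] > 0.6
```

As a quick check, a 300-base run of alternating C and G has 150 CG dinucleotides, 150 Cs and 150 Gs, which gives Obs/Exp = 150 * 300 / (150 * 150) = 2.0 and satisfies all three criteria.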
All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 phastConsElements17way 17-Way Most Cons PhastCons Conserved Elements, 17-way Vertebrate Multiz Alignment Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program. PhastCons is part of the PHAST (PHylogenetic Analysis with Space/Time models) package. The predictions are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm.
This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). Chain/Net Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003). Multiz Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W. Aligning multiple genomic sequences with the threaded blockset aligner. 
Genome Res. 14(4), 708-15 (2004). Blastz Chiaromonte, F., Yap, V.B., Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002). Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003). encodeUncFaire UNC FAIRE UNC FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) Pilot ENCODE Chromatin Structure Description Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) is a procedure used to isolate chromatin that is resistant to the formation of protein-DNA cross-links. These tracks display FAIRE data from 2091 fibroblast cells hybridized to high-resolution NimbleGen arrays that tile the ENCODE regions. The four datasets, in practical terms, can be thought of as independent replicates. However, because they were part of a series of experiments aimed at optimizing cross-linking conditions in human cells, the data represent different cross-linking times (1, 2, 4, and 7 minutes). Although the individual replicates are not displayed, the replicate data and also the signal averages and the peaks for the averages can be downloaded. Display Conventions and Configuration The FAIRE data are represented by three subtracks. One subtrack shows the average normalized log2 ratios for the tiled probes; the other two subtracks display peaks. The peaks in one set were determined using PeakFinder software supplied by NimbleGen. A false positive rate (FPR) was estimated for the peak set using a permutation-based method. All peaks had an FPR of < 0.01. The peaks in the other set (Apr. 2006 update) were identified by ChIPOTle, a peak-finding algorithm that uses a sliding window to identify statistically significant signals that comprise a peak. A null distribution was determined by reflecting the negative data, which is presumed to be noise, about zero and a Gaussian distribution was fitted to it. 
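The reflected-null idea described above can be illustrated with a short sketch. It is a simplification and not ChIPOTle's implementation: it works on individual values rather than sliding-window averages, it fits the Gaussian by maximum likelihood with the mean fixed at zero, and the function names are hypothetical.

```python
import math


def fit_reflected_null(log_ratios):
    """Estimate the standard deviation of a zero-centred Gaussian null.

    The negative values are assumed to be noise; reflecting them about zero
    gives a symmetric sample whose mean is zero by construction.
    """
    negatives = [v for v in log_ratios if v < 0]
    if not negatives:
        raise ValueError("no negative values to build the null from")
    reflected = negatives + [-v for v in negatives]
    # Maximum-likelihood sigma for a Gaussian with mean fixed at zero.
    return math.sqrt(sum(v * v for v in reflected) / len(reflected))


def upper_tail_p(value, sigma):
    """One-sided p-value of `value` under the fitted zero-mean Gaussian."""
    return 0.5 * math.erfc(value / (sigma * math.sqrt(2)))
```

In a full analysis the resulting p-values would still need a multiple-testing correction before any significance threshold is applied.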
Windows were considered significant with a p-value < 1e-25, after using the Benjamini-Hochberg correction for multiple tests. This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only one subtrack, uncheck the box next to the track you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Note that the graphical configuration options are available only for the Signal subtrack; the Peaks subtracks are fixed. Methods To perform FAIRE, proteins were cross-linked to DNA using 1% formaldehyde solution, the complex was sheared using sonication, and a phenol/chloroform extraction was performed to remove DNA fragments crosslinked to protein. The DNA recovered in the aqueous phase was fluorescently labeled and hybridized to a microarray along with fluorescently labeled genomic DNA as a control. Ratios were scaled by subtracting the Tukey Bi-weight mean for the log-ratio values from each log-ratio value, as recommended by NimbleGen. Results in yeast were consistent with enrichment for nucleosome-depleted regions of the genome. Therefore, the method may have utility as a positive selection for genomic regions with properties normally detected by assays like DNase hypersensitivity. Verification The data were verified using PCR with primers designed to promoters enriched with FAIRE and downstream coding regions. Credits Cell culture, fixing, and DNA amplification were performed by Jonghwan Kim in the Vishy Iyer lab at the University of Texas, Austin. FAIRE was done by Paul Giresi in the Jason Lieb lab at the University of North Carolina at Chapel Hill.
Paul Giresi of NimbleGen did the sample labeling and hybridization with the help of Mike Singer and Roland Green. Nan Jiang at NimbleGen supplied the software used for the permutation analysis. References Buck, M.J., Nobel, A.B., and Lieb, J.D. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 6(11), R97 (2005). Nagy, P.L., Cleary, M.L., Brown, P.O., and Lieb, J.L. Genomewide demarcation of RNA polymerase II transcription units revealed by physical fractionation of chromatin. PNAS 100(11), 6364-9 (2003). encodeUncFairePeaksChipotle FAIRE ChIPOTle University of North Carolina FAIRE Peaks (ChIPOTle) Pilot ENCODE Chromatin Structure encodeUncFairePeaks FAIRE PeakFinder University of North Carolina FAIRE Peaks (PeakFinder) Pilot ENCODE Chromatin Structure encodeUncFaireSignal FAIRE Signal University of North Carolina FAIRE Signal Pilot ENCODE Chromatin Structure phyloPCons28way 28-Way Base Cons Basewise Conservation by PhyloP for 28-Species Multiz Align. Comparative Genomics Description This track shows measures of evolutionary conservation generated using the phyloP (Phylogenetic P-Values) program from the PHAST package. Two measurements are provided: conservation across 28 species, and an alternative measurement restricted to the placental mammal subset (17 species plus human) of the multiple alignment. PhyloP differs from phastCons, which is used to produce the scores in the main Conservation track, in several key ways. The scores produced by phyloP reflect individual alignment columns, and do not take into account conservation at neighboring sites. As a result, the phyloP conservation plot has a less smooth appearance, with more "texture" at individual bases, than the phastCons plot. In addition, this property makes phyloP more appropriate than phastCons for evaluating signatures of selection at particular bases or classes of bases in the genome (e.g., all third codon positions).
In addition, phyloP requires fewer assumptions than phastCons, by depending only on a model of neutral evolution, rather than on models of both neutral evolution and negative selection (conservation). Finally, rather than representing probabilities of negative selection and ranging between 0 and 1, the phyloP scores represent -log p-values under a null hypothesis of neutral evolution, and range from 0 to infinity (although in practice there is a maximum achievable value for any particular data set). See the Conservation track description for information about the multiple alignments used as the basis of these conservation measurements. Display Conventions and Configuration The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously. In full and pack display modes, conservation scores are displayed as a wiggle track in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. For example, the windowing function option controls how scores are combined across sites, with averaging as the default. This will have a strong effect on how the plot appears when zoomed out. Click the Graph configuration help link for an explanation of the configuration options. Methods Conservation scoring was performed using the phyloP program from the PHAST package. PhyloP is a general method for computing p-values of conservation by comparing estimated numbers of substitutions along the branches of a phylogeny with the distribution expected under neutral evolution (Siepel, Pollard, and Haussler, 2006). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., --subtree was not used). Alignment gaps were treated as missing data. 
PhyloP relies on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The vertebrate tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 28way alignment (msa_view). The 4d sites were derived from the Oct 2005 Gencode Reference Gene set, which was filtered to select single-coverage long transcripts. A second, mammalian tree model including only placental mammals was used to generate the placental mammal conservation scoring. Credits This track was created using phyloP, phyloFit, and other programs in PHAST by Adam Siepel's group at Cold Spring Harbor Laboratory (original development done at the Haussler lab at UCSC). The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Siepel A, Pollard KS, Haussler D. New methods for detecting lineage-specific selection. Proc. 10th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '06). 2006. Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 14 Dec 2001;294(5550):2348-51. 
phyloP28way Vertebrate Cons Vertebrate Basewise Conservation by PhyloP Comparative Genomics phyloP28wayPlacMammal Mammal Cons Placental Mammal Basewise Conservation by PhyloP Comparative Genomics multiz28way 28-Way Cons Vertebrate Multiz Alignment & PhastCons Conservation (28 Species) Comparative Genomics Description This track shows multiple alignments of 28 vertebrate species and two measures of evolutionary conservation -- conservation across all 28 species and an alternative measurement restricted to the placental mammal subset (17 species plus human) of the alignment. These two measurements produce the same results in regions where only mammals appear in the alignment. For other regions, the non-mammalian species can either boost the scores (if conserved) or decrease them (if non-conserved). The mammalian conservation helps to identify sequences that are under different evolutionary pressures in mammalian and non-mammalian vertebrates. The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. The conservation measurements were created using the phastCons package from Adam Siepel at Cold Spring Harbor Laboratory. The species aligned for this track include the reptile, amphibian, bird, and fish clades, as well as marsupial, monotreme (platypus), and placental mammals. Compared to the previous 17-vertebrate alignment, this track includes 11 new species and 6 species with updated sequence assemblies (Table 1). The new species consist of five high-coverage (5-8.5X) assemblies (horse, platypus, lizard, and two teleost fish: stickleback and medaka) and six low-coverage (2X) genome assemblies from mammalian species selected for sampling by NHGRI (bushbaby, tree shrew, guinea pig, hedgehog, common shrew, and cat). The chimp, cow, chicken, frog, fugu, and zebrafish assemblies in this track have been updated from those used in the previous 17-species alignment. 
UCSC has repeatmasked and aligned the low-coverage genome assemblies, and provides the sequence for download; however, we do not construct genome browsers for them. Missing sequence in the low-coverage assemblies is highlighted in the track display by regions of yellow when zoomed out and Ns displayed at base level (see Gap Annotation, below).

Organism     Species                   Release date  UCSC version
Human        Homo sapiens              Mar 2006      hg18
Armadillo    Dasypus novemcinctus      May 2005      dasNov1
Bushbaby     Otolemur garnetti         Dec 2006      otoGar1
Cat          Felis catus               Mar 2006      felCat3
Chicken      Gallus gallus             May 2006      galGal3
Chimpanzee   Pan troglodytes           Mar 2006      panTro2
Cow          Bos taurus                Aug 2006      bosTau3
Dog          Canis familiaris          May 2005      canFam2
Elephant     Loxodonta africana        May 2005      loxAfr1
Frog         Xenopus tropicalis        Aug 2005      xenTro2
Fugu         Takifugu rubripes         Oct 2004      fr2
Guinea pig   Cavia porcellus           Oct 2005      cavPor2
Hedgehog     Erinaceus europaeus       June 2006     eriEur1
Horse        Equus caballus            Feb 2007      equCab1
Lizard       Anolis carolinensis       Feb 2007      anoCar1
Medaka       Oryzias latipes           Apr 2006      oryLat1
Mouse        Mus musculus              Feb 2006      mm8
Opossum      Monodelphis domestica     Jan 2006      monDom4
Platypus     Ornithorhynchus anatinus  Mar 2007      ornAna1
Rabbit       Oryctolagus cuniculus     May 2005      oryCun1
Rat          Rattus norvegicus         Nov 2004      rn4
Rhesus       Macaca mulatta            Jan 2006      rheMac2
Shrew        Sorex araneus             June 2006     sorAra1
Stickleback  Gasterosteus aculeatus    Feb 2006      gasAcu1
Tenrec       Echinops telfairi         July 2005     echTel1
Tetraodon    Tetraodon nigroviridis    Feb 2004      tetNig1
Tree shrew   Tupaia belangeri          Dec 2006      tupBel1
Zebrafish    Danio rerio               Mar 2006      danRer4

Table 1. Genome assemblies included in the 28-way Conservation track. Display Conventions and Configuration The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously. In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score.
The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults). By default, the following 11 species are included in the pairwise display: rhesus, mouse, dog, horse, armadillo, opossum, platypus, lizard, chicken, X. tropicalis (frog), and stickleback. Note that excluding species from the pairwise display does not alter the conservation score display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species.
Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. 
Use default species reading frames for translation: the annotations from the genome displayed in the Default species for translation pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen (Table 2). Species listed in the row labeled "None" do not have species-specific reading frames for gene translation.
Gene Track      Species
Known Genes     human, mouse, rat
RefSeq Genes    chimp
Ensembl Genes   rhesus, opossum, zebrafish, fugu, stickleback
mRNAs           rabbit, dog, cow, horse, chicken, frog, tetraodon
None            bushbaby, tree shrew, guinea pig, rabbit, shrew, hedgehog, cat, armadillo, elephant, tenrec, platypus, lizard, medaka
Table 2. Gene tracks used for codon translation.
Methods Pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree. The scoring matrix and parameters for pairwise alignment and chaining were tuned for each species based on phylogenetic distance from the reference. High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net.
For more information about the chaining and netting process and parameters for each species, see the description pages for the Chain and Net tracks. An additional filtering step was introduced in the generation of the 28-way conservation track to reduce the number of paralogs and pseudogenes from the high-quality assemblies and the suspect alignments from the low-quality assemblies: the pairwise alignments of high-quality mammalian sequences (placental and marsupial) were filtered based on synteny; those for 2X mammalian genomes were filtered to retain only alignments of best quality in both the target and query ("reciprocal best"). The resulting best-in-genome pairwise alignments were progressively aligned using multiz/autoMZ, following the tree topology diagrammed above, to produce multiple alignments. The multiple alignments were post-processed to add annotations indicating alignment gaps, genomic breaks, and base quality of the component sequences. The annotated multiple alignments, in MAF format, are available for bulk download. An alignment summary table containing an entry for each alignment block in each species was generated to improve track display performance at large scales. Framing tables were constructed to enable visualization of codons in the multiple alignment display. Conservation scoring was performed using the PhastCons package (A. Siepel), which computes conservation based on a two-state phylogenetic hidden Markov model (HMM). PhastCons measurements rely on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The vertebrate tree model for this track was generated using the phyloFit program from the phastCons package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 28way alignment (msa_view). 
The 4d sites were derived from the Oct 2005 Gencode Reference Gene set, which was filtered to select single-coverage long transcripts. A second, mammalian tree model including only placental mammals was used to generate the placental mammal conservation scoring. The phastCons parameters were tuned to produce 5% conserved elements in the genome for the vertebrate conservation measurement. This parameter set (expected-length=45, target-coverage=.3, rho=.31) was then used to generate the placental mammal conservation scoring. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Unlike many conservation-scoring programs, note that phastCons does not rely on a sliding window of fixed size; therefore, short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. 2005. PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. 
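As an illustration of the posterior probability described above, the following is a minimal sketch of a two-state HMM with forward-backward smoothing. It is not the phastCons implementation: the per-site emission values are made-up placeholders standing in for the phylogenetic likelihoods of each alignment column under the conserved and non-conserved models, and the function names are hypothetical. The conversion from expected-length and target-coverage to transition probabilities assumes the parameterization given in Siepel et al. 2005 (expected length = 1/mu, coverage = nu/(mu + nu)).

```python
def transition_rates(expected_length, target_coverage):
    """Assumed mapping: mu = P(conserved -> non-conserved), nu = the reverse."""
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu


def posterior_conserved(emis_c, emis_n, mu, nu):
    """Per-site posterior probability of the conserved state.

    emis_c[i] and emis_n[i] are the likelihoods of column i under the
    conserved and non-conserved states (placeholders in this sketch).
    """
    n = len(emis_c)
    emis = list(zip(emis_c, emis_n))
    # State 0 = conserved, state 1 = non-conserved.
    trans = [[1.0 - mu, mu], [nu, 1.0 - nu]]
    # Start from the stationary distribution of the two-state chain.
    init = [nu / (mu + nu), mu / (mu + nu)]

    # Forward pass, renormalized at every site to avoid underflow.
    cur = [init[s] * emis[0][s] for s in (0, 1)]
    z = sum(cur)
    fwd = [[c / z for c in cur]]
    for t in range(1, n):
        cur = [emis[t][s] * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
               for s in (0, 1)]
        z = sum(cur)
        fwd.append([c / z for c in cur])

    # Backward pass, also renormalized; the scale cancels in the posterior.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emis[t + 1][r] * bwd[t + 1][r] for r in (0, 1))
               for s in (0, 1)]
        z = sum(cur)
        bwd[t] = [c / z for c in cur]

    # Combine and normalize per site.
    post = []
    for t in range(n):
        w0 = fwd[t][0] * bwd[t][0]
        w1 = fwd[t][1] * bwd[t][1]
        post.append(w0 / (w0 + w1))
    return post


# Transition rates for the parameter set quoted in the text.
mu, nu = transition_rates(expected_length=45, target_coverage=0.3)
```

Because each site's posterior depends on its neighbors through the transition probabilities rather than on a fixed window, a short run of strongly conserved columns and a long run of moderately conserved columns can both reach high values, which is the behavior the text attributes to phastCons.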
Credits This track was created using the following programs: Alignment tools: blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC Conservation scoring: PhastCons, phyloFit, tree_doctor, msa_view by Adam Siepel while at UCSC, now at Cold Spring Harbor Laboratory MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC Tree image generator: phyloPng by Galt Barber, UCSC Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle display), and Brian Raney (gap annotation and codon framing) at UCSC The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Phylo-HMMs and phastCons: Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York (2005). Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. 
Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. mostConserved28way 28-Way Most Cons PhastCons Conserved Elements, 28-way Vertebrate Multiz Alignment Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program based on a whole-genome alignment of vertebrates, and for the placental mammal subset of species in the alignment. They are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. For more details see the track description for the Conservation track. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm.
This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Chain/Net Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. Multiz Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Blastz Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. 
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. phastConsElements28way Vertebrate PhastCons Vertebrate Conserved Elements, 28-way Multiz Alignment Comparative Genomics phastConsElements28wayPlacMammal Mammal PhastCons Placental Mammal Conserved Elements, 28-way Multiz Alignment Comparative Genomics consIndelsHgMmCanFam Cons Indels MmCf Indel-based Conservation for human hg18, mouse mm8 and dog canFam2 Comparative Genomics Description This track displays regions showing evidence for conservation with respect to mutations involving sequence insertions and deletions (indels). These "indel-purified sequences" (IPSs) were obtained by comparing the predictions of a neutral model of indel evolution with data obtained from human (hg18), mouse (mm8) and dog (canFam2) alignments (Lunter et al., 2006). The evidence for conservation is statistical, and each region is annotated with a posterior probability. It may be interpreted as the probability that the segment shows the paucity of indels by selection, rather than by random chance. Apart from the underlying alignment, these data are independent of the conservation of the nucleotide sequence itself. Any inferred conservation of the sequence, e.g. as shown by phastCons, is therefore independent evidence for selection. It may happen that sequence is conserved with respect to indel mutations without concomitant evidence of conservation of the nucleotide sequence. The opposite may also happen. Display Conventions The score (based on the false discovery rate, FDR) is reflected in the bluescale density gradient coloring the track items. Lighter colours reflect a higher FDR. Methods In the absence of selection, indels have a certain predicted distribution over the genome. The actual distribution shows an over-abundance of ungapped regions, due to selection purifying functional sequence from the deleterious effects of indels.
Neutrally evolving sequence, such as (by and large) ancestral repeats, shows an exceedingly good fit to the neutral predictions. This accurate fit allows the identification of a good proportion of conserved sequence at a relatively low false discovery rate (FDR). For example, at an FDR of 10%, the predicted sensitivity is 75%. Each identified indel-purified sequence (IPS) is annotated by two numbers: a false discovery rate (FDR), and a posterior probability (p). The FDR refers to a set of segments, not a given segment by itself. In this case, it refers to the minimum FDR of all sets that include the segment of interest. For example, a segment annotated with a 10% FDR also belongs to a set with a 15% FDR, but not a set with a 5% FDR. The posterior probability does refer to the single segment by itself. It has a frequentist interpretation, namely, as the proportion of regions, annotated with the same posterior probability, that have been under purifying selection, rather than showing the observed lack of indels by random chance. The data include segments for a false-discovery rate of up to 50%. The score directly reflects the FDR, through the following formula: score = 90 / (FDR + 0.08), with FDR expressed as a fraction. This maps FDR 10% to 500 (90 / 0.18) and FDR 1% (the most restrictive category) to the top of the score range, given in the track as 999 (the formula itself evaluates to 1000 at FDR = 0.01). For further details of the Methods, see Lunter et al., 2006. Verification The neutral indel model was calibrated using ancestral repeats, against which it showed an excellent fit. Among the identified IPSs at 10% FDR and predicted sensitivity of 75%, we found 75% of annotated protein-coding exons (weighted by length), and 75% of the 222 microRNAs that were annotated at the time. Ancestral repeats were heavily depleted among the identified segments. Credits These data were generated by Gerton Lunter and Chris Ponting, MRC Functional Genetics Unit, University of Oxford, United Kingdom and Jotun Hein, Department of Statistics, University of Oxford, United Kingdom. References Lunter G, Ponting CP, Hein J.
Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comp Biol. 2006 Jan;2(1):e5. The data may also be browsed here. evoCpg Evo Cpg Weizmann Evolutionary CpG Islands Comparative Genomics Description Evolutionary analysis of CpG-rich regions reveals that several distinct processes generate and maintain CpG islands. One central evolutionary regime resulting in enriched CpG content is driven by low levels of DNA methylation and consequently low rates of deamination (C → T). Another major force forming CpG islands is biased gene conversion, which stabilizes constitutively methylated CpG islands by balancing rapid deamination with G/C fixation, indirectly increasing the CpG frequency. This track classifies contiguous CpG-rich regions according to their inferred evolutionary dynamics. Analysis of different epigenetic marks (DNA methylation and others) should usually be performed separately for the different evolutionary classes. Display Conventions The track shows contiguous (100bp or more) genomic elements with CpG content greater than 3%, color-coded according to their classification of evolutionary dynamics. Green elements represent CpG islands that have low rates of C→T deamination and are typically unmethylated. Red elements represent CpG-rich regions that gain G/C quickly and are in many cases constitutively methylated. Blue elements represent CpG-rich loci that overlap exons (where stabilization of CpGs can be explained by indirect selective pressure on coding sequence). A probabilistic score for each CpG island indicates the specificity of the evolutionary behavior; positive values indicate hypo-deamination and negative values indicate high rates of G/C gain. The intensity of the CpG island classification score is also represented in the shade of the CpG island element (shades of green for hypodeaminated elements, and shades of red for constitutively methylated islands).
Note: CpG islands in chromosomes X and Y and islands that cannot be aligned to other primate genomes are currently ignored. Methods A parameter-rich evolutionary model was used to infer substitution dynamics over genomic bins of 50bp, and clustering analysis identified two major types of genomic behaviors (as described in Mendelson Cohen, Kenigsberg and Tanay, Cell 2011). The distributions of evolutionary parameters in each cluster (Figure 3 in the paper) were used to compute a log-odds score for each 50bp genomic bin. Bins with CpG content higher than 3% (smoothed over 500bp) were then assembled into contiguous segments as follows: Adjacent bins from the same cluster were merged. Ambiguously classified bins were merged with any adjacent non-ambiguous bins. Bins of the same class with gaps of up to 50bp were merged. Intervals shorter than 100bp were discarded. All merged intervals were reclassified according to the mean log-odds score spanning the entire interval. The raw inferred evolutionary statistics and cluster distributions are available upon request (amos.tanay@weizmann.ac.il). Credits Thanks to Amos Tanay's lab at the Weizmann Institute of Science for the evolutionary model and classification scheme. References Cohen NM, Kenigsberg E, Tanay A. Primate CpG Islands Are Maintained by Heterogeneous Evolutionary Regimes Involving Minimal Selection. Cell. 2011 May 11;145(5):773-786. phastBias phastBias gBGC phastBias gBGC predictions Comparative Genomics Description The phastBias gBGC tracks show regions predicted to be influenced by GC-biased gene conversion (gBGC). gBGC is a process in which GC/AT (strong/weak) heterozygotes are preferentially resolved to the strong allele during gene conversion. This confers an advantage to G and C alleles that mimics positive selection, without conferring any known functional advantage. Therefore, some regions of the genome identified to be under positive selection may be better explained by gBGC.
gBGC has also been hypothesized to be an important contributor to variation in GC content and the fixation of deleterious mutations. PhastBias is a prediction method that captures gBGC's signature in multiple-genome alignments: clusters of weak-to-strong substitutions amidst a deficit of strong-to-weak substitutions. Due to the short life of recombination hotspots, phastBias searches for gBGC tracts on a single foreground branch. PhastBias is designed to pick up gBGC tracts of arbitrary length and to be robust to variations in local mutation rate and GC content. It uses a hidden Markov model (HMM) that can be thought of as an extension to the phastCons model. Whereas phastCons predicts conserved elements using an HMM with two states (conserved and neutral), phastBias predicts gBGC tracts using a four-state HMM (conserved, neutral, conserved with gBGC, neutral with gBGC). One of the main parameters of the phastBias model is B, which represents the strength of gBGC and the degree to which weak-to-strong and strong-to-weak substitution rates are skewed on the foreground branch. The tracks presented here were created with B=3, which was chosen for being sensitive while still having a low false positive rate. Simulation experiments suggest that phastBias has reasonable power to pick up tracts with length > 1000bp, and very good power for tracts > 2000bp. Nonetheless, other lines of evidence suggest that phastBias only identifies approximately 25-50% of bases influenced by gBGC, so the tract predictions should not be considered exhaustive. Display Conventions The phastBias tracks display separate predictions for both human and chimp lineages of the phylogenetic tree (from the human-chimp ancestor). For each lineage, two tracks are available: a wiggle showing raw posterior probabilities, and a BED track showing regions predicted to be affected by gBGC. 
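The relationship between the two track types can be sketched in a few lines: tracts are the maximal runs of bases whose posterior exceeds a cutoff, which this description gives as 0.5. The function name and the plain list input are hypothetical conveniences for illustration; this is not the phastBias code, and it applies no additional length filtering. Intervals are returned as zero-based, half-open (begin, end) pairs, the convention used by BED.

```python
def tracts_from_posteriors(posteriors, start=0, cutoff=0.5):
    """Return (begin, end) intervals of consecutive bases with posterior > cutoff.

    posteriors: per-base posterior probabilities for a contiguous region.
    start:      zero-based coordinate of the first base in the region.
    """
    tracts = []
    begin = None  # start coordinate of the tract currently being extended
    for i, p in enumerate(posteriors):
        if p > cutoff:
            if begin is None:
                begin = start + i
        elif begin is not None:
            # The run ended at the previous base; end is exclusive.
            tracts.append((begin, start + i))
            begin = None
    if begin is not None:
        # A run that extends to the end of the region.
        tracts.append((begin, start + len(posteriors)))
    return tracts
```

Note that the comparison is strict, so a base with posterior exactly equal to the cutoff ends a tract rather than extending it.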
The posterior probability track shows the probability that each base is assigned to either of the gBGC states under the phastBias HMM. The phastBias tracts show regions predicted to be affected by gBGC on a particular lineage. These are simply defined as all regions with posterior probability > 0.5. Methods The phastBias tracks were predicted using the phastBias program, available as part of the PHAST software package. The phastBias tracks represent two separate result sets; one predicting gBGC on the branch leading from the human-chimp ancestor to human, and the other on the opposite branch leading to chimp. The software was run on human-referenced alignments of hg18, panTro2, ponAbe2, and rheMac2, which were extracted from the hg18 44-way multiple alignment. Details are available in Capra et al., 2013 (cited below). Briefly, the gBGC bias parameter B was set to 3, the mean expected tract length was set to 1/1000, and the transition rate into gBGC states was estimated by expectation-maximization. Most other parameter settings were set to the same values used for UCSC's mammalian conservation tracts. Relative branch lengths came from this placental mammal tree model, the conservation scale factor was set to 0.31, expected length of conserved elements to 45, and expected coverage of conserved elements to 0.3. The alignment was split into 10 Mb chunks; for each chunk, a scaling factor for the neutral tree, the transition/transversion rate ratio, and the background base frequencies were re-estimated using the PHAST program phyloFit. The final tracts were filtered to remove elements with length ≥ 5000bp, as these are likely due to artifacts unrelated to gBGC (repeats, alignment error). The method was re-run on hg19 data, extracting hg19, panTro2, rheMac2, and ponAbe2 from the 46-way alignments. The chimp tracks were not re-created for hg19, since interest in them is lower. References Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. 
A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet. 2013 Aug;9(8):e1003684. PMID: 23966869; PMC: PMC3744432 Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform. 2011 Jan;12(1):41-51. PMID: 21278375; PMC: PMC3030812 Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285-311. PMID: 19630562 phastBiasTracts Tracts phastBias gBGC predictions Comparative Genomics phastBiasChimpTracts3 chimp tracts phastBias gBGC chimp tracts Comparative Genomics phastBiasTracts3 human tracts phastBias gBGC human tracts Comparative Genomics phastBiasPosteriors Posteriors phastBias gBGC predictions Comparative Genomics phastBiasChimpPosteriors3 chimp posterior phastBias gBGC posterior probability on chimp branch Comparative Genomics phastBiasPosteriors3 human posterior phastBias gBGC posterior probability on human branch Comparative Genomics encodeGencodeGeneMar07 Gencode Genes Mar07 Gencode Gene Annotations (March 2007) Pilot ENCODE Regions and Genes Description The Gencode Genes track (v3.1, March 2007) shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. The gene annotations are colored based on the HAVANA annotation type. See the table below for the color key, as well as more detail about the transcript and feature types. The Gencode project recommends that the annotations with known and validated transcripts; i.e., the types Known and Novel_CDS (which are colored dark green in the track display) be used as the reference gene annotation. The v3.1 release includes the following updates and enhancements to v2.2 (Oct. 
2005): Apart from the usual additions and corrections, 69 loci (consisting of 132 transcripts) were re-annotated based on Rapid Amplification of cDNA Ends (RACE), array, and sequencing analyses performed within the Affymetrix/GENCODE collaboration (see the Methods section below, also Denoeud et al., 2007 and The ENCODE Project Consortium, 2007). The polymorphic gene type was added. PolyA features were added. A bug affecting frames of CDSs with missing start or stop codons was fixed. The experimental validation data contained in the Gencode Introns track from the previous release were integrated into the Gencode Genes track by annotators using the Human and Vertebrate Analysis and Annotation manual curation process (HAVANA). Type Color Description Known dark green Known protein-coding genes (i.e., referenced in Entrez Gene) Novel_CDS dark green Have an open reading frame (ORF) and are identical, or have homology, to cDNAs or proteins but do not fall into the above category. These can be known in the sense that they are represented by mRNA sequences in the public databases, but they are not yet represented in Entrez Gene or have not received an official gene name. They can also be novel in that they are not yet represented by an mRNA sequence in human. Novel_transcript light green Similar to Novel_CDS; however, cannot be assigned an unambiguous ORF. Putative light green Have identical, or have homology to spliced ESTs, but are devoid of significant ORF and polyA features. These are generally short (two or three exon) genes or gene fragments. TEC light green (To Experimentally Confirm) Single-exon objects (supported by multiple unspliced ESTs with polyA sites and signals). Polymorphic purple Have functional transcripts in one haplotype and "pseudo" (non-functional) transcripts in another. Processed_pseudogene blue Pseudogenes that lack introns and are thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome.
Unprocessed_pseudogene blue Pseudogenes that can contain introns, as they are produced by gene duplication. Artifact grey Transcript evidence and/or its translation equivocal. Usually these arise from high-throughput cDNA sequencing projects that submit automatic annotation, sometimes resulting in erroneous CDSs in what turns out to be, for example, 3' UTRs. In addition HAVANA has extended this category to include cDNAs with non-canonical splice sites due to deletion/sequencing errors. PolyA_signal brown Polyadenylation signal PolyA_site orange Polyadenylation site Pseudo_polyA pink "Pseudo"-polyadenylation signal detected in the sequence of a processed pseudogene. Warning: Pseudo_polyA features and processed_pseudogenes generally don't overlap. The reason is that pseudogene annotations are based solely on protein evidence, whereas pseudo_polyA signals are identified from transcript evidence; as they are found at the end of the 3' UTR, they can lie several kb downstream of the 3' end of the pseudogene. The current full set of GENCODE annotations is available for download here. Methods For a detailed description of the methods and references used, see Harrow et al., 2006 and Denoeud et al., 2007. 5' RACE/array experiments A combination of 5’ RACE and high-density tiling microarrays were used to empirically annotate 5’ transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze). The 5’ RACE reactions were performed with oligonucleotides mapping to a coding exon common to most of the transcripts of a protein-coding gene locus annotated by GENCODE (Oct. 2005 freeze) on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines (GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia) and HeLaS3 (cervix carcinoma)). 
The RACE reactions were then hybridized to 20 nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" -- array-detected fragments of RACE products -- were assessed for novelty by comparing their genome coordinates to those of GENCODE-annotated exons. Connectivity between novel RACEfrags and their respective index exon was further investigated by RT-PCR, cloning and sequencing. The resulting cDNA sequences (deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122) were then fed into the HAVANA annotation pipeline as mRNA evidence (see "HAVANA manual annotations" below). HAVANA manual annotations The HAVANA process was used to produce these annotations. Before the manual annotation process begins, an automated analysis pipeline for similarity searches and ab initio predictions is run on a computer farm and stored in an Ensembl MySQL database using a modified Ensembl analysis pipeline system. All searches and prediction algorithms, except CpG island prediction (see cpgreport in the EMBOSS application suite), are run on repeat-masked sequence. RepeatMasker is used to mask interspersed repeats, followed by Tandem Repeats Finder to mask tandem repeats. Nucleotide sequence databases are searched with wuBLASTN, and significant hits are re-aligned to the unmasked genomic sequence using est2genome. The UniProt protein database is searched with wuBLASTX, and the accession numbers of significant hits are found in the Pfam database. The hidden Markov models for Pfam protein domains are aligned against the genomic sequence using Genewise to provide annotation of protein domains. Several ab initio prediction algorithms are also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes and Eponine TSS to predict transcription start sites.
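As an illustration of the two-pass repeat masking described above, the following minimal Python sketch replaces repeat intervals in a sequence with N. It is not the RepeatMasker or Tandem Repeats Finder implementation: the function names `hard_mask` and `mask_in_two_passes`, the 0-based half-open interval convention, and the choice of hard masking (rather than lower-case soft masking) are assumptions made only for this example.

```python
from typing import Iterable, Tuple

# Hypothetical interval convention for this sketch: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def hard_mask(sequence: str, repeats: Iterable[Interval], mask_char: str = "N") -> str:
    """Return a copy of `sequence` with every repeat interval replaced by `mask_char`."""
    masked = list(sequence)
    for start, end in repeats:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        start = max(0, start)
        end = min(len(masked), end)
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


def mask_in_two_passes(sequence: str,
                       interspersed: Iterable[Interval],
                       tandem: Iterable[Interval]) -> str:
    """Mask interspersed repeats first, then tandem repeats, mirroring the order in the text."""
    after_interspersed = hard_mask(sequence, interspersed)
    return hard_mask(after_interspersed, tandem)
```

In this sketch the searches would run on the masked string, while the original string is kept for the later re-alignment of significant hits to unmasked sequence.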
Once the automated analysis is complete, the annotator uses a Perl/Tk based graphical interface, "otterlace", developed in-house at the Wellcome Trust Sanger Institute to edit annotation data held in a separate MySQL database system. The interface displays a rich, interactive graphical view of the genomic region, showing features such as database matches, gene predictions, and transcripts created by the annotators. Gapped alignments of nucleotide and protein blast hits to the genomic sequence are viewed and explored using the "Blixem" alignment viewer. Additionally, the "Dotter" dot plot tool is used to show the pairwise alignments of unmasked sequence, thus revealing the location of exons that are occasionally missed by the automated blast searches because of their small size and/or match to repeat-masked sequence. The interface provides a number of tools that the annotator uses to build genes and edit annotations: adding transcripts, exon coordinates, translation regions, gene names and descriptions, remarks and polyadenylation signals and sites. Verification See Harrow et al., 2006 for information on verification techniques. Credits This GENCODE release is the result of a collaborative effort among the following laboratories: Lab/Institution Contributors HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator) Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Thomas R.
Gingeras References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816. encodeGencodeSuper Gencode Genes Gencode Gene Annotation Pilot ENCODE Regions and Genes Overview This super-track combines related tracks from the GENCODE project. The goal of this project is to identify all protein-coding genes in the ENCODE regions using a pipeline that uses computational predictions, experimental verification, and manual annotation, based on the Sanger Havana process. Gencode Genes Mar07 This track shows gene annotations from the GENCODE release v3.1 (March 2007). These annotations contain updates and corrections to the GENCODE October 2005 annotations, based on validation data from 5' RACE and RT-PCR experiments, which are displayed in the Gencode RACEfrags and Gencode Introns Oct05 tracks. Gencode RACEfrags This track shows the products of 5' RACE reactions performed on GENCODE genes in 12 tissues and 3 cell lines, as assayed on Affymetrix ENCODE 20nt tiling arrays. The results were used to annotate 5' transcription start sites and internal exons of all annotated protein-coding loci in the Oct. 2005 GENCODE freeze.
Gencode Genes Oct05 This track shows gene annotations from the GENCODE release v2.2 (Oct 2005), which was released as part of the ENCODE October 2005 data freeze. Gencode Introns Oct05 This track shows validation status of the introns in selected gene models from the Gencode Oct 05 gene annotations, as identified by RT-PCR and RACE experiments in 24 human tissues. Credits This GENCODE release is the result of a collaborative effort among the following laboratories: Lab/Institution Contributors HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator) Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Thomas R. Gingeras The RACEfrags result from a collaborative effort among the following laboratories: Lab/Institution Contributors Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R.
Gingeras HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto TS, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. encodeGencodeGenePolyAMar07 Gencode PolyA Gencode polyA Features Pilot ENCODE Regions and Genes encodeGencodeGenePseudoMar07 Gencode Pseudo Gencode Pseudogenes Pilot ENCODE Regions and Genes encodeGencodeGenePolymorphicMar07 Gencode Polymorph Gencode Polymorphic Pilot ENCODE Regions and Genes encodeGencodeGenePutativeMar07 Gencode Putative Gencode Putative Genes Pilot ENCODE Regions and Genes encodeGencodeGeneKnownMar07 Gencode Ref Gencode Reference Genes Pilot ENCODE Regions and Genes encodeGencodeRaceFrags Gencode RACEfrags 5' RACE-Array experiments on Gencode loci Pilot ENCODE Regions and Genes Description RACEfrags are the products of 5’ RACE reactions performed on GENCODE genes (using the primers displayed in the subtrack "Gencode 5’ RACE primer") in 12 tissues and 3 cell lines (15 subtracks) followed by hybridization on ENCODE tiling arrays. Each RACEfrag is linked to the 5’ RACE primer but no other connectivity information is available from this experiment. Methods For a detailed description of the methods and references used, see Denoeud et al., 2007. A combination of 5’ RACE and high-density tiling microarrays were used to empirically annotate 5’ transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze; Harrow et al., 2006).
Oligonucleotides for 5’ RACE experiments were chosen such that they map to a coding exon (the index exon) common to most of the transcripts of protein-coding gene loci annotated by GENCODE (Oct. 2005 freeze). The 5’ RACE reactions were performed with these oligonucleotides on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines (GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia) and HeLaS3 (cervix carcinoma)). The RACE reactions were then hybridized to 20 nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" -- array-detected fragments of RACE products -- were assessed for novelty by comparing their genomic coordinates to those of GENCODE-annotated exons. Verification Connectivity between novel RACEfrags and their respective index exon was investigated by RT-PCR using the 5’ RACE primer as one of the primers, followed by hybridization on tiling arrays. 385 RT-PCR reactions corresponding to 199 GENCODE loci were positive after hybridization on tiling arrays (244 RACE reactions). All positive RT-PCR reactions and a subset of those that were negative in the hybridization experiments were further verified by cloning and sequencing of the RT-PCR products. In most cases, eight clones were selected from each set of RT-PCR products for sequencing. To be retained in the dataset, these sequences must unambiguously map to the correct location, show splicing and pass manual inspection by the HAVANA team. By these criteria, 89 of these RT-PCR reactions (69 GENCODE loci) were positive after cloning and sequencing (see Denoeud et al., 2007 for further details). The resulting cDNA sequences were deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122. See additional information about the sequences here.
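The novelty assessment described in the Methods above, comparing RACEfrag coordinates to those of annotated exons, can be pictured as an interval-overlap test. The Python sketch below is only one plausible reading: it treats a RACEfrag as novel when it overlaps no annotated exon on the same chromosome, which is an assumption (Denoeud et al., 2007 should be consulted for the criteria actually used). The `Feature` type, the function names, and the 0-based half-open coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A genomic interval; coordinates are 0-based, half-open (an assumption of this sketch)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def novel_racefrags(racefrags: List[Feature], exons: List[Feature]) -> List[Feature]:
    """Return the RACEfrags that overlap no annotated exon (assumed definition of novelty)."""
    return [frag for frag in racefrags
            if not any(overlaps(frag, exon) for exon in exons)]
```

A quadratic scan is used for clarity; for full ENCODE-region data a sorted sweep or an interval tree would be the more usual choice.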
Credits The RACEfrags result from a collaborative effort among the following laboratories: Lab/Institution Contributors Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonarakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R. Gingeras HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.
encodeGencodeRaceFragsHela RACEfrags HeLaS3 Gencode RACEfrags from HeLaS3 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsHL60 RACEfrags HL60 Gencode RACEfrags from HL60 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsGM06990 RACEfrags GM06990 Gencode RACEfrags from GM06990 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsTestis RACEfrags Testis Gencode RACEfrags from Testis Pilot ENCODE Regions and Genes encodeGencodeRaceFragsStomach RACEfrags Stomach Gencode RACEfrags from Stomach Pilot ENCODE Regions and Genes encodeGencodeRaceFragsSpleen RACEfrags Spleen Gencode RACEfrags from Spleen Pilot ENCODE Regions and Genes encodeGencodeRaceFragsSmallIntest RACEfrags Sm Int Gencode RACEfrags from Small Intestine Pilot ENCODE Regions and Genes encodeGencodeRaceFragsPlacenta RACEfrags Placenta Gencode RACEfrags from Placenta Pilot ENCODE Regions and Genes encodeGencodeRaceFragsMuscle RACEfrags Muscle Gencode RACEfrags from Muscle Pilot ENCODE Regions and Genes encodeGencodeRaceFragsLung RACEfrags Lung Gencode RACEfrags from Lung Pilot ENCODE Regions and Genes encodeGencodeRaceFragsLiver RACEfrags Liver Gencode RACEfrags from Liver Pilot ENCODE Regions and Genes encodeGencodeRaceFragsKidney RACEfrags Kidney Gencode RACEfrags from Kidney Pilot ENCODE Regions and Genes encodeGencodeRaceFragsHeart RACEfrags Heart Gencode RACEfrags from Heart Pilot ENCODE Regions and Genes encodeGencodeRaceFragsColon RACEfrags Colon Gencode RACEfrags from Colon Pilot ENCODE Regions and Genes encodeGencodeRaceFragsBrain RACEfrags Brain Gencode RACEfrags from Brain Pilot ENCODE Regions and Genes encodeGencodeRaceFragsPrimer RACEfrags Primer Gencode 5' RACE primer Pilot ENCODE Regions and Genes encodeGencodeGeneOct05 Gencode Genes Oct05 Gencode Gene Annotations (October 2005) Pilot ENCODE Regions and Genes Description The Gencode Gene track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. 
A companion track, Gencode Introns, shows experimental gene structure validations for these annotations. The gene annotations are colored based on the Havana annotation type. Known and validated transcripts are colored dark green, putative and unconfirmed are light green, pseudogenes are blue, and artifacts are grey. The transcript types are defined in more detail in the accompanying table. The Gencode project recommends that the annotations with known and validated transcripts (i.e., the types Known, Novel_CDS, Novel_transcript_gencode_conf, and Putative_gencode_conf, which are colored dark green in the track display) be used as the reference annotation. Type Color Description Known dark green Known protein coding genes (referenced in Entrez Gene, NCBI) Novel_CDS dark green Novel protein coding genes annotated by Havana (not referenced in Entrez Gene, NCBI) Novel_transcript_gencode_conf dark green Novel transcripts annotated by Havana (no ORF assigned) with at least one junction validated by RT-PCR Putative_gencode_conf dark green Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) with at least one junction validated by RT-PCR Novel_transcript light green Novel transcripts annotated by Havana (no ORF assigned) not validated by RT-PCR Putative light green Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) not validated by RT-PCR TEC light green Single exon objects (supported by multiple ESTs with polyA sites and signals) undergoing experimental validation/extension. Processed_pseudogene blue Pseudogenes arising via retrotransposition (exon structure of parent gene lost) Unprocessed_pseudogene blue Pseudogenes arising via gene duplication (exon structure of parent gene retained) Artifact grey Transcript evidence and/or its translation equivocal Methods The Human and Vertebrate Analysis and Annotation manual curation process (HAVANA) was used to produce these annotations.
Finished genomic sequence was analyzed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases, as well as a series of ab initio gene predictions. Nucleotide sequence databases were searched with WUBLASTN and significant hits were realigned to the unmasked genomic sequence by EST2GENOME. WUBLASTX was used to search the UniProt protein database, and the accession numbers of significant hits were retrieved from the Pfam database. Hidden Markov models for Pfam protein domains were aligned against the genomic sequence using Genewise to provide annotation of protein domains. A number of ab initio prediction algorithms were also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes, and Eponine TSS for transcription start site predictions. The annotators used the (AceDB-based) Otterlace interface to create and edit gene objects, which were then stored in a local database named Otter. In cases where predicted transcript structures from Ensembl are available, these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide database. Verification The gene objects selected for verification came from various computational prediction methods and HAVANA annotations. RT-PCR and RACE experiments were performed on them, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using 12 poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b].
The relative amount of each cDNA was normalized by quantitative PCR using SYBR Green as intercalator and an ABI Prism 7700 Sequence Detection System. Predictions of human gene junctions were assayed experimentally by RT-PCR as previously described and modified [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003]. Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) with a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification were performed with touchdown annealing temperatures decreasing from 60 to 50°C; the next 30 cycles used an annealing temperature of 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences). Credits Click here for a complete list of people who participated in the GENCODE project. References Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005). Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002). Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002). Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).
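To make the touchdown PCR program in the Verification section above concrete, the short Python sketch below lists one annealing temperature per cycle: ten touchdown cycles starting at 60°C, followed by 30 cycles at 50°C. The text gives only the endpoints, so the equal per-cycle decrement (1°C per cycle here) is an assumption, and the function name `touchdown_schedule` is hypothetical.

```python
from typing import List


def touchdown_schedule(start_c: float = 60.0,
                       end_c: float = 50.0,
                       touchdown_cycles: int = 10,
                       plateau_cycles: int = 30) -> List[float]:
    """Return the annealing temperature (deg C) for each PCR cycle.

    Assumption: the temperature drops by an equal step each touchdown cycle,
    so that the first plateau cycle is the first one run at `end_c`.
    """
    step = (start_c - end_c) / touchdown_cycles
    ramp = [start_c - step * i for i in range(touchdown_cycles)]
    plateau = [end_c] * plateau_cycles
    return ramp + plateau
```

With the default arguments the schedule has 40 entries in total, the first ten descending from 60°C and the remaining thirty held at 50°C.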
encodeGencodeGenePseudoOct05 Gencode Pseudo Gencode Pseudogenes Pilot ENCODE Regions and Genes encodeGencodeGenePutativeOct05 Gencode Putative Gencode Putative Genes Pilot ENCODE Regions and Genes encodeGencodeGeneKnownOct05 Gencode Ref Gencode Reference Genes Pilot ENCODE Regions and Genes encodeGencodeIntronOct05 Gencode Introns Oct05 Gencode Intron Validation (October 2005) Pilot ENCODE Regions and Genes Description The Gencode Intron Validation track shows gene structure validations generated by the GENCODE project. This track serves as a companion to the Gencode Genes track. The items in this track are colored based on the validation status determined via RT-PCR of exons flanking the intron: Status Color Validation Result RT_positive green Intron validated (RT-PCR product corresponds to expected junction) RACE_validated green Intron validated (RACE product corresponds to expected junction) RT_negative red Intron not validated (no RT-PCR product was obtained) RT_wrong_junction gold Intron not validated, but another junction exists between the two (RT-PCR product does not correspond to the expected junction) Methods Selected gene models from the Gencode Genes track were picked for RT-PCR and RACE verification experiments. RT-PCR and RACE experiments were performed on the objects, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using twelve poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b].
The relative amount of each cDNA was normalized with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) by quantitative PCR using SYBR Green as intercalator and an ABI Prism 7700 Sequence Detection System. Predictions of human gene junctions were assayed experimentally by RT-PCR as previously described and modified [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003]. Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) with a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification were performed with touchdown annealing temperatures decreasing from 60 to 50°C; the next 30 cycles used an annealing temperature of 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences). Credits Click here for a complete list of people who participated in the GENCODE project. References Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005). Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002). Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002). Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).
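The status and color table in the Description of the Gencode Introns track above can be restated as a small decision rule. The Python sketch below assigns an RT-PCR status from an expected and an observed junction and looks up the display color. The representation of a junction as a (donor, acceptor) coordinate pair, the use of `None` for "no product", and the function name `rt_pcr_status` are assumptions made for illustration; RACE_validated is listed in the color map but not derived, because it comes from a separate RACE experiment.

```python
from typing import Dict, Optional, Tuple

# Hypothetical junction representation: (donor position, acceptor position).
Junction = Tuple[int, int]

# Colors as given in the track description table.
STATUS_COLOR: Dict[str, str] = {
    "RT_positive": "green",
    "RACE_validated": "green",
    "RT_negative": "red",
    "RT_wrong_junction": "gold",
}


def rt_pcr_status(expected: Junction, observed: Optional[Junction]) -> str:
    """Classify an RT-PCR result for one intron.

    `observed` is None when no RT-PCR product was obtained.
    """
    if observed is None:
        return "RT_negative"
    if observed == expected:
        return "RT_positive"
    # A product exists but spans a different junction than the annotated one.
    return "RT_wrong_junction"
```

For example, `STATUS_COLOR[rt_pcr_status((1000, 2000), (1000, 2100))]` evaluates to `"gold"`, matching the RT_wrong_junction row of the table.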
encodeEgaspFull EGASP Full ENCODE Gene Prediction Workshop (EGASP) All ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows full sets of gene predictions covering all 44 ENCODE regions originally submitted for the ENCODE Gene Annotation Assessment Project (EGASP) Gene Prediction Workshop 2005. The following gene predictions are included: AceView DOGFISH-C Ensembl Exogean ExonHunter Fgenesh Pseudogenes Fgenesh++ GeneID-U12 GeneMark JIGSAW Pairagon/N-SCAN SGP2-U12 SPIDA Twinscan-MARS The EGASP Partial companion track shows original gene prediction submissions for a partial set of the 44 ENCODE regions; the EGASP Update track shows updated versions of the submitted predictions. These annotations were originally produced using the hg17 assembly. Display Conventions and Configuration Data for each gene prediction method within this composite annotation track are displayed in a separate subtrack. See the top of the track description page for configuration options allowing display of selected subsets of gene predictions. To remove a subtrack from the display, uncheck the appropriate box. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. Display characteristics specific to individual subtracks are described in the Methods section. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods AceView These annotations were generated using AceView. All mRNAs and cDNAs available in GenBank, excluding NMs, were co-aligned on the Gencode sections. 
The results were then examined and filtered to resemble Havana. The very restrictive view of Havana on CDS was not reproduced, due to a lack of experimental data. DOGFISH-C Candidate splice sites and coding starts/stops were evaluated using DNA alignments between the human assembly and seven other vertebrate species (UCSC multiz alignments, adding the frog and removing the chimp). Genes (single transcripts only) were then predicted using dynamic programming. Ensembl The Ensembl annotation includes two types of predictions: protein-coding genes (the Ensembl Gene Predictions subtrack) and pseudogenes of protein-coding genes (the Ensembl Pseudogene Predictions subtrack). The Ensembl Pseudo track is not intended as a comprehensive annotation of pseudogenes, but rather an attempt to identify and label those gene predictions made by the Ensembl pipeline that have pseudogene characteristics. Exons that lie partially outside the ENCODE region are not included in the data set. The "Alternate Name" field on the subtrack details page shows the Ensembl ID for the selected gene or transcript. ExonHunter ExonHunter is a comprehensive gene-finder based on hidden Markov models (HMMs) allowing the use of a variety of additional sources of information (ESTs, proteins, genome-genome comparisons). Exogean Exogean annotates protein coding genes by combining mRNA and cross-species protein alignments in directed acyclic colored multigraphs where nodes and edges respectively represent biological objects and human expertise. Additional predictions and methods for this subtrack are available in the EGASP Updates track. Fgenesh Pseudogenes Fgenesh is an HMM gene structure prediction program. This data set shows predictions of potential pseudogenes. Fgenesh++ These gene predictions were generated by Fgenesh++, a gene-finding program that uses both HMMs and protein similarity to find genes in a completely automated manner. 
GeneID-U12 The GeneID-U12 gene prediction set, generated using a version of GeneID modified to detect U12-dependent introns (both GT-AG and AT-AC subtypes) when present, employs a single-genome ab initio method. This modified version of GeneID uses matrices for U12 donor, acceptor and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two GeneID-U12 subtracks are included: GeneID Gene Predictions and GeneID U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions track are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track. GeneMark The eukaryotic version of the GeneMark.hmm (release 2.2) gene prediction program utilizes the HMM statistical model with duration or hidden semi-Markov model (HSMM). The HMM includes hidden states for initial, internal and terminal exons, introns, intergenic regions and single exon genes. It also includes the "border" states, such as start site (initiation codon), stop site (termination codons), and donor and acceptor splice sites. Sequences of all protein-coding regions were modeled by three periodic inhomogeneous Markov chains; sequences of non-coding regions were modeled by homogeneous Markov chains. Nucleotide sequences corresponding to the site states were modeled by position-specific inhomogeneous Markov chains. Parameters of the gene models were derived from the set of genes obtained by cDNA mapping to genomic DNA. To reflect variations in G+C composition of the genome, the gene model parameters were estimated separately for the three G+C regions. JIGSAW JIGSAW uses the output from gene-finders, splice-site prediction programs and sequence alignments to predict gene models. Annotation data downloaded from the UCSC Genome Browser and TIGR gene-finder output was used as input for these predictions. 
JIGSAW predicts both partial and complete genes. Additional predictions and methods for this subtrack are available in the EGASP Updates track. Pairagon/N-SCAN The pairHMM-based alignment program, Pairagon, was used to align high-quality mRNA sequences to the ENCODE regions. These were supplemented with N-SCAN EST predictions which are displayed in the Pairgn/NSCAN-E subtrack, and extended further with additional transcripts from the Brent Lab to produce the predictions displayed as the Pairgn/NSCAN-E/+ subtrack. The NSCAN subtrack contains only predictions from the N-SCAN program. SGP2-U12 The SGP2-U12 gene prediction set, generated using a version of GeneID modified to detect U12-dependent introns (both AT-AC and GT-AG subtypes) when present, employs a dual-genome method (SGP2) that utilizes similarity (tblastx) to mouse genomic sequence syntenic to the ENCODE regions (Oct. 2004 MSA freeze). This modified version of GeneID uses matrices for U12 donor, acceptor and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two SGP2-U12 subtracks are included: SGP2 Gene Predictions and SGP2 U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions track are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track. SPIDA This exon-only prediction set was produced using SPIDA (Substitution Periodicity Index and Domain Analysis). Exons derived by mapping ESTs to the genome were validated by seeking periodic substitution patterns in the aligned informant DNA sequences. First, all available ESTs were mapped to the genome using Exonerate. The resulting transcript structures were "flattened" to remove redundancy. 
Each exon of the flattened transcripts was subjected to SPI analysis, which involves identifying periodicity in the pattern of mutations occurring between the human and an informant species DNA sequence (the informant sequences and their TBA alignments were provided by Elliott Margulies). SPI was calculated for all available human-informant pairs for whole exons and in a sliding 48 bp window. SPI analysis requires that a threshold level of periodicity be identified in at least two of the informant species if the exon is to be accepted. If accepted, SPI provides the correct frame for translation of the exon. This exon was used as a starting point for extending the ORF coding region of the flattened transcript from which it came. This gave a full or partial CDS; different exons may give different CDSs. The CDSs were translated and searched for domains using hmmpfam and Pfam_fs. Only transcripts with a domain hit with e > 1.0 were retained. Heuristics were applied to the retained CDSs to identify problems with the transcript structure, particularly frame-shifts. Many transcripts may identify the same exon, but only a single instance of each exon has been retained. Twinscan-MARS This gene prediction set was produced by a version of Twinscan that employs multiple pairwise genome comparisons to identify protein-coding genes (including alternative splices) using nucleotide homology information. No expression or protein data were used. Credits The following individuals and institutions provided the data for the subtracks in this annotation: AceView: Danielle and Jean Thierry-Mieg, NCBI, National Institutes of Health. DOGFISH-C: David Carter, Informatics Dept., Wellcome Trust Sanger Institute. Ensembl: Stephen Searle, Wellcome Trust Sanger Institute (joint Sanger/EBI project). Exogean: Sarah Djebali, Dyogen Lab, Ecole Normale Supérieure (Paris, France). ExonHunter: Tomas Vinar, Waterloo Bioinformatics, School of Computer Science, University of Waterloo.
Fgenesh, Fgenesh++: Victor Solovyev, Department of Computer Science, Royal Holloway, London University. GeneID-U12, SGP2-U12: Tyler Alioto, Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM), Barcelona. GeneMark: Mark Borodovsky, Alex Lomsadze and Alexander Lukashin, Department of Biology, Georgia Institute of Technology. JIGSAW: Jonathan Allen, Steven Salzberg group, The Institute for Genomic Research (TIGR) and the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. Pairagon/N-SCAN: Randall Brown, Laboratory for Computational Genomics, Washington University in St. Louis. SPIDA: Damian Keefe, Birney Group, EMBL-EBI. Twinscan: Paul Flicek, Brent Lab, Washington University in St. Louis. encodeEgaspSuper EGASP ENCODE Gene Prediction Workshop (EGASP) Pilot ENCODE Regions and Genes Overview This super-track combines related tracks from the ENCODE Gene Annotation Assessment Project (EGASP) 2005 Gene Prediction Workshop. The goal of the workshop was to evaluate automatic methods for gene annotation of the human genome, with a focus on protein-coding genes. Predictions were evaluated in terms of their ability to reproduce the high-quality manually assisted GENCODE gene annotations and to predict novel transcripts. The EGASP Full track shows gene predictions covering all 44 ENCODE regions submitted before the GENCODE annotations were released. The EGASP Partial track shows gene predictions that cover some of the ENCODE regions, submitted before the GENCODE release. The EGASP Update track shows gene predictions that cover all ENCODE regions, submitted after the GENCODE release. These annotations were originally produced using the hg17 assembly. 
The following gene predictions are included: ACEScan AceView DOGFISH-C Ensembl Exogean ExonHunter Fgenesh Pseudogenes Fgenesh++ GeneID-U12 GeneMark GeneZilla JIGSAW Pairagon/N-SCAN SAGA SGP2-U12 SPIDA Twinscan-MARS Yale pseudogenes Credits Click here for a complete list of people who participated in the GENCODE project. The following individuals and institutions provided the data for the subtracks in this annotation: AceView: Danielle and Jean Thierry-Mieg, NCBI, National Institutes of Health. DOGFISH-C: David Carter, Informatics Dept., Wellcome Trust Sanger Institute. Ensembl: Stephen Searle, Wellcome Trust Sanger Institute (joint Sanger/EBI project). Exogean: Sarah Djebali, Dyogen Lab, Ecole Normale Supérieure (Paris, France). ExonHunter: Tomas Vinar, Waterloo Bioinformatics, School of Computer Science, University of Waterloo. Fgenesh, Fgenesh++: Victor Solovyev, Department of Computer Science, Royal Holloway, London University. GeneID-U12, SGP2-U12: Tyler Alioto, Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM), Barcelona. GeneMark: Mark Borodovsky, Alex Lomsadze and Alexander Lukashin, Department of Biology, Georgia Institute of Technology. JIGSAW: Jonathan Allen, Steven Salzberg group, The Institute for Genomic Research (TIGR) and the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. Pairagon/N-SCAN: Randall Brown, Laboratory for Computational Genomics, Washington University in St. Louis. SPIDA: Damian Keefe, Birney Group, EMBL-EBI. Twinscan: Paul Flicek, Brent Lab, Washington University in St. Louis. ACEScan: Gene Yeo, Crick-Jacobs Center for Computational Biology, Salk Institute. Augustus: Mario Stanke, Department of Bioinformatics, University of Göttingen, Germany. GeneZilla: William Majoros, Dept. of Bioinformatics, The Institute for Genomic Research (TIGR). SAGA: Sourav Chatterji, Lior Pachter lab, Department of Mathematics, U.C. Berkeley. 
References Ashurst JL, Chen CK, Gilbert JG, Jekosch K, Keenan S, Meidl P, Searle SM, Stalker J, Storey R, Trevanion S et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D459-65. Guigo R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002 Dec 5;420(6915):520-62. Reymond A, Marigo V, Yaylaoglu MB, Leoni A, Ucla C, Scamuffa N, Caccioppoli C, Dermitzakis ET, Lyle R, Banfi S et al. Human chromosome 21 gene expression atlas in the mouse. Nature. 2002 Dec 5;420(6915):582-6. Reymond A, Camargo AA, Deutsch S, Stevenson BJ, Parmigiani RB, Ucla C, Bettoni F, Rossier C, Lyle R, Guipponi M et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics. 2002 Jun;79(6):824-32. Chatterji S, Pachter L. Multiple organism gene finding by collapsed Gibbs sampling. J Comput Biol. 2005 Jul-Aug;12(6):599-608. Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology. 2004;177-186. Augustus Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl. 2):ii215-ii225. Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. FGenesh++ Solovyev VV. "Statistical approaches in Eukaryotic gene prediction". In Handbook of Statistical Genetics (eds. Balding D et al.) (John Wiley & Sons, Inc., 2001). p. 83-127. GeneID Blanco E, Parra G, Guigó R. "Using geneid to identify genes". In Current Protocols in Bioinformatics, Unit 4.3. (eds. 
Baxevanis AD.) (John Wiley & Sons, Inc., 2002). Guigó R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol. 1998 Winter;5(4):681-702. Guigó R, Knudsen S, Drake N, Smith T. Prediction of gene structure. J Mol Biol. 1992 Jul 5;226(1):141-57. Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000 Apr;10(4):511-5. JIGSAW Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res. 2004 Jan;14(1):142-8. Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005 Sep 15;21(18):3596-603. SGP2 Guigó R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5. Parra G, Agarwal P, Abril JF, Wiehe T, Fickett JW, Guigó R. Comparative gene prediction in human and mouse. Genome Res. 2003 Jan;13(1):108-17. 
encodeEgaspFullTwinscan Twinscan Twinscan Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSpida SPIDA Exons SPIDA Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSgp2U12 SGP2 U12 SGP2 U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSgp2 SGP2 SGP2 Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonMultiple NSCAN N-SCAN Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonAny Pairgn/NSCAN-E/+ Pairagon/NSCAN Any Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonMrna Pairgn/NSCAN-E Pairagon/NSCAN-EST Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullJigsaw Jigsaw Jigsaw Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGenemark GeneMark GeneMark Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGeneIdU12 GeneID U12 GeneID U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGeneId GeneID GeneID Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSoftberryPseudo Fgenesh Pseudo Fgenesh Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullFgenesh Fgenesh++ Fgenesh++ Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullExonhunter ExonHunter ExonHunter Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullExogean Exogean Exogean Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullEnsemblPseudo Ensembl Pseudo Ensembl Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullEnsembl Ensembl Ensembl Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullDogfish DOGFISH-C DOGFISH-C Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullAceview AceView AceView Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartial EGASP Partial ENCODE Gene Prediction Workshop (EGASP) for Partial ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows gene predictions submitted for the ENCODE Gene Annotation 
Assessment Project (EGASP) Gene Prediction Workshop 2005 that cover only a partial set of the 44 ENCODE regions. The partial set excludes the 13 ENCODE regions for which high-quality annotations were released in late 2004. The following gene predictions are included: ACEScan Augustus GeneZilla SAGA The EGASP Full companion track shows original gene prediction submissions for the full set of 44 ENCODE regions using Gene Prediction algorithms other than those used here; the EGASP Update track shows updated versions of some of the submitted predictions. Display Conventions and Configuration Data for each gene prediction method within this composite annotation track is displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods ACEScan ACEScan (Alternative Conserved Exons Scan) indicates alternative splicing that is evolutionarily conserved in human and mouse/rat. The Conserved Alternative Exon Predictions subtrack shows predicted alternative conserved exons. The Unconserved Alternative and Constitutive Exon Predictions subtrack shows exons that are predicted to be constitutive or may have species-specific alternative splicing. 
Augustus Augustus uses a generalized hidden Markov model (GHMM) that models coding and non-coding sequence, splice sites, the branch point region, translation start and end, and lengths of exons and introns. The track contains four different sets of predictions. Ab initio single genome predictions are based solely on the input sequence. EST and protein evidence predictions were generated using AGRIPPA hints based on alignments of human sequence from the dbEST and nr databases. Mouse homology gene predictions were produced using mouse genomic sequence only; BLAST, CHAOS, DIALIGN were used to generate the hints for Augustus. The combined EST/protein evidence and mouse homology gene predictions were created using human sequence from the dbEST and nr databases and mouse genomic sequence to generate hints for Augustus. Additional predictions and methods for this subtrack are available in the EGASP Updates track. GeneZilla GeneZilla is a program for the computational prediction of protein-coding genes in eukaryotic DNA, based on the generalized hidden Markov model (GHMM) framework. These predictions were generated using GeneZilla and IsoScan, which uses a four-state hidden Markov model to predict isochores (regions of homogeneous G+C content) in genomic DNA. SAGA SAGA is an ab initio multiple-species gene-finding program based on the Gibbs sampling-based method described in Chatterji et al. (2004). In addition to sampling parameters, SAGA also uses a phyloHMM based model to boost the scores, similar to the method described in Siepel et al. (2004). Credits The gene prediction data sets were submitted by the following individuals and institutions: ACEScan: Gene Yeo, Crick-Jacobs Center for Computational Biology, Salk Institute. Augustus: Mario Stanke, Department of Bioinformatics, University of Göttingen, Germany. GeneZilla: William Majoros, Dept. of Bioinformatics, The Institute for Genomic Research (TIGR). 
SAGA: Sourav Chatterji, Lior Pachter lab, Department of Mathematics, U.C. Berkeley. References Chatterji, S. and Pachter, L. Multiple organism gene finding by collapsed Gibbs sampling. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 187-193 (2004). Siepel, A. and Haussler, D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 177-186 (2004). encodeEgaspPartSaga SAGA SAGA Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartGenezilla GeneZilla GeneZilla Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusAny Augustus/EST/Mouse Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusDual Augustus/Mouse Augustus + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusEst Augustus/EST Augustus + EST/Protein Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusAbinitio Augustus Augustus Ab Initio Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAceOther ACEScan Other ACEScan Unconserved Alternative and Constitutive Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAceCons ACEScan Cons Alt ACEScan Conserved Alternative Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdate EGASP Update ENCODE Gene Prediction Workshop (EGASP) Updates Pilot ENCODE Regions and Genes Description This track shows updated versions of gene predictions submitted for the ENCODE Gene Annotation Assessment Project (EGASP) Gene Prediction Workshop 2005. The following gene predictions are included: Augustus Exogean FGenesh++ GeneID-U12 Jigsaw SGP2-U12 Yale pseudogenes The original EGASP submissions are displayed in the companion tracks, EGASP Full and EGASP Partial. 
Display Conventions and Configuration Data for each gene prediction method within this composite annotation track are displayed in separate subtracks. See the top of the track description page for a complete list of the subtracks available for this annotation. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. Display characteristics specific to individual subtracks are described in the Methods section. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods Augustus Augustus uses a generalized hidden Markov model (GHMM) that models coding and non-coding sequence, splice sites, the branch point region, the translation start and end, and the lengths of exons and introns. This version has been trained on a set of 1284 human genes. The track contains four sets of predictions: ab initio, EST and protein-based, mouse homology-based, and those using EST/protein and mouse homology evidence as additional input to Augustus for the predictions. The EST and protein evidence was generated by aligning sequences from the dbEST and nr databases to the ENCODE region using wublastn and wublastx. The resulting alignments were used to generate hints about putative splice sites, exons, coding regions, introns, translation start and translation stop. The mouse homology evidence was generated by aligning pairs of human and mouse genomic sequences using the program DIALIGN. 
Regions conserved at the peptide level were used to generate hints about coding regions. Exogean Exogean produces alternative transcripts by combining mRNA and cross-species sequence alignments using heuristic rules. The program implements a generic framework based on directed acyclic colored multigraphs (DACMs). In Exogean, DACM nodes represent biological objects (mRNA or protein HSPs/transcripts) and multiple edges between nodes represent known relationships between these objects derived from human expertise. Exogean DACMs are successively built and reduced, leading to increasingly complex objects. This process enables the production of alternative transcripts from initial HSPs. FGenesh++ FGenesh++ predictions are based on hidden Markov models and protein similarity to the NR database. For more information, see the reference below. GeneID-U12 The GeneID program, which is designed with a hierarchical structure, predicts genes in anonymous genomic sequences. In the first step, splice sites, start and stop codons are predicted and scored along the sequence using position weight arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites plus the log-likelihood ratio of a Markov model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons, maximizing the sum of the scores of the assembled exons. The modified version of GeneID used to generate the predictions in this track incorporates models for U12-dependent splice signals in addition to U2 splice signals. The GeneID subtrack shows all GeneID genes. Only U12 introns and their flanking exons are displayed in the GeneID U12 subtrack. Exons flanking predicted U12-dependent introns are assigned a type attribute reflecting their splice sites, displayed on the details page of the GeneID U12 subtrack as the "Alternate Name" of the item composed of the intron plus flanking exons. 
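The exon scoring and assembly steps described for GeneID can be illustrated with a small sketch. This is not GeneID's implementation: the Exon class, its field names, and the assemble function are hypothetical, and the sketch assumes that the only compatibility rule is that chosen exons must not overlap (the real program also enforces reading frame and gene-model constraints). It scores each exon as the sum of its site scores plus a coding log-likelihood ratio, then picks the non-overlapping set with the highest total score by dynamic programming.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical candidate exon; coordinates are half-open [start, end)."""
    start: int
    end: int
    site_scores: tuple   # scores of the two defining sites (e.g. acceptor, donor)
    coding_llr: float    # log-likelihood ratio of the coding Markov model

    @property
    def score(self) -> float:
        # Exon score = sum of defining-site scores + coding log-likelihood ratio.
        return sum(self.site_scores) + self.coding_llr


def assemble(exons):
    """Return (total_score, chosen_exons) maximizing the summed exon score.

    Simplifying assumption: exons are compatible whenever they do not overlap.
    """
    exons = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in exons]
    # best[i] holds the optimal (score, exon list) using only the first i exons.
    best = [(0.0, [])]
    for i, exon in enumerate(exons):
        # j = how many of the first i exons end at or before this exon starts.
        j = bisect_right(ends, exon.start, 0, i)
        take = best[j][0] + exon.score
        if take > best[i][0]:
            best.append((take, best[j][1] + [exon]))
        else:
            best.append(best[i])
    return best[-1]


candidates = [
    Exon(0, 10, (1.0, 1.0), 2.0),
    Exon(5, 15, (3.0, 3.0), 4.0),
    Exon(15, 25, (1.0, 1.0), 1.0),
    Exon(10, 30, (0.5, 0.5), 1.0),
]
total, chosen = assemble(candidates)
print(total, [(e.start, e.end) for e in chosen])
```

Sorting by end coordinate lets each exon look up, with one binary search, the best solution among exons that finish before it begins.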
Jigsaw Jigsaw is a gene prediction program that determines genes based on target genomic sequence and output from a gene structure annotation database. Data downloaded from UCSC's annotation database is used as input and includes the following tracks of evidence: Known Genes, Ensembl, RefSeq, GeneID, Genscan, SGP, Twinscan, Human mRNAs, TIGR Gene Index, UniGene, Most Conserved Elements and Non-human RefSeq Genes. GlimmerHMM and GeneZilla, two open source ab initio gene-finding programs based on GHMMs, are also used. SGP2-U12 To predict genes in a genomic query, SGP2 combines GeneID predictions with tblastx comparisons of the genomic query against other genomic sequences. This modified version of SGP2 uses models for U12-dependent splice signals in addition to U2 splice signals. The reference genomic sequence for this data set is the Oct. 2004 release of mouse sequence syntenic to ENCODE regions. The SGP2 and SGP2 U12 tracks follow the same display conventions as the GeneID and GeneID U12 subtracks described above. Yale Pseudogenes For this analysis, pseudogenes were defined as genomic sequences similar to known human genes and with various disablements (premature stop codons or frameshifts) in their "putative" protein-coding regions. The protein sequences of known human genes (as annotated by ENSEMBL) were used to search for similar nongenic sequences in ENCODE regions. The matching sequences were assessed as disabled copies of genes based on the occurrences of premature stop codons or frameshifts. The intron-exon structure of the functional gene was further used to infer whether a pseudogene was duplicated or processed (a duplicated pseudogene keeps the intron-exon structure of its parent functional gene). Small pseudogene sequences were labeled as fragments or other types. All pseudogenes in this track were manually curated. In the browser, the track details page shows the pseudogene type. 
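One of the disablements named above, the premature stop codon, can be checked with a few lines of code. The sketch below is only an illustration and not the Yale pipeline: the function name premature_stops is hypothetical, and it assumes the candidate sequence has already been placed in the reading frame of its parent gene (in practice that frame comes from the protein-to-genome alignment, which is also what reveals frameshifts).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq):
    """Return 0-based offsets of in-frame stop codons before the final codon.

    Assumes `seq` starts in frame; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last complete codon is excluded: a stop there is the normal terminator.
    return [3 * k for k, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


print(premature_stops("ATGAAATAGCCCTAA"))  # one in-frame stop before the end
print(premature_stops("ATGAAACCCTAA"))     # intact toy coding sequence
```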
Credits Augustus was written by Mario Stanke at the Department of Bioinformatics of the University of Göttingen in Germany. Exogean was developed by Sarah Djebali and Hugues Roest Crollius from the Dyogen Lab, Ecole Normale Supérieure (Paris, France) and Franck Delaplace from the Laboratoire de Méthodes Informatiques (LaMI), (Evry, France). The FGenesh++ gene predictions were provided by Victor Solovyev of Softberry Inc. The GeneID-U12 and SGP2-U12 programs were developed by the Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona. The version of GeneID on which GeneID-U12 is based (geneid_v1.2) was written by Enrique Blanco and Roderic Guigó. The parameter files were constructed by Genis Parra and Francisco Camara. Additional contributions were made by Josep F. Abril, Moises Burset and Xavier Messeguer. Modifications to GeneID that allow for the prediction of U12-dependent splice sites and incorporation of U12 introns into gene models were made by Tyler Alioto. Jigsaw was developed at The Institute for Genomic Research (TIGR) by Jonathan Allen and Steven Salzberg, with computational gene-finder contributions from Mihaela Pertea and William Majoros. Continued maintenance and development of Jigsaw will be provided by the Salzberg group at the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. The Yale Pseudogenes were generated by the pseudogene annotation group of Mark Gerstein at Yale University. References Augustus Stanke, M. Gene prediction with a hidden Markov model. Ph.D. thesis, Universität Göttingen, Germany (2004). Stanke, M. and Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics, 19(Suppl. 2), ii215-ii225 (2003). Stanke, M., Steinkamp, R., Waack, S. and Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucl. Acids Res., 32, W309-W312 (2004). FGenesh++ Solovyev V.V. 
"Statistical approaches in Eukaryotic gene prediction". In Handbook of Statistical Genetics (eds. Balding D. et al.) (John Wiley & Sons, Inc., 2001). p. 83-127. GeneID Blanco, E., Parra, G. and Guigó, R. "Using geneid to identify genes". In Current Protocols in Bioinformatics, Unit 4.3. (ed. Baxevanis, A.D.) (John Wiley & Sons, Inc., 2002). Guigó, R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol. 5(4), 681-702 (1998). Guigó, R., Knudsen, S., Drake, N. and Smith, T. Prediction of gene structure. J Mol Biol. 226(1), 141-57 (1992). Parra, G., Blanco, E. and Guigó, R. GeneID in Drosophila. Genome Research 10(4), 511-515 (2000). Jigsaw Allen, J.E., Pertea, M. and Salzberg, S.L. Computational gene prediction using multiple sources of evidence. Genome Res., 14(1), 142-8 (2004). Allen, J.E. and Salzberg, S.L. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21(18), 3596-3603 (2005). SGP2 Guigó, R., Dermitzakis, E.T., Agarwal, P., Ponting, C.P., Parra, G., Reymond, A., Abril, J.F., Keibler, E., Lyle, R., Ucla, C. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W. and Guigó, R. Comparative gene prediction in human and mouse. Genome Res. 13(1), 108-17 (2003). 
encodeEgaspUpdYalePseudo Yale Pseudo Upd Yale Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdSgp2U12 SGP2 U12 Update SGP2 U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdSgp2 SGP2 Update SGP2 Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdJigsaw Jigsaw Update Jigsaw Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdGeneIdU12 GeneID U12 Upd GeneID U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdGeneId GeneID Update GeneID Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdFgenesh FGenesh++ Upd Fgenesh++ Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdExogean Exogean Update Exogean Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusAny August/EST/Ms Upd Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusDual August/Mouse Upd Augustus + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusEst Augustus/EST Upd Augustus + EST/Protein Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusAbinitio Augustus Update Augustus Ab Initio Gene Predictions Pilot ENCODE Regions and Genes encodeAffyRnaSignal Affy RNA Signal Affymetrix PolyA+ RNA Signal Pilot ENCODE Transcription Description This track shows an estimate of RNA abundance (transcription) for all ENCODE regions for several cell types. Retinoic acid-stimulated HL-60 cells were harvested after 0, 2, 8, and 32 hours. Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, was hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Composite signals are shown in separate subtracks for each cell type and for each of the four timepoints for RA-stimulated HL-60. 
Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different cell types and timepoints. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 101 bp window centered on each probe, an estimate of RNA abundance (signal) was found by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods; Cawley et al. also describe the analytical methods. Verification Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 50 base-pairs and minimum run (MinRun) of 40 base-pairs (see the Affy TransFrags track for the merged regions). A random subset of transfrags was verified by RACE, with the RACE primers designed from the sequences of the transfrags. 
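The signal estimate described in the Methods above (the median of all pairwise averages of PM-MM values within a 101 bp window) can be sketched as follows. This is an illustration under stated assumptions, not the Affymetrix analysis code: the function names and the input layout are hypothetical, each probe is represented by a single position, and a value's average with itself is counted among the pairs (the Walsh-average convention), which the track description does not specify.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages, including each value paired with itself."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(values, 2))


def window_signal(probes, window=101):
    """Estimate signal at each probe from PM-MM values in a centered window.

    probes: list of (position, pm_minus_mm) tuples; `position` stands for the
    probe centre, which is an assumption of this sketch.
    """
    half = window // 2
    signal = []
    for pos, _ in probes:
        in_window = [diff for p, diff in probes if abs(p - pos) <= half]
        signal.append((pos, pseudomedian(in_window)))
    return signal


toy_probes = [(0, 1.0), (22, 2.0), (44, 9.0), (200, 4.0)]
print(window_signal(toy_probes))
```

The pairwise-average median is far less sensitive to a single outlying probe than a plain mean: in the toy data the value 9.0 moves the window estimate much less than it would move an average.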
Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and the Kevin Struhl group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). encodeAffyRnaSuper Affy RNA Affymetrix PolyA+ RNA Pilot ENCODE Transcription Overview This super-track combines related tracks of transcriptome data generated by the Affymetrix/Harvard ENCODE collaboration. These tracks show an estimate of RNA abundance (transcription) and the locations of sites showing transcription for all ENCODE regions for various cell types, including HL-60 (leukemia), GM06990 (lymphoblastoid), and HeLa (cervical carcinoma). RNA was isolated at multiple time points after drug treatment, and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays. Data are displayed as signals (transcript abundance) and transfrags (sites of transcription). Data for biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Credits These data were generated and analyzed by a collaboration of the Tom Gingeras group at Affymetrix and the Kevin Struhl lab at Harvard Medical School. 
References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, and Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyRnaHl60SignalHr32 Affy RNA RA 32h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr08 Affy RNA RA 8h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr02 Affy RNA RA 2h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr00 Affy RNA RA 0h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHeLaSignal Affy RNA HeLa Affymetrix PolyA+ RNA (HeLa) Signal Pilot ENCODE Transcription encodeAffyRnaGm06990Signal Affy RNA GM06990 Affymetrix PolyA+ RNA (GM06990) Signal Pilot ENCODE Transcription encodeAffyRnaTransfrags Affy Transfrags Affymetrix PolyA+ RNA Transfrags Pilot ENCODE Transcription Description This track shows the location of sites showing transcription for all ENCODE regions in several cell types, using Affymetrix arrays. Retinoic acid-stimulated HL-60 cells were harvested after 0, 2, 8, and 32 hours. 
Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, was hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Clustered sites are shown in separate subtracks for each cell type and for each of the four timepoints for RA-stimulated HL-60. Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different cell types and timepoints. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 101 bp window centered on each probe, an estimate of RNA abundance (signal) was found by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods; Cawley et al. also describe the analytical methods. Verification Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions (see the Affy RNA Signal track) were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 50 base-pairs and minimum run (MinRun) of 40 base-pairs. A random subset of transfrags was verified by RACE, with the RACE primers designed from the sequences of the transfrags. 
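The MaxGap/MinRun merging described above can be sketched in a few lines. This is a simplified illustration rather than the Affymetrix implementation: call_transfrags and its parameters are hypothetical names, the signal threshold is taken as an input (in the track it is derived from the 5% false positive rate in bacterial controls), and the gap is assumed to be measured from the end of one positive probe to the start of the next.

```python
def call_transfrags(probes, threshold, max_gap=50, min_run=40, probe_len=25):
    """Merge above-threshold probes into transcribed fragments.

    probes: iterable of (start, signal) sorted by start coordinate.
    Returns half-open (start, end) intervals at least `min_run` bp long.
    """
    frags = []
    cur_start = cur_end = None
    for start, signal in probes:
        if signal < threshold:
            continue  # only positive probes seed or extend a fragment
        end = start + probe_len
        if cur_end is not None and start - cur_end <= max_gap:
            # Close enough to the current fragment: extend it.
            cur_end = max(cur_end, end)
        else:
            # Gap too large: emit the finished fragment if it is long enough.
            if cur_end is not None and cur_end - cur_start >= min_run:
                frags.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_end is not None and cur_end - cur_start >= min_run:
        frags.append((cur_start, cur_end))
    return frags


toy = [(0, 30), (22, 5), (44, 40), (66, 35), (300, 50), (600, 45), (622, 60)]
print(call_transfrags(toy, threshold=20))
```

In the toy input the isolated positive probe at position 300 spans only 25 bp, so it is discarded by the MinRun filter, while the two clusters of neighbouring positive probes are reported as fragments.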
Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and the Kevin Struhl group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). 
encodeAffyRnaHl60SitesHr32 Affy RNA RA 32h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr08 Affy RNA RA 8h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr02 Affy RNA RA 2h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr00 Affy RNA RA 0h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHeLaSites Affy RNA HeLa Affymetrix PolyA+ RNA (HeLa) Sites Pilot ENCODE Transcription encodeAffyRnaGm06990Sites Affy RNA GM06990 Affymetrix PolyA+ RNA (GM06990) Sites Pilot ENCODE Transcription encodeYaleMASPlacRNATransMap Yale MAS RNA Yale Maskless Array Synthesizer, RNA Transcript Map Pilot ENCODE Transcription Description This track shows the forward (+) and reverse (-) strand transcript map of intensity scores (estimating RNA abundance) for human NB4 cell total RNA, and human placental Poly(A)+ RNA, hybridized to the Yale MAS (Maskless Array Synthesizer) ENCODE oligonucleotide microarray, transcription mapping design #1. This array has 36-mer oligonucleotide probes approximately every 36 bp (i.e. end-to-end) covering all the non-repetitive DNA sequence of the ENCODE regions ENm001-ENm012. See NCBI GEO GPL2105 for details of this array design. This transcript map is a combined signal from three biological replicates, each with at least two technical replicates. Arrays were hybridized using either the standard Nimblegen protocol or the protocol described in Bertone et al. (2004). The label of each subtrack in this annotation indicates the specific protocol used for that particular data set. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. 
The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods A score was assigned to each oligonucleotide probe position by combining two or more technical replicates and by using a sliding window approach. Within a sliding window of 160 bp (corresponding to 5 oligos), the hybridization intensities for all replicates of each oligonucleotide probe were compared to their respective array median score. Within the window and across all the replicates, the number of probes above and below their respective median were counted. Using the sign test, a one-sided P-value was then calculated and a score defined as score=-log(P-value) was assigned to the oligo in the center of the window. Three independent biological replicates were generated and each was hybridized to at least 2 different arrays (technical replicates). Verification Reasonable correlation coefficients between replicates were ensured. Additionally, transcribed regions (TARs/transfrags) were called and compared between technical and biological replicates to ensure significant overlap. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242-6 (2004). Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149-54 (2005). 
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P. and Gingeras, T.R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-9 (2002). Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., Lian, Z., Ben Nasr, A., Halaban, H.R. et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A 101(17), 6508-13 (2004). Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al. The transcriptional activity of human Chromosome 22. Genes Dev 17(4), 529-40 (2003). encodeYaleRnaSuper Yale RNA Yale RNA (Neutrophil, Placenta and NB4 cells) Pilot ENCODE Transcription Overview This super-track combines related tracks from Yale Transcript Map analysis. These tracks contain transcriptome data from different cell lines and biological samples as well as analysis of transcriptionally active regions (TARs). Experiments were performed with the Yale MAS (Maskless Array Synthesizer) ENCODE oligonucleotide microarray (see NCBI GEO GPL2105 for details of this array design) as well as the Affymetrix ENCODE oligonucleotide microarray. Multiple biological samples were assayed, such as total RNA from human NB4 cells. Experiments also included chemical treatments such as retinoic acid (RA) treatments. Credits Yale MAS RNA, Yale MAS TAR These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. Yale RNA, Yale TAR These data were generated and analyzed by the Yale/Affymetrix collaboration among the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. Yale RACE These data were generated and analyzed by the lab of Mark Gerstein at Yale University. References Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M et al.
Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005 May 20;308(5725):1149-54. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. Kluger Y, Tuck DP, Chang JT, Nakayama Y, Poddar R, Kohya N, Lian Z, Ben Nasr A, Halaban HR et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A. 2004 Apr 27;101(17):6508-13. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003 Feb 15;17(4):529-40. encodeYaleMASPlacRNATransMapRevMless36mer36bp Yale Plc BtR RNA Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATransMapFwdMless36mer36bp Yale Plc BtF RNA Yale Placenta RNA TransMap, MAS array, Forward Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTMREVMless36mer36bp Yale Plc NgR RNA Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTMFWDMless36mer36bp Yale Plc NgF RNA Yale Placenta RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANprotTMREVMless36mer36bp Yale NB4 NgR RNA Yale NB4 RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANprotTMFWDMless36mer36bp Yale NB4 NgF RNA Yale NB4 RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATars Yale MAS TAR Yale Maskless Array Synthesizer, RNA Transcriptionally Active 
Regions Pilot ENCODE Transcription Description This track shows the locations of forward (+) and reverse (-) strand transcriptionally-active regions (TARs)/transcribed fragments (transfrags), for human NB4 cell total RNA and for human placenta Poly(A)+ RNA, hybridized to the Yale Maskless Array Synthesizer (MAS) ENCODE oligonucleotide microarray, transcription mapping design #1. This array has 36-mer oligonucleotide probes approximately every 36 bp (i.e. end-to-end) covering all the non-repetitive DNA sequence of the ENCODE regions ENm001 - ENm012. See NCBI GEO accession GPL2105 for details of this array design. These TARs/transfrags are based on a transcript map combining hybridization intensities from three biological replicates, each with at least two technical replicates. Arrays were hybridized using either Nimblegen standard protocol, or the protocol described in Bertone et al. (2004). The label of each subtrack in this annotation indicates the specific protocol used for that particular data set. Methods A score was assigned to each oligonucleotide probe position by combining two or more technical replicates and by using a sliding window approach. Within a sliding window of 160 bp (corresponding to 5 oligos), the hybridization intensities for all replicates of each oligonucleotide probe were compared to their respective array median intensity. Within the window and across all the replicates, the number of probes above and below their respective median was counted. Using the sign test, a one-sided P-value was then calculated and a score defined as score=-log(p-value) was assigned to the oligo in the center of the window. Three independent biological replicates were generated, and each was hybridized to at least two different arrays (technical replicates). 
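The sign-test score described above can be illustrated with a short sketch. It is not the Yale code: the function name sign_test_score is hypothetical, probes exactly equal to their array median are assumed to be dropped as ties, and the logarithm is assumed to be base 10 because the text gives only "-log".

```python
import math
from typing import Sequence


def sign_test_score(intensities: Sequence[float], medians: Sequence[float]) -> float:
    """Score one window as -log10 of a one-sided sign-test P-value.

    intensities holds every probe measurement in the window across all
    replicates; medians holds the median of the array each measurement came
    from. Under the null hypothesis a probe is above its array median with
    probability 0.5.
    """
    above = sum(1 for x, m in zip(intensities, medians) if x > m)
    below = sum(1 for x, m in zip(intensities, medians) if x < m)
    n = above + below  # ties are excluded (assumption)
    if n == 0:
        return 0.0
    # One-sided P-value: chance of seeing at least `above` successes in n trials.
    p_value = sum(math.comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return -math.log10(p_value)
```

For example, a window of 5 oligos measured on 2 replicate arrays gives 10 comparisons; if all 10 are above their array medians, the P-value is 1/1024 and the score is about 3.01.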
Transcribed regions (TARs/transfrags) were then identified using a score threshold at the 95th percentile as well as a maximum gap of 80 bp and a minimum run of 50 bp (between oligonucleotide positions), effectively allowing a gap of one oligo and requiring each TAR/transfrag to encompass at least 3 oligos. Verification Transcribed regions (TARs/transfrags), as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M. The transcriptional activity of human Chromosome 22. Genes Dev. 2003 Feb 15;17(4):529-40. Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6. Epub 2004 Nov 11. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005 May 20;308(5725):1149-54. Epub 2005 Mar 24.
encodeYaleMASPlacRNATarsRevMless36mer36bp Yale Plc BtR TAR Yale Placenta RNA TARs, MAS array, Reverse Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATarsFwdMless36mer36bp Yale Plc BtF TAR Yale Placenta RNA TARs, MAS array, Forward Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTarsREVMless36mer36bp Yale Plc NgR TAR Yale Placenta RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTarsFWDMless36mer36bp Yale Plc NgF TAR Yale Placenta RNA TARs, MAS array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANProtTarsREVMless36mer36bp Yale NB4 NgR TAR Yale NB4 RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANProtTarsFWDMless36mer36bp Yale NB4 NgF TAR Yale NB4 RNA TARs, MAS array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleAffyRNATransMap Yale RNA Yale RNA Transcript Map (Neutrophil, Placenta and NB4 cells) Pilot ENCODE Transcription Description This track shows the transcript map of signal intensity (estimating RNA abundance) for the following, hybridized to the Affymetrix ENCODE oligonucleotide microarray: human neutrophil (PMN) total RNA (10 biological samples from different individuals) human placental Poly(A)+ RNA (3 biological replicates) total RNA from human NB4 cells (4 biological replicates), each sample divided into three parts and treated as follows: untreated, treated with retinoic acid (RA), and treated with 12-O-tetradecanoylphorbol-13 acetate (TPA) (three out of the four original samples). Total RNA was extracted from each treated sample and applied to arrays in duplicate (2 technical replicates). The human NB4 cell can be made to differentiate towards either monocytes (by treatment with TPA) or neutrophils (by treatment with RA). 
See Kluger et al., 2004 in the References section for more details about the differentiation of hematopoietic cells. This array has 25-mer oligonucleotide probes tiled approximately every 22 bp, covering all the non-repetitive DNA sequence of the ENCODE regions. The transcript map is a combined signal for both strands of DNA. This is derived from the number of different biological samples indicated above, each with at least two technical replicates. See the following NCBI Gene Expression Omnibus (GEO) accessions for details of experimental protocols: ENCODE Transcript Mapping for Human Neutrophil (PMN) Total RNA: GSE2678 ENCODE Transcript Mapping for Human Placental Poly(A)+ RNA: GSE2671 ENCODE Transcript Mapping for Total RNA from Human NB4 Cells untreated, treated with RA, and treated with TPA: GSE2679 Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different data samples. Methods The data from biological & technical replicates were quantile-normalized to each other and then median scaled to 25. Using a 101 bp sliding window centered on each oligonucleotide probe, a signal map estimating RNA abundance was generated by computing the pseudomedian signal of all PM-MM pairs (median of pairwise PM-MM averages) within the window, including replicates. 
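The pseudomedian named in the Methods paragraph above is the median of all pairwise averages of the PM-MM differences (the Hodges-Lehmann estimator). A minimal sketch follows, assuming that each value is also paired with itself, as in the usual definition; the function names pseudomedian and window_signal and the argument layout are hypothetical, and replicate measurements are assumed to be supplied simply as additional entries in the same lists.

```python
from itertools import combinations_with_replacement
from statistics import median
from typing import Optional, Sequence


def pseudomedian(diffs: Sequence[float]) -> float:
    """Median of all pairwise averages (Walsh averages), self-pairs included."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(diffs, 2)]
    return median(walsh)


def window_signal(
    positions: Sequence[int],
    pm: Sequence[float],
    mm: Sequence[float],
    center: int,
    window: int = 101,
) -> Optional[float]:
    """Signal for the probe at `center`: pseudomedian of PM-MM in the window."""
    half = window // 2  # a 101 bp window reaches 50 bp to either side
    diffs = [p - m for pos, p, m in zip(positions, pm, mm) if abs(pos - center) <= half]
    return pseudomedian(diffs) if diffs else None
```

The pseudomedian is less sensitive to a single outlying probe than the mean, while using more of the data than a plain median, which is a plausible reason for choosing it when only a handful of probes fall in each window.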
Verification Independent biological replicates (as indicated above) were generated, and each was hybridized to at least two different arrays (technical replicates). Transcribed regions were then identified using a signal threshold at the 90th percentile of signal intensities, as well as a maximum gap of 50 bp and a minimum run of 50 bp (between oligonucleotide positions). Transcribed regions, as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the Yale/Affymetrix collaboration between the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. References Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005 May 20;308(5725):1149-54. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. Kluger Y, Tuck DP, Chang JT, Nakayama Y, Poddar R, Kohya N, Lian Z, Ben Nasr A, Halaban HR, Krause DS et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A. 2004 Apr 27;101(17):6508-13. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003 Feb 15;17(4):529-40.
encodeYaleAffyNB4UntrRNATransMap Yale RNA NB4 Un Yale NB4 RNA Transcript Map, Untreated Pilot ENCODE Transcription encodeYaleAffyNB4TPARNATransMap Yale RNA NB4 TPA Yale NB4 RNA Transcript Map, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) Pilot ENCODE Transcription encodeYaleAffyNB4RARNATransMap Yale RNA NB4 RA Yale NB4 RNA Transcript Map, Treated with Retinoic Acid Pilot ENCODE Transcription encodeYaleAffyPlacRNATransMap Yale RNA Plcnta Yale Placenta RNA Transcript Map Pilot ENCODE Transcription encodeYaleAffyNeutRNATransMap Yale RNA Neutro Yale Neutrophil RNA Transcript Map Pilot ENCODE Transcription encodeYaleAffyRNATars Yale TAR Yale RNA Transcriptionally Active Regions (TARs) Pilot ENCODE Transcription Description This track shows the locations of transcriptionally active regions (TARs)/transcribed fragments (transfrags) for the following, hybridized to the Affymetrix ENCODE oligonucleotide microarray: human neutrophil (PMN) total RNA (10 biological samples from different individuals) human placental Poly(A)+ RNA (3 biological replicates) total RNA from human NB4 cells (4 biological replicates), each sample divided into three parts and treated as follows: untreated, treated with retinoic acid (RA), and treated with 12-O-tetradecanoylphorbol-13 acetate (TPA) (three out of the four original samples). Total RNA was extracted from each treated sample and applied to arrays in duplicate (2 technical replicates). The human NB4 cell can be made to differentiate towards either monocytes (by treatment with TPA) or neutrophils (by treatment with RA). See Kluger et al., 2004 in the References section for more details about the differentiation of hematopoietic cells. This array has 25-mer oligonucleotide probes tiled approximately every 22 bp, covering all the non-repetitive DNA sequence of the ENCODE regions. The transcript map is a combined signal for both strands of DNA. 
This is derived from the number of different biological samples indicated above, each with at least two technical replicates. See the following NCBI GEO accessions for details of experimental protocols: GSE2678 GSE2671 GSE2679 Display Conventions and Configuration TARs are represented by blocks in the graphical display. This composite annotation track consists of several subtracks that are listed at the top of the track description page. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different data samples. Methods The data from biological and technical replicates were quantile-normalized to each other and then median scaled to 25. Using a 101 bp sliding window centered on each oligonucleotide probe, a signal map estimating RNA abundance was generated by computing the pseudomedian signal of all PM-MM pairs (median of pairwise PM-MM averages) within the window, including replicates. Transcribed regions (TARs/transfrags) were then identified using a signal threshold determined from a 95% false positive rate (FPR) using the bacterial negatives on the array, as well as a maximum gap of 50 bp and a minimum run of 40 bp (between oligonucleotide positions). The TAR sites that are reported start and end at the middle nucleotide of the beginning and ending oligonucleotide probes. Verification Transcribed regions (TARs/transfrags), as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the Yale/Affymetrix collaboration between the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. References Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M. et al.
Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242-6 (2004). Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149-54 (2005). Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P. and Gingeras, T.R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-9 (2002). Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., Lian, Z., Ben Nasr, A., Halaban, H.R. et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A 101(17), 6508-13 (2004). Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al. The transcriptional activity of human Chromosome 22. Genes Dev 17(4), 529-40 (2003). encodeYaleAffyNB4UntrRNATars Yale TAR NB4 Un Yale NB4 RNA, TAR, Untreated Pilot ENCODE Transcription encodeYaleAffyNB4TPARNATars Yale TAR NB4 TPA Yale NB4 RNA, TAR, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) Pilot ENCODE Transcription encodeYaleAffyNB4RARNATars Yale TAR NB4 RA Yale NB4 RNA, TAR, Treated with Retinoic Acid Pilot ENCODE Transcription encodeYaleAffyPlacRNATars Yale TAR Plcnta Yale Placenta RNA Transcriptionally Active Region Pilot ENCODE Transcription encodeYaleAffyNeutRNATars Yale TAR Neutro Yale Neutrophil RNA Transcriptionally Active Region (TAR) Pilot ENCODE Transcription encodeAffyEcSites Affy EC Sites Affymetrix ENCODE Extension Transcription Sites Pilot ENCODE Transcription Description This track shows the location of sites showing transcription (transfrags) for chromosomes 21 and 22 for 5 cell lines and 11 tissues. 
The 5 cell lines used were: GM06990, HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were: cerebellum, brain frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal kidney, fetal thymus, ovary, placenta, prostate and testis. Purified cytosolic polyA+ RNA from GM06990, HepG2 and Tert-BJ cell lines, as well as purified polyA+ RNA from whole-cell extracts of the remaining cell lines and tissues, was hybridized to Affymetrix Chromosome 21_22_v2 oligonucleotide tiling arrays, which have 25-mer probes spaced on average every 17 bp (center-center of each 25mer) in the non-repetitive regions of human chromosomes 21 and 22. Clustered sites are shown in separate subtracks for each cell and tissue type. Data for all biological replicates can be downloaded from Affymetrix in wig, BED, and cel formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. An estimate of RNA abundance (signal) was computed in two ways, (i) with no sliding window and (ii) with a sliding 51-bp window centered on each probe, by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates).
Transcribed regions (see the Affy RNA Signal track) were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs. Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyEcSuper Affy EC Affymetrix ENCODE Extension Transcription Pilot ENCODE Transcription Overview This super-track combines related tracks of the ENCODE Extension data generated by Affymetrix. There are two member tracks: Affymetrix ENCODE Extension Transcription Sites: the transcribed fragments (transfrags) based on the signal. Affymetrix ENCODE Extension Transcription Signal: RNA abundance signal. 
Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. Using two different approaches: i) no sliding window ii) sliding 51-bp window centered on each probe, an estimate of RNA abundance (signal) was computed by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs (see the Affy TransFrags track for the merged regions). Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. 
Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyEc51TertBJSites EC51 Site TertBJ Affy Ext Trans Sites (51-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc1TertBJSites EC1 Sites TertBJ Affy Ext Trans Sites (1-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc51K562Sites EC51 Site K562 Affy Ext Trans Sites (51-base window) (K562) Pilot ENCODE Transcription encodeAffyEc1K562Sites EC1 Sites K562 Affy Ext Trans Sites (1-base window) (K562) Pilot ENCODE Transcription encodeAffyEc51HepG2Sites EC51 Site HepG2 Affy Ext Trans Sites (51-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc1HepG2Sites EC1 Sites HepG2 Affy Ext Trans Sites (1-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc51GM06990Sites EC51 Site GM0699 Affy Ext Trans Sites (51-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc1GM06990Sites EC1 Sites GM0699 Affy Ext Trans Sites (1-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc51HeLaC1S3Sites EC51 Site HeLa Affy Ext Trans Sites (51-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc1HeLaC1S3Sites EC1 Sites HeLa Affy Ext Trans Sites (1-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc51OvarySites EC51 Site Ovary Affy Ext Trans Sites (51-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc1OvarySites EC1 Sites Ovary Affy Ext Trans Sites (1-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc51ProstateSites EC51 Site Prost Affy Ext Trans Sites (51-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc1ProstateSites EC1 Sites Prost Affy Ext Trans Sites (1-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc51FetalTestisSites EC51 Site FetalT Affy Ext Trans Sites (51-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc1FetalTestisSites EC1 Sites FetalT Affy Ext Trans Sites 
(1-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc51TestisSites EC51 Site Testis Affy Ext Trans Sites (51-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc1TestisSites EC1 Sites Testis Affy Ext Trans Sites (1-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc51PlacentaSites EC51 Site Placen Affy Ext Trans Sites (51-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc1PlacentaSites EC1 Sites Placen Affy Ext Trans Sites (1-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc51FetalSpleenSites EC51 Site Spleen Affy Ext Trans Sites (51-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc1FetalSpleenSites EC1 Sites Spleen Affy Ext Trans Sites (1-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc51FetalKidneySites EC51 Site FetalK Affy Ext Trans Sites (51-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc1FetalKidneySites EC1 Sites FetalK Affy Ext Trans Sites (1-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc51BrainHypothalamusSites EC51 Sites BrainH Affy Ext Trans Sites (51-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc1BrainHypothalamusSites EC1 Sites BrainH Affy Ext Trans Sites (1-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc51BrainHippocampusSites EC51 Site Hippoc Affy Ext Trans Sites (51-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc1BrainHippocampusSites EC1 Sites Hippoc Affy Ext Trans Sites (1-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc51BrainFrontalLobeSites EC51 Site BrainF Affy Ext Trans Sites (51-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc1BrainFrontalLobeSites EC1 Sites BrainF Affy Ext Trans Sites (1-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc51BrainCerebellumSites EC51 Sites BrainC Affy Ext Trans Sites (51-base window) (Brain Cerebellum) Pilot ENCODE 
Transcription encodeAffyEc1BrainCerebellumSites EC1 Sites BrainC Affy Ext Trans Sites (1-base window) (Brain Cerebellum) Pilot ENCODE Transcription encodeAffyEcSignal Affy EC Signal Affymetrix ENCODE Extension Transcription Signal Pilot ENCODE Transcription Description This track shows an estimate of RNA abundance (transcription) for chromosomes 21 and 22 for 5 cell lines and 11 tissues. The 5 cell lines used were: GM06990, HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were: cerebellum, brain frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal kidney, fetal thymus, ovary, placenta, prostate and testis. Purified cytosolic polyA+ RNA from the GM06990, HepG2 and Tert-BJ cell lines, as well as purified polyA+ RNA from whole cell extracts of the remaining cell lines and tissues, was hybridized to Affymetrix Chromosome 21_22_v2 oligonucleotide tiling arrays, which have 25-mer probes spaced on average every 17 bp (center to center of each 25-mer) in the non-repetitive regions of human chromosomes 21 and 22. Composite signals are shown in separate subtracks for each cell and tissue type. Data for all biological replicates can be downloaded from Affymetrix in wig, BED, and cel formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. 
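The normalization step just described can be sketched as follows. This is a minimal illustration, not the pipeline used to produce the track: the function names are hypothetical, ties are broken by probe order, and the order in which normalization and scaling were applied is an assumption.

```python
from statistics import median


def quantile_normalize(arrays):
    """Give every replicate array the same intensity distribution.

    `arrays` is a list of equal-length lists, one list per replicate array.
    Ties are broken by probe order (a simplification for this sketch).
    """
    n_probes = len(arrays[0])
    # Probe indices of each array, ordered by increasing intensity.
    orders = [sorted(range(n_probes), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean intensity across arrays at each rank.
    reference = [
        sum(a[order[r]] for a, order in zip(arrays, orders)) / len(arrays)
        for r in range(n_probes)
    ]
    normalized = [[0.0] * n_probes for _ in arrays]
    for out, order in zip(normalized, orders):
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
    return normalized


def scale_to_median(values, target=330.0):
    """Rescale one array so that its median intensity equals `target`."""
    m = median(values)
    return [v * target / m for v in values]


replicates = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
scaled = [scale_to_median(r) for r in replicates]
```

After quantile normalization, every replicate holds the same set of values (here 1.5, 3.5 and 5.5), assigned according to each probe's rank within its own array.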
An estimate of RNA abundance (signal) was computed using two different approaches, i) no sliding window and ii) a sliding 51-bp window centered on each probe, in each case by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs (see the Affy TransFrags track for the merged regions). Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, the Roderic Guigo group at Centre de Regulacio Genomica, the Alexandre Reymond group at the University of Lausanne and the Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. 
Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). encodeAffyEc51TertBJSignal EC51 Sgnl TertBJ Affy Ext Trans Signal (51-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc1TertBJSignal EC1 Sgnl TertBJ Affy Ext Trans Signal (1-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc51K562Signal EC51 Sgnl K562 Affy Ext Trans Signal (51-base window) (K562) Pilot ENCODE Transcription encodeAffyEc1K562Signal EC1 Sgnl K562 Affy Ext Trans Signal (1-base window) (K562) Pilot ENCODE Transcription encodeAffyEc51HepG2Signal EC51 Sgnl HepG2 Affy Ext Trans Signal (51-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc1HepG2Signal EC1 Sgnl HepG2 Affy Ext Trans Signal (1-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc51GM06990Signal EC51 Sgnl GM0699 Affy Ext Trans Signal (51-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc1GM06990Signal EC1 Sgnl GM0699 Affy Ext Trans Signal (1-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc51HeLaC1S3Signal EC51 Sgnl HeLa Affy Ext Trans Signal (51-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc1HeLaC1S3Signal EC1 Sgnl HeLa Affy Ext Trans Signal (1-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc51OvarySignal EC51 Sgnl Ovary Affy Ext Trans Signal (51-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc1OvarySignal EC1 Sgnl Ovary Affy Ext Trans Signal (1-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc51ProstateSignal EC51 Sgnl Prost Affy Ext Trans Signal (51-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc1ProstateSignal EC1 Sgnl Prost Affy Ext Trans Signal (1-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc51FetalTestisSignal EC51 Sgnl FetalT Affy Ext Trans Signal (51-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc1FetalTestisSignal EC1 Sgnl FetalT Affy Ext Trans Signal (1-base window) (Fetal Testis) Pilot ENCODE Transcription 
encodeAffyEc51TestisSignal EC51 Sgnl Testis Affy Ext Trans Signal (51-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc1TestisSignal EC1 Sgnl Testis Affy Ext Trans Signal (1-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc51PlacentaSignal EC51 Sgnl Placen Affy Ext Trans Signal (51-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc1PlacentaSignal EC1 Sgnl Placen Affy Ext Trans Signal (1-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc51FetalSpleenSignal EC51 Sgnl Spleen Affy Ext Trans Signal (51-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc1FetalSpleenSignal EC1 Sgnl Spleen Affy Ext Trans Signal (1-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc51FetalKidneySignal EC51 Sgnl FetalK Affy Ext Trans Signal (51-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc1FetalKidneySignal EC1 Sgnl FetalK Affy Ext Trans Signal (1-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc51BrainHypothalamusSignal EC51 Sgnl BrainH Affy Ext Trans Signal (51-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc1BrainHypothalamusSignal EC1 Sgnl BrainH Affy Ext Trans Signal (1-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc51BrainHippocampusSignal EC51 Sgnl Hippoc Affy Ext Trans Signal (51-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc1BrainHippocampusSignal EC1 Sgnl Hippoc Affy Ext Trans Signal (1-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc51BrainFrontalLobeSignal EC51 Sgnl BrainF Affy Ext Trans Signal (51-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc1BrainFrontalLobeSignal EC1 Sgnl BrainF Affy Ext Trans Signal (1-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc51BrainCerebellumSignal EC51 Sgnl BrainC Affy Ext Trans Signal (51-base window) (Brain Cerebellum) Pilot ENCODE Transcription 
encodeAffyEc1BrainCerebellumSignal EC1 Sgnl BrainC Affy Ext Trans Signal (1-base window) (Brain Cerebellum) Pilot ENCODE Transcription encodeAffyChIpHl60Pval Affy pVal Affymetrix ChIP-chip (retinoic acid-treated HL-60 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions, in retinoic-acid stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Median P-values are shown in separate subtracks for each of the ten antibodies: Brg1 - Brahma-related Gene 1 CEBPe - CCAAT-enhancer binding protein-epsilon CTCF - CCTC binding factor H3K27me3 (H3K27T) - Histone H3 tri-methylated lysine 27 H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine P300 - E1A-binding protein, 300-KD PU1 - Spleen focus forming virus proviral integration oncogene Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) RARA (RARecA) - Retinoic Acid Receptor-Alpha SIRT1 - Sirtuin-1 Retinoic acid-stimulated HL-60 cells were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Only median P-values are displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. 
They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the log transformed P-value (-10 log[10] P) across processed replicate data is displayed. Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying a cutoff of 20 to the log transformed P-values, a maxGap and minRun of 500 and 0 basepairs respectively, to each biological replicate. Since each region or site may be comprised of more than one probe, a median based on the distribution of log transformed P-values was computed per site for each of the respective replicates. These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum or worst rank of the distribution was assigned to it. 
The following three values were computed for each site by combining data from all biological replicates: i) the average of all ranks computed among biological replicates; ii) the sum of all pairwise differences in these ranks computed among biological replicates; iii) a combined P-value, using a chi square distribution, across all replicates. The final sites were selected when all of the above three metrics were relatively low, where "low" corresponds to the top 25 percentile of the distribution. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). encodeAffyChipSuper Affy ChIP Affymetrix ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Affymetrix/Harvard ENCODE collaboration. 
ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. These tracks contain ChIP-chip data of multiple transcription factors, RNA polymerase II and histones, in multiple cell lines, including HL-60 (leukemia) and ME-180 (cervical carcinoma), and at different time points after drug treatment of the cells. Binding was assayed on Affymetrix ENCODE tiling arrays. Data are displayed as signals, median p-values, "strict" p-values and sites. Credits These data were generated and analyzed by collaboration of the Tom Gingeras group at Affymetrix and the Kevin Struhl lab at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, and Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. 
encodeAffyChIpHl60PvalTfiibHr32 Affy TFIIB RA 32h Affymetrix ChIP-chip (TFIIB retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr32 Affy SIRT1 RA 32h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr08 Affy SIRT1 RA 8h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr02 Affy SIRT1 RA 2h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr00 Affy SIRT1 RA 0h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr32 Affy RARA RA 32h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr08 Affy RARA RA 8h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr02 Affy RARA RA 2h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr00 Affy RARA RA 0h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr32 Affy Pol2 RA 32h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr08 Affy Pol2 RA 8h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr02 Affy Pol2 RA 2h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation 
encodeAffyChIpHl60PvalRnapHr00 Affy Pol2 RA 0h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr32 Affy PU1 RA 32h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr08 Affy PU1 RA 8h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr02 Affy PU1 RA 2h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr00 Affy PU1 RA 0h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr32 Affy P300 RA 32h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr08 Affy P300 RA 8h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr02 Affy P300 RA 2h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr00 Affy P300 RA 0h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr32 Affy H4Kac4 RA 32h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr08 Affy H4Kac4 RA 8h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr02 Affy H4Kac4 RA 2h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr00 Affy H4Kac4 RA 0h 
Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr32 Affy H3K27me3 RA 32h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr08 Affy H3K27me3 RA 8h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr02 Affy H3K27me3 RA 2h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr00 Affy H3K27me3 RA 0h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr32 Affy CTCF RA 32h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr08 Affy CTCF RA 8h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr02 Affy CTCF RA 2h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr00 Affy CTCF RA 0h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr32 Affy CEBPe RA 32h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr08 Affy CEBPe RA 8h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr02 Affy CEBPe RA 2h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr00 Affy CEBPe RA 0h Affymetrix 
ChIP-chip (CEBPe retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr32 Affy Brg1 RA 32h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr08 Affy Brg1 RA 8h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr02 Affy Brg1 RA 2h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr00 Affy Brg1 RA 0h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60Sites Affy Sites Affymetrix ChIP-chip (retinoic acid-treated HL-60 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions, in retinoic-acid stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Clustered sites are shown in separate subtracks for each of the ten antibodies: Brg1 - Brahma-related Gene 1 CEBPe - CCAAT-enhancer binding protein-epsilon CTCF - CCTC binding factor H3K27me3 (H3K27T) - Histone H3 tri-methylated lysine 27 H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine P300 - E1A-binding protein, 300-KD PU1 - Spleen focus forming virus proviral integration oncogene Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) RARA (RARecA) - Retinoic Acid Receptor-Alpha SIRT1 - Sirtuin-1 Retinoic acid-stimulated HL-60 cells were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. 
Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the log transformed P-value (-10 log10 P) across processed replicate data is displayed. Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying a cutoff of 20 to the log transformed P-values, a maxGap and minRun of 500 and 0 basepairs respectively, to each biological replicate. Since each region or site may be comprised of more than one probe, a median based on the distribution of log transformed P-values was computed per site for each of the respective replicates. 
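The thresholding and merging step described above (cutoff of 20, maxGap of 500, minRun of 0) can be illustrated with a short sketch. The exact semantics are assumptions made here, not taken from the source: a probe passes if its transformed P-value is at least the cutoff, passing probes no more than maxGap basepairs apart are merged, and regions spanning fewer than minRun basepairs are dropped. The function name and input layout are hypothetical.

```python
def call_regions(probes, cutoff=20.0, max_gap=500, min_run=0):
    """Merge passing probes into enriched regions.

    `probes` is a list of (position, score) pairs sorted by position,
    where score is the transformed P-value (-10 log10 P).
    Returns a list of (start, end) probe positions.
    """
    regions = []
    start = end = None
    for pos, score in probes:
        if score < cutoff:
            continue  # probe does not pass the cutoff (assumed: score >= cutoff passes)
        if start is not None and pos - end <= max_gap:
            end = pos  # close enough to the open region: extend it
        else:
            if start is not None and end - start >= min_run:
                regions.append((start, end))
            start = end = pos  # open a new region
    if start is not None and end - start >= min_run:
        regions.append((start, end))
    return regions


example = [(100, 25.0), (300, 30.0), (700, 5.0), (900, 22.0), (2000, 40.0)]
merged = call_regions(example)
strict = call_regions(example, min_run=25)
```

With a minRun of 0, isolated single-probe hits survive as regions of zero length; raising minRun removes them.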
These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum or worst rank of the distribution was assigned to it. The following three values were computed for each site by combining data from all biological replicates: i) the average of all ranks computed among biological replicates; ii) the sum of all pairwise differences in these ranks computed among biological replicates; iii) a combined P-value, using a chi square distribution, across all replicates. The final sites were selected when all of the above three metrics were relatively low, where "low" corresponds to the top 25 percentile of the distribution. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). 
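The three per-site metrics listed in the Methods of this track can be sketched as below. The source says only that the combined P-value uses a chi square distribution; Fisher's method is assumed here as one common choice. Taking absolute pairwise differences, and using the total number of sites as the worst rank for an absent site, are likewise assumptions. All function names are hypothetical.

```python
import math
from itertools import combinations


def rank_within_replicate(site_scores):
    """Rank sites in one replicate (1 = strongest).

    `site_scores` maps site -> median transformed P-value, or None if the
    site is absent. Absent sites get the worst rank (assumed: site count).
    """
    present = sorted(
        (s for s, v in site_scores.items() if v is not None),
        key=lambda s: -site_scores[s],
    )
    ranks = {site: i + 1 for i, site in enumerate(present)}
    worst = len(site_scores)
    for site, value in site_scores.items():
        if value is None:
            ranks[site] = worst
    return ranks


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df."""
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))


def site_metrics(ranks, p_values):
    """Return (mean rank, sum of pairwise |rank differences|, combined P)."""
    mean_rank = sum(ranks) / len(ranks)
    pairwise = sum(abs(a - b) for a, b in combinations(ranks, 2))
    return mean_rank, pairwise, fisher_combined_p(p_values)


ranks_rep1 = rank_within_replicate({"a": 40.0, "b": None, "c": 55.0})
mean_rank, pairwise, combined = site_metrics([1, 3, 2], [0.01, 0.02, 0.05])
```

A site is then kept only if all three values fall in the favorable quartile of their respective distributions, as stated in the Methods.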
encodeAffyChIpHl60SitesTfiibHr32 Affy TFIIB RA 32h Affymetrix ChIP-chip (TFIIB retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr32 Affy SIRT1 RA 32h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr08 Affy SIRT1 RA 8h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr02 Affy SIRT1 RA 2h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr00 Affy SIRT1 RA 0h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr32 Affy RARA RA 32h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr08 Affy RARA RA 8h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr02 Affy RARA RA 2h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr00 Affy RARA RA 0h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr32 Affy Pol2 RA 32h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr08 Affy Pol2 RA 8h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr02 Affy Pol2 RA 2h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation 
encodeAffyChIpHl60SitesRnapHr00 Affy Pol2 RA 0h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr32 Affy PU1 RA 32h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr08 Affy PU1 RA 8h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr02 Affy PU1 RA 2h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr00 Affy PU1 RA 0h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr32 Affy P300 RA 32h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr08 Affy P300 RA 8h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr02 Affy P300 RA 2h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr00 Affy P300 RA 0h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr32 Affy H4Kac4 RA 32h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr08 Affy H4Kac4 RA 8h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr02 Affy H4Kac4 RA 2h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr00 Affy H4Kac4 RA 0h Affymetrix 
ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr32 Affy H3K27me3 RA 32h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr08 Affy H3K27me3 RA 8h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr02 Affy H3K27me3 RA 2h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr00 Affy H3K27me3 RA 0h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr32 Affy CTCF RA 32h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr08 Affy CTCF RA 8h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr02 Affy CTCF RA 2h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr00 Affy CTCF RA 0h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr32 Affy CEBPe RA 32h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr08 Affy CEBPe RA 8h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr02 Affy CEBPe RA 2h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr00 Affy CEBPe RA 0h Affymetrix ChIP-chip (CEBPe 
retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr32 Affy Brg1 RA 32h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr08 Affy Brg1 RA 8h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr02 Affy Brg1 RA 2h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr00 Affy Brg1 RA 0h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrict Affy Strict pVal Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic-acid stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and in a fifth factor tested in ME-180 cervical carcinoma cells. Median of the transformed P-value (-10 log[10] P) across processed replicate data is displayed as separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. 
Only the median of the transformed P-value (-10 log[10] P) is displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the transformed P-value (-10 log[10] P) across processed replicate data is displayed. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. 
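The P-value computation and display transform described in the Methods above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Affymetrix implementation: the rank-sum test here is one-sided (treatment greater than control), uses a normal approximation without tie correction, and the function names and the `floor` guard are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(treatment, control):
    """One-sided Wilcoxon rank-sum P-value (assumption: normal approximation)."""
    n1, n2 = len(treatment), len(control)
    ranks = average_ranks(list(treatment) + list(control))
    w = sum(ranks[:n1])                      # rank sum of the treatment values
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def transformed_p(p, floor=1e-300):
    """Return -10 * log10(P); `floor` is an assumed guard against log(0)."""
    return -10.0 * math.log10(max(p, floor))


def displayed_score(replicate_treatments, control):
    """Median of the transformed P-values across biological replicates."""
    return median(
        transformed_p(rank_sum_pvalue(t, control)) for t in replicate_treatments
    )
```

Here `replicate_treatments` would hold, for one window, the signal estimates of each biological replicate, and `control` the pooled control signal estimates for the same window.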
References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). encodeAffyChIpHl60PvalStrictp63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictp63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic 
acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrict Affy Strict Sig Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic-acid stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and in a fifth factor tested in ME-180 cervical carcinoma cells. 
Median of the signal estimate across processed replicate data is displayed as separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Only the median of the signal estimate across processed replicate data is displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. 
An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the signal estimate across processed replicate data is displayed. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). 
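The signal estimator from the Methods of this track, S = ln[max(PM - MM, 1)], can be sketched in Python as follows. This is an illustration only: the `(position, pm, mm)` tuple layout and the function names are assumptions, and the text does not specify how the per-probe values are summarized within a window, so the sketch only collects them.

```python
import math


def signal_estimate(pm, mm):
    """S = ln[max(PM - MM, 1)] for a single probe pair."""
    return math.log(max(pm - mm, 1.0))


def window_signals(probes, center, half_width=500):
    """Collect S for probe pairs inside the 1001 bp window centered on `center`.

    `probes` is an iterable of (position, pm, mm) tuples (assumed layout).
    """
    return [
        signal_estimate(pm, mm)
        for pos, pm, mm in probes
        if abs(pos - center) <= half_width
    ]
```

Note that a probe pair with PM - MM = 2 gives S = ln(2), approximately 0.693, and any pair with PM at or below MM gives S = 0.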
encodeAffyChIpHl60SignalStrictp63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictp63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, 
retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrict Affy Strict Sites Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic-acid stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and in a fifth factor tested in ME-180 cervical carcinoma cells. Clustered sites are shown in separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. 
The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying, to each biological replicate, a cutoff of 0.693 (ln(2) = 0.693) to the signal estimate, with a maxgap of 500 base pairs and a minrun of 0 base pairs. Since each region or site can comprise more than a single probe, a median based on the distribution of log-transformed P-values was computed per site for each of the respective replicates. These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum (worst) rank of the distribution was assigned to it. The following three values were computed for each site by combining data from all biological replicates: (1) the average of all ranks computed among biological replicates; (2) the sum of all pairwise differences in these ranks computed among biological replicates; and (3) a combined P-value, using a chi-square distribution, across all replicates. A final signal-estimate-based filter was applied, where sites with a median signal estimate of at least 0.693/(total number of individual replicates) were considered. This was to ensure that if a site was not detected consistently in all replicates but was detected at a significant signal level in a subset of the replicates, its detection level would be weighted accordingly in the final selection of sites. The final sites were selected when all of the above three metrics were relatively low, where "low" corresponds to the top 25 percentile of the distribution.
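Two steps of the site-calling procedure above can be sketched in Python. This is a hedged illustration, not the authors' code. It assumes that a probe passes when its signal is at least the cutoff, that `maxgap` is the largest distance allowed between consecutive passing probes in one site, and that `minrun` is the smallest allowed site length. The combined P-value is computed with Fisher's method, which is an assumption: the text only says that a chi-square distribution was used. All function names are hypothetical.

```python
import math

LN2_CUTOFF = 0.693  # cutoff on the signal estimate, as stated in the Methods


def call_sites(probes, cutoff=LN2_CUTOFF, maxgap=500, minrun=0):
    """Merge passing probes into sites.

    `probes` is a list of (position, signal) sorted by position.
    Returns a list of (start, end) positions.
    """
    sites = []
    start = end = None
    for pos, signal in probes:
        if signal < cutoff:
            continue
        if start is not None and pos - end <= maxgap:
            end = pos                      # extend the current site
        else:
            if start is not None and end - start >= minrun:
                sites.append((start, end))
            start = end = pos              # open a new site
    if start is not None and end - start >= minrun:
        sites.append((start, end))
    return sites


def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k degrees of freedom.

    For even degrees of freedom the survival function has a closed form,
    so no external library is needed.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # x / 2 for x = -2 * sum(ln p)
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With the stated settings (maxgap 500, minrun 0), an isolated passing probe forms a site of its own, and passing probes no more than 500 bp apart are merged.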
Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). 
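The reproducibility check in the Verification section above can be sketched as a pairwise rank correlation between replicates. The sketch assumes Spearman's coefficient, since the text says only "rank correlation", and uses hypothetical function names. It expects each replicate to be an equal-length list of P-values over the same windows, with at least two distinct values per replicate.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_rank_correlations(replicates):
    """Map each replicate index pair (i, j) to its rank correlation."""
    return {
        (i, j): spearman(replicates[i], replicates[j])
        for i, j in combinations(range(len(replicates)), 2)
    }
```

A data set would then count as reproducible when the returned coefficients are consistent with one another and all at least weakly positive; the thresholds for both are not given in the text.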
encodeAffyChIpHl60SitesStrictP63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictP63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated 
HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeLIChIP LI ChIP Various Ludwig Institute/UCSD ChIP-chip: Pol2 8WG16, TAF1, H3ac, H3K4me2, H3K27me3 antibodies Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analyses were conducted of binding to the initiation-complex form of RNA polymerase II (Pol2), TATA-associated factor (TAF1), acetylated histone H3 (H3ac), lysine-4-dimethylated H3 (H3K4me2), suppressor of zeste 12 protein homolog (SUZ12), and lysine-27-tri-methylated H3 (H3K27me3). The analyses used chromatin extracted from IMR90 (lung fibroblast), HCT116 (colon epithelial carcinoma), HeLa (cervix epithelial adenocarcinoma), and THP1 (blood monocyte leukemia) cells. The initiation-complex form of Pol2 is associated with the transcription start site, as is TAF1. Both H3ac and H3K4me2 are associated with transcriptionally-active "open" chromatin. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. Data for each antibody/cell line pair is displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. The subtracks may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by the list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from each of the four cell lines was separately cross-linked, precipitated with antibody to one of the six proteins, sheared, amplified and hybridized to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. The array was composed of 24,537 non-repetitive sequences within the 44 ENCODE regions. For each marker, there were three biological replicates. Each experiment was normalized using the median values. The P-value and R-value were calculated using the modified single array error model (Li, Z. et al., 2003). The P-value and R-value were then derived from the weighted average results of the replicates. The displayed values were scaled to 0 - 16, corresponding to negative log base 10 of the P-value. Verification Each of the experiments has three biological replicates. The array platform, the raw and normalized data for each experiment, and the image files have all been deposited at the NCBI GEO Microarray Database. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. References Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005). Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003). Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000). 
encodeUcsdChipSuper LI/UCSD ChIP Ludwig Institute/UC San Diego ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Ludwig Institute/UCSD ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for components of the transcription initiation complex (such as Pol2 and TAF1) and for histones H3 and H4 in multiple cell lines, including HeLa (cervical carcinoma), IMR90 (human fibroblast), and HCT116 (colon epithelial carcinoma), with some experiments including interferon-gamma induction. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. References Kim TH, Barrera LO, Qu C, Van Calcar S, Trinklein ND, Cooper SJ, Luna RM, Glass CK, Rosenfeld MG, Myers RM, Ren B. Direct isolation and identification of promoters in the human genome. Genome Res. 2005 Jun;15(6):830-9. Li Z, Van Calcar S, Qu C, Cavenee WK, Zhang MQ, Ren B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc Natl Acad Sci U S A. 2003 Jul 8;100(14):8164-9. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al. Genome-wide location and function of DNA-associated proteins. Science. 2000 Dec 22;290(5500):2306-9. Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436(7052):876-80. 
encodeUcsdChipH3K27me3 LI H3K27me3 HeLa Ludwig Institute ChIP-chip: H3K27me3 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipH3K27me3Suz12 LI SUZ12 HeLa Ludwig Institute ChIP-chip: SUZ12 protein ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipMeh3k4Imr90_f LI H3K4me2 IMR90 Ludwig Institute ChIP-chip: H3K4me2 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipAch3Imr90_f LI H3ac IMR90 Ludwig Institute ChIP-chip: H3ac ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Hct116_f LI TAF1 HCT116 Ludwig Institute ChIP-chip: TAF1 ab, HCT116 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Imr90_f LI TAF1 IMR90 Ludwig Institute ChIP-chip: TAF1 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Thp1_f LI TAF1 THP1 Ludwig Institute ChIP-chip: TAF1 ab, THP1 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Hela_f LI TAF1 HeLa Ludwig Institute ChIP-chip: TAF1 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapHct116_f LI Pol2 HCT116 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HCT116 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapImr90_f LI Pol2 IMR90 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapThp1_f LI Pol2 THP1 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, THP1 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapHela_f LI Pol2 HeLa Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeLIChIPgIF LI gIF ChIP Ludwig Institute/UCSD ChIP-chip - Gamma Interferon Experiments Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analysis of histones H3 and H4 with antibodies H3K4me2, H3K4me3, H3ac, H4ac, STAT1, RNA polymerase II and TAF1 was conducted with ChIP-chip, using chromatin extracted from HeLa cells induced for 
30 min with interferon-gamma as well as uninduced cells. The H3K4me2, H3K4me3, H3ac form of histone H3, and H4ac form of histone H4 are associated with up-regulation of gene expression. STAT1 (signal transducer and activator of transcription) binds to DNA and activates transcription in response to various cytokines, including interferon-gamma. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from both induced and uninduced cells was separately cross-linked, precipitated with the antibodies, sheared, amplified and hybridized to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. The array was composed of 24,537 non-repetitive sequences within the 44 ENCODE regions. Each state had three or more biological replicates. Each experiment was loess-normalized using R. The P-value and R-value were calculated using the modified single array error model (Li, Z. et al., 2003). The P-value and R-value were then derived from the weighted average results of the replicates. The displayed values were scaled to 0 - 16, corresponding to negative log base 10 of the P-value. Verification Each of the two experiments has three biological replicates. The array platform, the raw and normalized data for each experiment, and the image files have all been deposited at the NCBI GEO Microarray Database (pending approval). Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. 
References Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005). Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003). Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000). encodeUcsdChipHeLaH3H4TAF250_p30 LI TAF1 +gIF Ludwig Institute ChIP-chip: TAF1, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4TAF250_p0 LI TAF1 -gIF Ludwig Institute ChIP-chip: TAF1, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4RNAP_p30 LI Pol2 +gIF Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4RNAP_p0 LI Pol2 -gIF Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4stat1_p30 LI STAT1 +gIF Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4stat1_p0 LI STAT1 -gIF Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH4_p30 LI H4ac +gIF Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, 30 min. 
after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH4_p0 LI H4ac -gIF Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH3_p30 LI H3ac +gIF Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH3_p0 LI H3ac -gIF Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4tmH3K4_p30 LI H3K4me3 +gIF Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4tmH3K4_p0 LI H3K4me3 -gIF Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4dmH3K4_p30 LI H3K4me2 +gIF Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4dmH3K4_p0 LI H3K4me2 -gIF Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgGif LI Ng gIF ChIP Ludwig Institute/UCSD ChIP-chip NimbleGen - Gamma Interferon Experiments Pilot ENCODE Chromatin Immunoprecipitation Description This track displays results of the following ChIP-chip (NimbleGen) gamma interferon experiments on HeLa cells: anti-H3K4me2, no gamma interferon anti-H3K4me2, 30 minutes after gamma interferon anti-H3K4me3, no gamma interferon anti-H3K4me3, 30 minutes after gamma interferon anti-H3ac, no gamma interferon anti-H3ac, 30 minutes after gamma interferon anti-H4ac, no gamma interferon anti-STAT1, 30 minutes after gamma interferon anti-RNA Pol2 in initiation complex, no gamma interferon anti-RNA Pol2 in initiation complex, 30 minutes after gamma interferon ENCODE region-wide location analysis of dimethylated K4 histone 
H3 (H3K4me2 or diMeH3K4), trimethylated K4 histone H3 (H3K4me3 or triMeH3K4), RNA polymerase II, acetylated histone H3 (H3ac or AcH3), acetylated histone H4 (H4ac or AcH4) and STAT1 was conducted with ChIP-chip using chromatin extracted from HeLa cells induced for 30 minutes with gamma interferon as well as uninduced cells. Methods Chromatin from both induced and uninduced HeLa cells was separately cross-linked, precipitated with different antibodies, sheared, amplified and hybridized to an oligonucleotide tiling array produced by NimbleGen Systems. The array includes non-repetitive sequences within the 44 ENCODE regions tiled from NCBI Build 35 (UCSC hg17) with 50-mer probes at 38 bp intervals. For H3K4me3 and Pol2, intensity values for biological replicate arrays were combined after quantile normalization using R. The averages of the quantile-normalized intensity values for each probe were then median-scaled and loess-normalized using R to obtain the adjusted logR-values. For all the other markers, each replicate was loess-normalized and combined after intensity-based quantile normalization. The average log ratio for each probe was derived using linear model fitting with R. The peak positions were identified using the Mpeak program. Verification Three biological replicates were used to generate the track for each factor at each time point, with the exception of uninduced RNA Pol2, where only two biological replicates were used. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. 
encodeUcsdNgHeLaStat1_p30_peak LI STAT1 +gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, STAT1, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH4_p0_peak LI H4ac -gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H4ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p30_peak LI H3ac +gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3ac, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p0_peak LI H3ac -gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p30_peak LI H3K4m2 +IF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3K4me2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p0_peak LI H3K4m2 -IF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3K4me2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaRnap_p30 LI Pol2 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, Pol2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaRnap_p0 LI Pol2 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, Pol2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaStat1_p30 LI STAT1 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, STAT1, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH4_p0 LI H4ac -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H4ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p30 LI H3ac +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3ac, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p0 LI H3ac -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaH3K4me3_p30 LI H3K4m3 +gIF Ludwig 
Institute/UCSD ChIP-chip Ng: HeLa, H3K4me3, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaH3K4me3_p0 LI H3K4me3 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me3, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p30 LI H3K4me2 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p0 LI H3K4me2 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChip Sanger ChIP Sanger ChIP-chip (histones H3,H4 ab in GM06990, K562, HeLa, and other cells) Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analysis of H3 and H4 histones was conducted by ChIP-chip using chromatin extracted from GM06990 (lymphoblastoid), K562 (myeloid leukemia-derived), HeLaS3 (cervix carcinoma), HFL-1 (embryonic lung fibroblast), MOLT-4 (lymphoblastic leukemia), and PTR8 cells. Experiments were conducted with antibodies to the following histone modifications: H3K4me1, H3K4me2, H3K4me3, H3K9me3, H3K27me3, H3K36me3, H3K79me3, H3ac, and H4ac, as well as with an antibody to CTCF. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. 
Methods Chromatin from each cell line was cross-linked with 1% formaldehyde, precipitated with an antibody that binds the histone, sheared, and hybridized to the Sanger ENCODE3.1.1 DNA microarray. DNA was not amplified prior to hybridization. The raw and transformed data files reflect fold enrichment over background, averaged over six replicates. Verification There are six replicates: two technical replicates (immunoprecipitations) for each of the three biological replicates (cell cultures). Raw and transformed (averaged) data can be downloaded from the Wellcome Trust Sanger Institute via the ENCODE data access web site or the ENCODE FTP site. Credits The data for this track were generated by the ENCODE investigators at the Wellcome Trust Sanger Institute, Hinxton, UK. encodeSangerChipSuper Sanger ChIP-chip Sanger ChIP-chip (histones H3,H4 ab in GM06990, K562, HeLa, HFL-1, MOLT4, and PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the ENCODE group at the Sanger Institute. ChIP-chip, also known as genome-wide location analysis, is a technique for isolating and identifying DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for H3 and H4 histones in multiple cell lines, including HeLa (cervical carcinoma), GM06990 (lymphoblastoid), K562 (myeloid leukemia), and HFL-1 (embryonic lung fibroblast). Experiments were conducted with antibodies to histones with different post-translational modification marks. Data are displayed as signals as well as hits and peak centers identified by hidden Markov model (HMM) analysis. 
Credits The data were generated by the ENCODE investigators at the Wellcome Trust Sanger Institute, Hinxton, UK. Contacts: Ian Dunham and Christoph Koch. The HMM analysis was performed at the EBI by Paul Flicek. Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode. encodeSangerChipH3K4me3Ptr8 SI H3K4me3 PTR8 Sanger Institute ChIP-chip (H3K4me3 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2Ptr8 SI H3K4me2 PTR8 Sanger Institute ChIP-chip (H3K4me2 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1Ptr8 SI H3K4me1 PTR8 Sanger Institute ChIP-chip (H3K4me1 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acMolt4 SI H4ac MOLT4 Sanger Institute ChIP-chip (H4ac ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acMolt4 SI H3ac MOLT4 Sanger Institute ChIP-chip (H3ac ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3Molt4 SI H3K4me3 MOLT4 Sanger Institute ChIP-chip (H3K4me3 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2Molt4 SI H3K4me2 MOLT4 Sanger Institute ChIP-chip (H3K4me2 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1Molt4 SI H3K4me1 MOLT4 Sanger Institute ChIP-chip (H3K4me1 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acHFL1 SI H4ac HFL-1 Sanger Institute ChIP-chip (H4ac ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acHFL1 SI H3ac HFL-1 Sanger Institute ChIP-chip (H3ac ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3HFL1 SI H3K4me3 HFL-1 Sanger Institute ChIP-chip (H3K4me3 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2HFL1 SI H3K4me2 HFL-1 Sanger Institute ChIP-chip (H3K4me2 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1HFL1 SI 
H3K4me1 HFL-1 Sanger Institute ChIP-chip (H3K4me1 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acK562 SI H4ac K562 Sanger Institute ChIP-chip (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acK562 SI H3ac K562 Sanger Institute ChIP-chip (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCTCF SI CTCF GM06990 Sanger Institute ChIP-chip (CTCF ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K79me3 SI H3K79me3 GM06990 Sanger Institute ChIP-chip (H3K79me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K36me3 SI H3K36me3 GM06990 Sanger Institute ChIP-chip (H3K36me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K27me3 SI H3K27me3 GM06990 Sanger Institute ChIP-chip (H3K27me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K9me3 SI H3K9me3 GM06990 Sanger Institute ChIP-chip (H3K9me3 ab, GM06990 cells) Pilot ENCODE 
Chromatin Immunoprecipitation encodeSangerChipH4ac SI H4ac GM06990 Sanger Institute ChIP-chip (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3ac SI H3ac GM06990 Sanger Institute ChIP-chip (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3 SI H3K4m3 GM6990 Sanger Institute ChIP-chip (H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2 SI H3K4m2 GM6990 Sanger Institute ChIP-chip (H3K4me2 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1 SI H3K4m1 GM6990 Sanger Institute ChIP-chip (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHits Sanger ChIP Hits Sanger ChIP-chip Hits and Peak Centers Pilot ENCODE Chromatin Immunoprecipitation Description This track displays hit regions and peak centers for Sanger ChIP-chip data, as identified by hidden Markov model (HMM) analysis. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Data for each replicate was normalized with the Tukey-Biweight Method using R (as recommended by NimbleGen). The log base 2 ratio of the normalized intensities was used for downstream data processing. A two-state HMM was used to analyze the data. The states of the HMM represent regions of the tile path corresponding to antibody binding locations. 
State emission probabilities were determined by comparing the cumulative distribution of the experimental data for each replicate on each ENCODE region to a fitted cumulative normal distribution. The fitted distribution was calculated using the Levenberg-Marquardt curve-fitting technique and six fitting points ranging from 0.05 to 0.45 of the cumulative distribution. Initial fitting parameters were set from the experimental data. This model is robust through a range of sensible transition probabilities. Bound regions were identified by finding the optimal state sequence from the HMM using the Viterbi algorithm, and the resulting region data were post-processed to develop the hit list. Hits were defined as contiguous portions of the tile path identified as bound by the HMM. The score of a hit was the sum of the median enrichment values of the tiles in the contiguous portion (i.e., the area under the peak). For the purpose of this analysis, hits that were within 1000 base pairs of adjacent hits were combined into hit regions. The start position of the oligo with the highest enrichment value in the hit region was deemed the center of the peak. The ranking of hits was based on the total score of all hits in a hit region. It is recommended that analyses based on these data use the peak centers, expanded to a size convenient for the analysis. Credits The ChIP-chip data were generated by Ian Dunham's lab at the Sanger Institute. Contacts: Ian Dunham and Christoph Koch. The HMM analysis was performed at the EBI by Paul Flicek. Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode. 
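The post-processing described in the Methods for the Sanger ChIP-chip Hits and Peak Centers track (scoring hits, merging nearby hits into hit regions, choosing a peak center, ranking) can be illustrated with a minimal Python sketch. This is not the EBI code: the `Hit` class, the function names, and the reading of "within 1000 base pairs" as a gap of at most 1000 bp between the end of one hit and the start of the next are all assumptions made for illustration. The HMM and Viterbi steps are not reproduced; the sketch starts from hits that have already been called.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """A contiguous bound portion of the tile path (hypothetical structure)."""
    start: int
    end: int
    tiles: List[Tuple[int, float]]  # (tile start position, median enrichment)

    @property
    def score(self) -> float:
        # Sum of the median enrichment values of the tiles in the hit.
        return sum(value for _, value in self.tiles)


def merge_hits(hits: List[Hit], max_gap: int = 1000) -> List[List[Hit]]:
    """Group hits separated by at most max_gap bp (assumed definition)."""
    regions: List[List[Hit]] = []
    region_end = None
    for hit in sorted(hits, key=lambda h: h.start):
        if regions and hit.start - region_end <= max_gap:
            regions[-1].append(hit)
            region_end = max(region_end, hit.end)
        else:
            regions.append([hit])
            region_end = hit.end
    return regions


def summarize(region: List[Hit]) -> dict:
    """Region span, total score, and peak center (start of the best tile)."""
    tiles = [tile for hit in region for tile in hit.tiles]
    center, _ = max(tiles, key=lambda tile: tile[1])
    return {
        "start": min(hit.start for hit in region),
        "end": max(hit.end for hit in region),
        "score": sum(hit.score for hit in region),
        "center": center,
    }


def ranked_regions(hits: List[Hit], max_gap: int = 1000) -> List[dict]:
    """Hit regions ranked by the total score of their hits, highest first."""
    summaries = [summarize(region) for region in merge_hits(hits, max_gap)]
    return sorted(summaries, key=lambda s: s["score"], reverse=True)


if __name__ == "__main__":
    example = [
        Hit(100, 300, [(100, 1.0), (200, 3.0)]),
        Hit(900, 1000, [(900, 2.0)]),
        Hit(5000, 5100, [(5000, 4.0)]),
    ]
    for region in ranked_regions(example):
        print(region)
```

In the example, the first two hits are 600 bp apart and fall into one hit region, while the third stays separate. Expanding each reported center by a fixed number of bases on either side would give the fixed-size windows the description recommends for downstream analysis.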
encodeSangerChipCenterH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip Peak Centers (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip Peak Centers (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH4acK562 SI H4ac K562 Sanger Institute ChIP-chip Peak Centers (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acK562 SI H3ac K562 Sanger Institute ChIP-chip Peak Centers (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip Peak Centers (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip Peak Centers (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH4acGM06990 SI H4ac GM06990 Sanger Institute ChIP-chip Peak Centers (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acGM06990 SI H3ac GM06990 Sanger Institute ChIP-chip Peak Centers (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3GM06990 SI H3K4m3 GM6990 Sanger Institute ChIP-chip Peak Centers (H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2GM06990 SI H3K4m2 GM6990 Sanger Institute ChIP-chip Peak Centers (H3K4me2 ab, GM06990 cells) Pilot ENCODE 
Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me1GM06990 SI H3K4m1 GM6990 Sanger Institute ChIP-chip Peak Centers (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip Hits (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip Hits (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip Hits (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip Hits (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip Hits (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acK562 SI H4ac K562 Sanger Institute ChIP-chip Hits (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acK562 SI H3ac K562 Sanger Institute ChIP-chip Hits (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip Hits (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip Hits (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acGM06990 SI H4ac GM06990 Sanger Institute ChIP-chip Hits (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acGM06990 SI H3ac GM06990 Sanger Institute ChIP-chip Hits (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3GM06990 SI H3K4m3 GM6990 Sanger Institute ChIP-chip (H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2GM06990 SI H3K4m2 GM6990 Sanger Institute 
ChIP-chip Hits (H3K4me2 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me1GM06990 SI H3K4m1 GM6990 Sanger Institute ChIP-chip Hits (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChip Stanf ChIP Stanford ChIP-chip (HCT116, Jurkat, K562 cells; Sp1, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation Description This track displays regions bound by Sp1 and Sp3, in the following three cell lines, assayed by ChIP and microarray hybridization:
Cell Line | Classification | Isolated From
HCT 116 | colorectal carcinoma | colon
Jurkat, Clone E6-1 | acute T cell leukemia | T lymphocyte
K-562 | chronic myelogenous leukemia (CML) | bone marrow
Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin IP was performed as described in Trinklein et al. (2004). Amplified and labeled ChIP DNA was hybridized to oligo tiling arrays produced by NimbleGen, along with a total genomic reference sample. The data for each array were median subtracted (log 2 ratios) and normalized (divided by the standard deviation). The value given for each probe is the transformed mean ratio of ChIP DNA:Total DNA. Verification Three biological replicates and two technical replicates were performed. The Myers lab is currently testing the specificity and sensitivity using real-time PCR. Credits These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). 
References Trinklein, N.D., Murray, J.I., Hartman, S.J., Botstein, D. and Myers, R.M. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15(3), 1254-61 (2004). encodeStanfordChipSuper Stanf ChIP Stanford ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Stanford ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolating and identifying DNA sequences bound by specific proteins in cells. These tracks contain data for the Sp1 and Sp3 transcription factors in multiple cell lines, including HCT116 (colon epithelial carcinoma), Jurkat (T-cell lymphoblast), and K562 (myeloid leukemia). Credits The Sp1 and Sp3 data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). References Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007 Aug 2;448:553-60. Trinklein ND, Murray JI, Hartman SJ, Botstein D, Myers RM. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell. 2004 Mar;15(3):1254-61. 
encodeStanfordChipK562Sp3 Stan K562 Sp3 Stanford ChIP-chip (K562 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipK562Sp1 Stan K562 Sp1 Stanford ChIP-chip (K562 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipJurkatSp3 Stan Jurkat Sp3 Stanford ChIP-chip (Jurkat cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipJurkatSp1 Stan Jurkat Sp1 Stanford ChIP-chip (Jurkat cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipHCT116Sp3 Stan HCT116 Sp3 Stanford ChIP-chip (HCT116 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipHCT116Sp1 Stan HCT116 Sp1 Stanford ChIP-chip (HCT116 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothed Stanf ChIP Score Stanford ChIP-chip Smoothed Score Pilot ENCODE Chromatin Immunoprecipitation Description This track displays smoothed (sliding-window mean) scores for regions bound by Sp1 and Sp3 in the following three cell lines, assayed by ChIP and microarray hybridization:
Cell Line | Classification | Isolated From
HCT 116 | colorectal carcinoma | colon
Jurkat, Clone E6-1 | acute T cell leukemia | T lymphocyte
K-562 | chronic myelogenous leukemia (CML) | bone marrow
Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin IP was performed as described in Trinklein et al. (2004). 
Amplified and labeled ChIP DNA was hybridized to oligo tiling arrays produced by NimbleGen along with a total genomic reference sample. The data for each array were median subtracted (log 2 ratios) and normalized (divided by the standard deviation). The transformed mean ratios of ChIP DNA:Total DNA for all probes were then smoothed by calculating a sliding-window mean. Windows of six neighboring probes (sliding two probes at a time) were used; within each window, the highest and lowest values were dropped, and the remaining four values were averaged. To increase the contrast between high and low values for visual display, the average was converted to a score by the formula: score = 8^(average) * 10. These scores are for visualization purposes only; for all analyses, the raw ratios, which are available in the Stanf ChIP track, should be used. Verification Three biological replicates and two technical replicates were performed. The Myers lab is currently testing the specificity and sensitivity using real-time PCR. Credits These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). References Trinklein, N.D., Murray, J.I., Hartman, S.J., Botstein, D. and Myers, R.M. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15(3), 1254-61 (2004). 
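The normalization and smoothing described in the Methods for the Stanford ChIP-chip Smoothed Score track can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Myers lab code: the function names are hypothetical, the description does not say whether a sample or population standard deviation was used (the population form is assumed here), and trailing probes that do not fill a complete six-probe window are simply ignored.

```python
import statistics
from typing import List


def normalize(log2_ratios: List[float]) -> List[float]:
    """Median-subtract the log2 ratios of one array, then divide by the SD.

    Assumption: population standard deviation; the source does not specify.
    """
    median = statistics.median(log2_ratios)
    sd = statistics.pstdev(log2_ratios)
    return [(value - median) / sd for value in log2_ratios]


def smoothed_scores(values: List[float], window: int = 6, step: int = 2) -> List[float]:
    """Sliding-window trimmed mean converted to a display score.

    For each window of `window` neighboring probes (advancing `step` probes
    at a time), the highest and lowest values are dropped, the rest are
    averaged, and the average is mapped to score = 8 ** average * 10.
    """
    scores = []
    for i in range(0, len(values) - window + 1, step):
        ordered = sorted(values[i:i + window])
        trimmed = ordered[1:-1]  # drop the lowest and the highest value
        average = sum(trimmed) / len(trimmed)
        scores.append(8 ** average * 10)
    return scores


if __name__ == "__main__":
    probe_means = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
    print(smoothed_scores(probe_means))
```

Because the score is exponential in the trimmed mean, a window averaging 0 maps to 10 and a window averaging 1 maps to 80, which is what stretches the contrast between enriched and background probes in the display.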
encodeStanfordChipSmoothedK562Sp3 Stan Sc K562 Sp3 Stanford ChIP-chip Smoothed Score (K562 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedK562Sp1 Stan Sc K562 Sp1 Stanford ChIP-chip Smoothed Score (K562 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedJurkatSp3 Stan Sc Jurkat Sp3 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedJurkatSp1 Stan Sc Jurkat Sp1 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedHCT116Sp3 Stan Sc HCT116 Sp3 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedHCT116Sp1 Stan Sc HCT116 Sp1 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisChip UCD Ng ChIP UC Davis ChIP-chip NimbleGen (E2F1, c-Myc, TAF, POLII) Pilot ENCODE Chromatin Immunoprecipitation Description ChIP analysis was performed using antibodies to E2F1, c-Myc, TAFI and PolII in HeLa, GM06990 and/or HelaS3 cells. E2F1 and c-Myc protein are transcription factors related to growth. E2F1 is important in controlling cell division, and c-Myc is associated with cell proliferation and neoplastic disease. TAFI is a general transcription factor that is a key part of the pre-initiation complex found on the promoter. PolII is RNA polymerase II. For E2F1 and c-Myc, three independently cross-linked preparations of HeLa cells were used to provide three independent biological replicates. ChIP assays were performed (with minor modifications which can be provided upon request) using the protocol found at The Farnham Laboratory. Array hybridizations were performed using standard NimbleGen Systems conditions. 
For TAFI and PolII, cross-linked cells were officially supplied by the ENCODE Consortium (for reference, see The Human Genetic Cell Repository). Hence, these data may be compared to other tracks using this exact source of cells. (Note that this is different from the E2F1 and c-Myc subtracks; those HeLa cells were grown in the Farnham lab.) ChIP-chip and amplification procedures follow standard protocols available in detail from the Farnham Lab website. Whole Genome Amplification (WGA) was used for these samples. Array processing was performed by NimbleGen, Inc. The supplied array data are the result of three biological replicates in each case. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Ratio intensity values (antibody vs. total) for each of three biological replicates were calculated and converted to log2. Each set of ratio values was then independently scaled by its Tukey biweight mean. The three replicates were then combined by taking the median scaled log2 ratio for each oligo. Verification For E2F1, primers were chosen to correspond to 13 individual peaks. PCR reactions were performed for each of the 13 primer sets using amplicons derived from each of three biological samples (39 reactions). The PCR reactions confirmed that all 13 chosen peaks were bound by E2F1 in all three biological samples. For PolII, simple verification of the ChIP sample was performed at a known positive target (the promoter for POLII) and a known negative target (the DHFR 3' UTR region). 
Quantitative PCR verifications of sites are in progress. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Xinmin Zhang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of Farnham Lab. Reference Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. encodeUcDavisChipSuper UC Davis ChIP UC Davis ChIP-chip NimbleGen (E2F1, c-Myc, TAF, POLII) Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Farnham laboratory at the University of California, Davis. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. These tracks contain ChIP-chip data for several transcription factors, including E2F1 and PolII, in multiple cell lines including HeLa (cervical carcinoma) and GM06990 (lymphoblastoid). ChIP assays were performed using the protocol found at the Farnham laboratory web site. Array hybridizations were performed using standard NimbleGen Systems conditions. Data are displayed as signals and hits. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of the Farnham lab. References Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. 
encodeUCDavisTafHelaS3 UCD Taf_HelaS3 UC Davis ChIP-chip NimbleGen (TAF, HelaS3 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisTafGM UCD Taf_GM UC Davis ChIP-chip NimbleGen (TAF, GM06990 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisPolIIHelaS3 UCD PolII_HelaS3 UC Davis ChIP-chip NimbleGen (PolII, HelaS3 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisPolIIGM UCD PolII_GM UC Davis ChIP-chip NimbleGen (PolII, GM06990 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisChipMyc UCD C-Myc UC Davis ChIP-chip NimbleGen (C-Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisE2F1Median UCD E2F1 UC Davis ChIP-chip NimbleGen (E2F1 ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUcDavisChipHits UCD Ng ChIP Hits UC Davis ChIP-chip Hits NimbleGen (E2F1, Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation Description ChIP analysis was performed using antibodies to E2F1 and Myc in HeLa cells. E2F1 and Myc protein are transcription factors related to growth. E2F1 is important in controlling cell division, and C-Myc is associated with cell proliferation and neoplastic disease. Three independently cross-linked preparations of HeLa cells were used to provide three independent biological replicates. ChIP assays were performed using the protocol found at Farnham Lab Protocols. Array hybridizations were performed using standard NimbleGen Systems conditions. Methods Ratio intensity values (antibody vs. total) for each of three biological replicates were calculated and converted to log2. Peaks were identified independently for each of the three E2F1 and the three Myc ChIP-chip experiments using the Tamalpais program. The identified peaks from the L1 categories for the three E2F1 or three Myc experiments were then compared. All regions reported here as binding sites were identified in at least two of the three E2F1 or at least two of the three Myc ChIP-chip assays. 
Verification Primers were chosen to correspond to 13 individual peaks. PCR reactions were performed for each of the 13 primer sets using amplicons derived from each of three biological samples (39 reactions). The PCR reactions confirmed that all of the 13 chosen peaks were bound by E2F1 in all three biological samples. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of Farnham Lab. Reference Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. encodeUcDavisChipHitsMyc UCD c-Myc Hits UC Davis ChIP-chip Hits NimbleGen (C-Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUcDavisChipHitsE2F1 UCD E2F1 Hits UC Davis ChIP-chip Hits NimbleGen (E2F1 ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip UT-Austin ChIP University of Texas, Austin ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Description ChIP-chip analysis of c-Myc and E2F4 was performed using 2091 foreskin fibroblasts and HeLa cells. ChIP was carried out from normally-growing HeLa cells and from 2091 quiescent (0.1% serum FBS), as well as serum-stimulated (10% FBS, 4hrs), fibroblasts. Microarray hybridizations were performed using NimbleGen ENCODE arrays and protocols. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from each cell line under a given condition was cross-linked with 1% formaldehyde, sheared, precipitated with antibody, and reverse cross-linked to obtain enriched DNA fragments. ChIP material was amplified and hybridized to a NimbleGen ENCODE region array. The raw and processed files reflect fold enrichment over the mock ChIP sample, which was used as a reference in the hybridization. Verification Each of the four experiments has three independent biological replicates. Data from all three replicates were averaged to generate a single data file. The NimbleGen method for hit identification was used to generate the peaks at a false positive rate of <= 0.05. Credits These data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at the University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc. Reference Kim, J., Bhinge, A., Morgan, X.C. and Iyer, V.R. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nature Methods 2, 47-53 (2005). encodeUtexChipSuper UT-Austin ChIP University of Texas, Austin ChIP-chip and STAGE Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP data generated by the Iyer laboratory at The University of Texas at Austin. Two technologies are presented in this super-track: ChIP-chip and ChIP-STAGE. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. Instead of detecting bound fragments by microarray, ChIP-STAGE uses Sequence Tag Analysis of Genomic Enrichment, or STAGE, technology by cloning STAGE tags, sequencing and mapping to the human genome. 
These tracks contain ChIP data for several transcription factors, including c-Myc, E2F4 and STAT1, in cell lines including 2091 (foreskin fibroblast) and HeLa (cervical carcinoma). Credits ChIP-chip data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at The University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc. ChIP-STAGE data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab, and by Ghia Euskirchen and Michael Snyder of the Snyder lab at Yale University. References Bhinge AA, Kim J, Euskirchen G, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6. Kim J, Bhinge A, Morgan XC, Iyer VR. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat Methods. 2005 Jan;2(1):47-53. encodeUtexChip2091fibE2F4Peaks UT E2F4 st-Fb Pk University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycStimPeaks UT Myc st-Fb Pk University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycPeaks UT Myc Fb Pk University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChipHeLaMycPeaks UT Myc HeLa Pk University of Texas, Austin ChIP-chip (c-Myc, HeLa) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibE2F4Raw UT E2F4 Fb University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycStimRaw UT Myc st-Fb University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycRaw UT Myc Fb University of Texas, Austin ChIP-chip (c-Myc, 2091 
fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChipHeLaMycRaw UT Myc HeLa University of Texas, Austin ChIP-chip (c-Myc, HeLa) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStage UT-Austin STAGE University of Texas, Austin STAGE (Sequence Tag Analysis of Genomic Enrichment) Pilot ENCODE Chromatin Immunoprecipitation Description This track shows putative binding loci of c-Myc and STAT1 as determined by Sequence Tag Analysis of Genomic Enrichment (STAGE). The c-Myc (cellular myelocytomatosis) protein is a transcription factor associated with cell proliferation, differentiation, and neoplastic disease. STAT1 is a signal transducer and transcription factor that binds to IFN-gamma activating sequence. STAGE was performed in HeLa cells under normal growth conditions (10% Fetal Bovine Serum) with anti-Myc, or in IFN-gamma stimulated cells with anti-STAT1 antibody. Cloned STAGE tags were sequenced and mapped to the human genome as described in Kim et al. (2005), referenced below. The Tags subtrack shows all STAGE tags within the ENCODE region and thus represents the raw data. The Peaks subtrack shows high confidence c-Myc binding regions derived from the STAGE tags. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To display only one of the subtracks, uncheck the boxes next to the track you wish to hide. Methods Each tag was assigned a probability of enrichment calculated from the frequency of occurrence of the tag in the STAGE sequencing pool and the number of times the tag is present in the genome, assuming a binomial distribution. Generally, tags that have a low frequency of occurrence in the sequencing pool and a high genomic frequency were assigned low probabilities of enrichment. Peaks were determined by using a 500 bp window to scan across each chromosome. Each window was assigned a probability based on the tags mapped within that window as described in Bhinge et al. 
referenced below. Verification For c-Myc, scores generated from the real data were compared to simulations where similar numbers of tags were randomly sampled from the genome. Calculating probabilities as above, a probability cut-off of 0.8 gave a false positive rate of less than 0.05. For STAT1, scores generated from the real data were compared to simulations where similar numbers of tags were randomly sampled from the genome. Calculating probabilities as described, a probability cut-off of 0.95 gave a false positive rate of less than 0.01. Additionally, 10 STAGE-detected STAT1 binding sites were assayed by qPCR analysis and 9 out of 10 were confirmed as true positives, so the false positive rate is estimated at 10%. Credits These data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at the University of Texas at Austin, and by Ghia Euskirchen and Michael Snyder of the Snyder lab at Yale University. References Kim, J., Bhinge, A., Morgan, X.C. and Iyer, V.R. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nature Methods 2, 47-53 (2005). Bhinge A. et al. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6. encodeUtexStageMycHelaPeaks UT Myc HeLa Pk University of Texas, Austin STAGE (c-Myc, HeLa) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStageCMycHelaTags UT Myc HeLa Tags University of Texas, Austin STAGE (c-Myc, HeLa) Tags Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStageStat1HelaTags UT STAT1 HeLa Tags University of Texas, Austin STAGE (STAT1, HeLa) Tags Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChip Uppsala ChIP Uppsala University, Sweden ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Description This track displays the results of ENCODE region-wide localization for three transcription factors (HNF-3b, HNF-4a and USF-1) and acetylated histone H3 (H3ac).
The heights of the peaks in the graphical display indicate the ratio of enriched non-amplified DNA to input DNA. The data for each of the transcription factors and H3ac are displayed in individual subtracks. The analysis cut-off threshold is indicated in each subtrack by a horizontal line. Tentative binding sites (TBSs) in spots passing the cut-off are displayed in a separate subtrack, ChIP-chip (HepG2) Sites. These sites are numbered corresponding to the ranking of spots based on enrichment ratios. Each TBS is assigned a value indicating how often it was found in separate BioProspector software runs for the prediction of TBSs (e.g. 1000 indicates that a TBS was found in ten out of ten runs). The raw data for this track is available at EBI ArrayExpress, as experiment E-MEXP-452. Methods Chromatin from HepG2 cells was cross-linked with formaldehyde and sonicated to produce DNA fragments of size 0.5-2 kb. Chromatin was precipitated using antibodies against HNF-4a, HNF-3b, USF-1 or H3ac. DNA from a single ChIP reaction was labeled with Cy5, and a fraction of the total input was labeled with Cy3. There was no amplification of the ChIP DNA or the input DNA prior to this step to avoid introducing bias. This DNA was combined and hybridized to PCR-based tiling path ENCODE arrays. Most array elements were printed only once on the slide, but X-chromosomal regions (ENm006 and ENr324) were printed in duplicate. There were approximately 19,000 spots/slide. The array provided about 75% coverage of the ENCODE regions. Spots flagged as bad by the image processing step were removed; those that remained were normalized. The average log2 ratio was calculated for spots that were replicated on the array. A log odds score for differential enrichment with the negative control was calculated using an empirical Bayes method. There were four log odds scores for each spot, one for each antibody. 
If this score was greater than 0 and the log2 ratio was greater than 1.25 (indicative of a strong positive signal), based on at least 2 replicates, the spots were considered to be enriched. Binding sites were identified using the BioProspector software. Because the software is non-deterministic, different runs may produce different results for the same data. Predictions consistent across many runs are more likely to be correct; therefore, the analysis was repeated, keeping all binding sites occurring in each top-scoring motif to generate a set of candidates. TBSs present in at least five out of ten runs were selected. Further method details are described in Rada-Iglesias et al. (2005). In the graphical display, overlapping sequences were removed by changing the start position of downstream spots to generate a continuous track. To give each track a comparable scale, the values for the most enriched spots were lowered to 15. Spots deemed as false positives, when compared to a no antibody ChIP-chip experiment, were assigned a value of 0. Verification A negative control was done using no antibody for the ChIP-chip to reduce the number of false positives. Three independent biological replicates were performed for each antibody; three negative control ChIPs were also analyzed. Semi-quantitative PCR was used to verify enrichment in at least ten positive spots for each antibody. Credits These experiments were performed in the Claes Wadelius lab. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. 
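The two selection rules in the Uppsala Methods (spots called enriched when the log odds score exceeds 0 and the log2 ratio exceeds 1.25 on at least 2 replicates; tentative binding sites kept when found in at least five of ten BioProspector runs) can be sketched as simple filters. This is a hypothetical sketch: the function names and data layout are invented for illustration, and the linear scaling of the site value (so that ten of ten runs gives 1000, as the Description states) is an assumption about how intermediate values are computed.

```python
from collections import Counter


def is_enriched(log_odds, log2_ratio, n_replicates,
                min_log_odds=0.0, min_log2_ratio=1.25, min_replicates=2):
    """Apply the enrichment thresholds quoted in the Methods to one spot."""
    return (log_odds > min_log_odds
            and log2_ratio > min_log2_ratio
            and n_replicates >= min_replicates)


def consistent_sites(runs, min_runs=5):
    """Keep sites reported in at least `min_runs` of the motif-finder runs.

    `runs` is a list with one collection of site identifiers per run.
    Returns {site: value}, where value is scaled so that a site found in
    every run scores 1000 (assumed to be linear in the number of runs).
    """
    counts = Counter(site for run in runs for site in set(run))
    total = len(runs)
    return {site: round(1000 * n / total)
            for site, n in counts.items() if n >= min_runs}
```

Because the motif finder is non-deterministic, counting occurrences across repeated runs is a simple way to rank predictions by reproducibility.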
encodeUppsalaChipSuper Uppsala ChIP Uppsala University, Sweden ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Wadelius lab at Uppsala University, Sweden. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for transcription factors (such as HNF-3b) and acetylated histone H3 and H4 in cell lines including HepG2 (liver carcinoma). Experiments were also performed after cell treatment with Na-butyrate. Credits These experiments were performed in the Claes Wadelius lab, Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Ameur A, Yankovski V, Enroth S, Spjuth O, Komorowski J. The LCB Data Warehouse. Bioinformatics. 2006 Apr 15;22(8):1024-6. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 
2002 Feb 15;30(4):e15. encodeUppsalaChipSites UU Sites Uppsala University, Sweden ChIP-chip (HepG2) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipUsf1 UU USF-1 HepG2 Uppsala University, Sweden ChIP-chip (USF-1, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipHnf4a UU HNF-4a HepG2 Uppsala University, Sweden ChIP-chip (HNF-4a, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipHnf3b UU HNF-3b HepG2 Uppsala University, Sweden ChIP-chip (HNF-3b, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipAch3 UU H3ac HepG2 Uppsala University, Sweden ChIP-chip (H3ac, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipBut Uppsala ChIP Buty Uppsala University, Sweden ChIP-chip Na-butyrate time series Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE regions were investigated by ChIP-chip, analyzing both histone H3 acetylation (H3ac; H3 acetylated lysines 9 and 14) and histone H4 acetylation (H4ac; H4 acetylated lysines 5, 8, 12 and 16). This analysis was performed using ChIP material obtained from cells that were either untreated or treated with 5 mM Na-butyrate for 12 hours. Na-butyrate is a histone deacetylase inhibitor (HDACi) that increases bulk levels of acetylated histones. Four tracks presented in the genome browser represent the ChIP-chip signal obtained for either H3ac or H4ac, using cells that were untreated or treated with butyrate: H3ac 0h, H3ac 12h, H4ac 0h, H4ac 12h. Two additional tracks indicate those spots where H3ac or H4ac levels are significantly changed by butyrate treatment. Methods Chromatin immunoprecipitation, DNA labelling and array hybridization were exactly as previously described (Rada-Iglesias et al., 2005). A set of enriched spots was obtained for each of H3ac 0h, H3ac 12h, H4ac 0h and H4ac 12h using the same pre-processing and analysis procedures as in Rada-Iglesias et al. (2005). 
Enriched spots showing different histone acetylation levels between 0h and 12h treatment were then detected through an empirical Bayes method (Smyth). All spots with B-score>0 were classified as either up or down depending on whether the acetylation was increased or decreased. For spots missing all measurements at one of the time points due to filtering, the B-score was instead calculated on un-filtered, print-tip lowess normalized (Yang, et al.) raw data. Enriched spots that were not present in any of the up or down groups were classified as unchanged. The raw data for this track is available at EBI ArrayExpress, as experiment E-MEXP-693. Verification New ChIPs were performed for both H3ac and H4ac, both for untreated cells and cells treated with 5 mM Na-butyrate for 12 hours. Furthermore, ChIP was performed in cells that were treated with 5 mM Na-butyrate for 15 minutes, 2 hours, 6 hours and 12 hours+6 hours without butyrate. All these ChIP DNAs were analyzed by PCR, including 10 regions where loss of acetylation after 12 hours of butyrate treatment was observed in ChIP-chip experiments, two regions where a trend towards increased acetylation was observed, one negative region where no acetylation and no change was observed and three control regions not included in the ENCODE array and covering promoter regions of previously known butyrate-responsive genes. Credits These experiments were performed in the Claes Wadelius lab, Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Ameur A, Yankovski V, Enroth S, Spjuth O, Komorowski J. The LCB Data Warehouse. Bioinformatics. 2006 Apr 15;22(8):1024-6. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. 
Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002 Feb 15;30(4):e15. encodeUppsalaChipH4acBut0vs12 UU H4ac 0h vs 12h Uppsala University, Sweden ChIP-chip (H4ac 0h vs. 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut0vs12 UU H3ac 0h vs 12h Uppsala University, Sweden ChIP-chip (H3ac 0h vs. 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH4acBut12h UU H4ac HepG2 12h Uppsala University, Sweden ChIP-chip (H4ac, HepG2, Butyrate 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH4acBut0h UU H4ac HepG2 0h Uppsala University, Sweden ChIP-chip (H4ac, HepG2, Butyrate 0h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut12h UU H3ac HepG2 12h Uppsala University, Sweden ChIP-chip (H3ac, HepG2, Butyrate 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut0h UU H3ac HepG2 0h Uppsala University, Sweden ChIP-chip (H3ac, HepG2, Butyrate 0h) Pilot ENCODE Chromatin Immunoprecipitation encodeUvaDnaRep UVa DNA Rep University of Virginia Temporal Profiling of DNA Replication Pilot ENCODE Chromatin Structure Description The five subtracks in this annotation correspond to five different time points relative to the start of the DNA synthesis phase (S-phase) of the cell cycle. Display Conventions and Configuration Regions that are replicated during the given time interval are shown in green. Varying shades of green are used to distinguish one subtrack from another. 
To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase and DNA was isolated from them. The heavy-light(H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified heavy-light DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The raw data generated by the microarray experiments was processed by computing the enrichment of signal in a particular part of the S-phase relative to the entirety of the S-phase (10 hours). High confidence regions (P-value = 1E-04) of replication were mapped by applying the Wilcoxon Rank Sum test in a sliding window of size 10 kb using the standard Affymetrix data analysis tools and the April 2003 (hg15) version of the human genome assembly. These coordinates were then mapped to the July 2003 (hg17) assembly by UCSC using the liftOver tool. Verification The submitted data are from two biological experimental sets. Regions of significant enrichment were included from both of the biological replicates. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. 
Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). encodeUvaDnaRepSuper UVa DNA Rep University of Virginia DNA Replication Timing and Origins Pilot ENCODE Chromatin Structure Overview This super-track combines related tracks of DNA replication data from the University of Virginia. DNA replication is carefully coordinated, both across the genome and with respect to development. Earlier replication in S-phase is broadly correlated with gene density and transcriptional activity. These tracks contain temporal profiling of DNA replication and origin of DNA replication in multiple cell lines, such as HeLa cells (cervix carcinoma). Replication timing was measured by analyzing Brd-U-labeled fractions from synchronized cells on tiling arrays. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Giacca M, Pelizon C, Falaschi A. Mapping replication origins by quantifying relative abundance of nascent DNA strands using competitive polymerase chain reaction. Methods. 1997 Nov;13(3):301-12. Mesner LD, Crawford EL, Hamlin JL. Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26. Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C, Hwang DS, Gingeras TR, Dutta A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24. 
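The temporal profiling Methods describe calling high-confidence regions with a Wilcoxon Rank Sum test in a 10 kb sliding window at a P-value of 1E-04 (the origin-mapping subtracks describe the same test with 1000 bp and 10,000 bp windows). The sketch below illustrates that idea in plain Python. It is not the Affymetrix analysis software: the function names, the one-sided normal approximation without tie correction, and the probe-centred window are assumptions made for illustration.

```python
import math


def rank_sum_p_value(treatment, control):
    """One-sided Wilcoxon rank-sum p-value that treatment exceeds control.

    Uses average ranks for ties and a normal approximation with no tie
    correction (an assumption; exact tools may differ).
    """
    n1, n2 = len(treatment), len(control)
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each gets the average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def enriched_windows(positions, treatment, control,
                     window=10000, p_cutoff=1e-4):
    """Test a window centred on each probe; return (position, p) for hits.

    `positions`, `treatment` and `control` are parallel lists, one entry
    per probe. This quadratic scan is kept simple for readability.
    """
    hits = []
    half = window / 2.0
    for centre in positions:
        idx = [i for i, p in enumerate(positions) if abs(p - centre) <= half]
        p_val = rank_sum_p_value([treatment[i] for i in idx],
                                 [control[i] for i in idx])
        if p_val <= p_cutoff:
            hits.append((centre, p_val))
    return hits
```

A rank-based test is a natural choice here because probe intensities on tiling arrays are noisy and not normally distributed; only the ordering of treatment versus control signal within the window matters.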
encodeUvaDnaRep8 UVa DNA Rep 8h University of Virginia Temporal Profiling of DNA Replication (8-10 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep6 UVa DNA Rep 6h University of Virginia Temporal Profiling of DNA Replication (6-8 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep4 UVa DNA Rep 4h University of Virginia Temporal Profiling of DNA Replication (4-6 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep2 UVa DNA Rep 2h University of Virginia Temporal Profiling of DNA Replication (2-4 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep0 UVa DNA Rep 0h University of Virginia Temporal Profiling of DNA Replication (0-2 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRepSeg UVa DNA Rep Seg University of Virginia DNA Replication Temporal Segmentation Pilot ENCODE Chromatin Structure Description The four subtracks in this annotation correspond to replication timing categories for DNA synthesis. Replication is segregated into early specific (Early), mid specific (Mid), late specific (Late), and non-specific (PanS). The first three categories correspond to regions that replicated in a time point-specific manner; the last category encompasses regions that replicated in a temporally non-specific manner. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of S-phase and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from unlabeled DNA by double cesium chloride density gradient centrifugation. 
The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as temporally specific or non-specific based on whether or not at least 50% of the accumulated signal appeared in a single time point. The TR50 data was then analyzed within a 20 kb sliding window to classify regions as specific versus non-specific based on the ratio of specific to non-specific probes within the window. Specific regions were further classified as early, mid, or late replicating based on the average TR50 of specific probes within the window. The resulting regions form a non-overlapping segregation of the replication data into the four given categories of replication timing. Verification The replication experiments were completed for two biological sets in the HeLa-adherent cell line. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). 
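The per-probe TR50 calculation in the Temporal Segmentation Methods (accumulate signal over the five time points, linearly interpolate the time at which 50% is reached, and call a probe temporally specific when at least 50% of its signal falls in a single time point) can be sketched as below. The function names are hypothetical, and two details are assumptions: each time point is treated as a two-hour interval starting at 0 h, and signals are taken to be non-negative.

```python
def tr50(signals, interval_hours=2.0):
    """Time (hours into S-phase) at which half the total signal has accumulated.

    `signals` holds one non-negative value per labeling interval, in order
    (e.g. 0-2 h, 2-4 h, ..., 8-10 h). Returns None if there is no signal.
    """
    total = sum(signals)
    if total <= 0:
        return None
    half = total / 2.0
    cumulative = 0.0
    for i, s in enumerate(signals):
        if s > 0 and cumulative + s >= half:
            # Interpolate linearly within the interval that crosses 50%.
            start = i * interval_hours
            return start + interval_hours * (half - cumulative) / s
        cumulative += s
    return None


def is_temporally_specific(signals):
    """True when a single time point carries at least 50% of the signal."""
    total = sum(signals)
    return total > 0 and max(signals) >= 0.5 * total
```

For example, a probe with equal signal in all five intervals gets a TR50 of 5 hours and is non-specific, whereas a probe with all of its signal in the last interval gets a TR50 of 9 hours and is specific.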
encodeUvaDnaRepPanS UVa DNA Rep PanS University of Virginia Temporal Profiling of DNA Replication (PanS) Pilot ENCODE Chromatin Structure encodeUvaDnaRepLate UVa DNA Rep Late University of Virginia Temporal Profiling of DNA Replication (Late) Pilot ENCODE Chromatin Structure encodeUvaDnaRepMid UVa DNA Rep Mid University of Virginia Temporal Profiling of DNA Replication (Mid) Pilot ENCODE Chromatin Structure encodeUvaDnaRepEarly UVa DNA Rep Early University of Virginia Temporal Profiling of DNA Replication (Early) Pilot ENCODE Chromatin Structure encodeUvaDnaRepOrigins UVa DNA Rep Ori University of Virginia DNA Replication Origins Pilot ENCODE Chromatin Structure Description The subtracks within this annotation show replication origins identified using the nascent strand method (Ori-NS), the bubble trapping method (Ori-Bubble) and the TR50 local minima method (Ori-TR50). Tracks are available for HeLa cells (cervix carcinoma) for all methods and GM06990 cells (lymphoblastoid) for Ori-NS. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To show only selected subtracks within this annotation, uncheck the boxes next to the tracks you wish to hide. Nascent Strand Method (Ori-NS) Description ENCODE region-wide mapping of replication origins was performed. Origin-centered nascent-strands purified from HeLa and GM06990 cell lines were hybridized to Affymetrix ENCODE tiling arrays. Methods Cells in their exponential stage of growth were labeled, in culture, with bromodeoxyuridine (BrdU) for 30 mins. DNA was then isolated from the cells. Nascent strands of 0.5-2.5 kb synthesized with incorporation of BrdU, representing the replication origins, were purified using a sucrose gradient followed by immunoprecipitation with BrdU antibody (Giacca et al., 1997). 
The purified nascent strands were amplified and then hybridized to Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp, on average, in the non-repetitive sequence of the ENCODE regions. As an experimental control, genomic DNA was hybridized to arrays independently. Replication origins were identified by estimating the significance of the enrichment of nascent strands DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 1000 bp. An estimate of significance in the window was calculated by computing the p-value using the Wilcoxon Rank-Sum test over all three biological replicates and control signal estimates in that window. The origins (Ori-NS) represented in the subtrack are the genomic regions that showed a signal enrichment pValue Verification The origin mapping experiments were completed for three biological sets. Credits Data generation and analysis for the subtracks using the Ori-NS method were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Ankit Malhotra, Gabe Robins and Anindya Dutta. Christopher Taylor and Neerja Karnani prepared the data for presentation in the UCSC Genome Browser. References Giacca M, Pelizon C, Falaschi A. Mapping replication origins by quantifying relative abundance of nascent DNA strands using competitive polymerase chain reaction. Methods. 1997;13(3):301-12. Bubble Trapping Method (Ori-Bubble) Description ENCODE region-wide mapping of replication origins in HeLa cells was performed by the bubble trapping method. Replication origins were identified by hybridization to Affymetrix ENCODE tiling arrays. Methods The bubble trapping method works on the principle that circular plasmids can be trapped in gelling agarose followed by the application of electrical current for a prolonged period of time (see Mesner et al. 2006 for more details). 
Entrapment occurs by an apparent physical linkage of the circular DNA with the agarose matrix. The circular bubble component of the DNA replication intermediates was therefore enriched by agarose trapping. After recovery from the agarose gel, a library of the entrapped DNA was formed by DNA cloning. Subsequently, DNA from the library was labeled and hybridized to Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. As an experimental control, genomic DNA was hybridized to arrays independently. Replication origins were identified by estimating the significance of the enrichment of the bubble-trapped DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 10,000 bp. An estimate of significance in the window was calculated by computing the p-value using the Wilcoxon Rank-Sum test over all three biological replicates and the control signal estimates in that window. The origins (Ori-Bubble) hence represented in the UCSC browser track are the genomic regions that showed a signal enrichment pValue Verification The origin mapping experiments were completed for two biological sets. Credits Data generation and analysis for the subtrack using the Ori-bubble method were performed by the DNA replication group in the Dutta Lab and Hamlin Lab at the University of Virginia: Neerja Karnani, Larry Mesner, Christopher Taylor, Ankit Malhotra, Gabe Robins, Anindya Dutta and Joyce Hamlin. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Mesner LD, Crawford EL, Hamlin JL. Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26. TR50 local minima method (Ori-TR50) Description ENCODE region-wide mapping of replication origins in HeLa cells was performed by the TR50 local minima method. Replication origins were identified by hybridization to Affymetrix ENCODE tiling arrays. 
Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase. Subsequently, DNA was isolated from the cells. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as showing temporally specific replication (all alleles replicating together within a two-hour window) or temporally non-specific replication (at least one allele replicating apart from the others by at least a two hour difference). The TR50 data for the temporally specific probes was then smoothed within a 60 kb window using lowess smoothing. Local minima (within a 30 kb window) on the smoothed TR50 curve were identified which had at least 30 probes in the window on both sides of the minimum to locate possible origins of replication. A confidence value was calculated for each site as the average difference from the value of the local minimum of all TR50 values falling into the 30 kb window. Verification The replication experiments were completed for two biological sets and a technical replicate in the HeLa adherent cell line. 
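The local-minimum search and confidence value described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function name `find_tr50_minima` is hypothetical, the 30 kb window is assumed to be centered on the candidate probe (15 kb on each side), the confidence is computed from the smoothed values passed in, and tied minima are skipped.

```python
from bisect import bisect_left, bisect_right

def find_tr50_minima(positions, smoothed, half_window=15_000, min_probes=30):
    """Return (position, confidence) for each local minimum of a smoothed TR50 curve.

    positions: probe coordinates, sorted ascending.
    smoothed:  smoothed TR50 value for each probe.
    """
    origins = []
    for i, (pos, val) in enumerate(zip(positions, smoothed)):
        lo = bisect_left(positions, pos - half_window)
        hi = bisect_right(positions, pos + half_window)
        # Require enough probes on both sides of the candidate minimum.
        if i - lo < min_probes or hi - i - 1 < min_probes:
            continue
        window = smoothed[lo:hi]
        # Keep only a unique minimum within the window (ties skipped by assumption).
        if val > min(window) or window.count(val) > 1:
            continue
        # Confidence: mean difference between window values and the minimum
        # (the minimum itself is included and contributes zero).
        confidence = sum(v - val for v in window) / len(window)
        origins.append((pos, confidence))
    return origins
```

A deeper, sharper valley in the TR50 curve yields a larger confidence value, because the surrounding probes replicate later relative to the minimum.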
Credits Data generation and analysis for the subtrack using the Ori-TR50 method were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C, Hwang DS, Gingeras TR, Dutta A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24. encodeUvaDnaRepOriginsTR50Hela UVa Ori-TR50 HeLa University of Virginia DNA Replication Origins, Ori-TR50, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsBubbleHela UVa Ori-Bubble HeLa University of Virginia DNA Replication Origins, Ori-Bubble, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsNSHela UVa Ori-NS HeLa University of Virginia DNA Replication Origins, Ori-NS, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsNSGM UVa Ori-NS GM University of Virginia DNA Replication Origins, Ori-NS, GM06990 Pilot ENCODE Chromatin Structure encodeUvaDnaRepTr50 UVa DNA Rep TR50 University of Virginia DNA Smoothed Timing at 50% Replication Pilot ENCODE Chromatin Structure Description This annotation shows smoothed replication timing for DNA synthesis as the time of 50% replication (TR50). Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. 
Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as temporally specific or non-specific based on whether at least 50% of the accumulated signal appeared in a single time point or not. The TR50 data for all specific probes were then lowess-smoothed within a 60 kb window to provide the profile displayed in the annotation. Verification The replication experiments were completed for two biological sets in the HeLa adherent cell line. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). 
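The probe classification used in the Methods above (a probe is temporally specific when at least 50% of its accumulated signal appears in a single time point) can be sketched in a few lines. The function name `is_temporally_specific` and the treatment of probes with no signal as non-specific are assumptions of this sketch, not details given by the source.

```python
def is_temporally_specific(signals, threshold=0.5):
    """True if one time point holds at least `threshold` of the accumulated signal."""
    total = sum(signals)
    if total <= 0:
        return False  # assumption: probes without signal are treated as non-specific
    return max(signals) / total >= threshold
```

Only probes for which this returns True would contribute to the smoothed TR50 profile; a probe with signal spread evenly over the five time points (20% each) would be excluded.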
encodeYaleChIPSTAT1Pval Yale STAT1 pVal Yale ChIP-chip (STAT1 ab, HeLa cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows probable sites of STAT1 binding in HeLa cells as determined by chromatin immunoprecipitation followed by microarray analysis. STAT1 (Signal Transducer and Activator of Transcription) is a transcription factor that moves to the nucleus and binds DNA only in response to a cytokine signal such as interferon-gamma. HeLa cells are a common cell line derived from a cervical cancer. Each of the four subtracks represents a different microarray platform. The track as a whole can be used to compare results across microarray platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. The subtracks show the ratio of immunoprecipitated DNA from cytokine-stimulated cells vs. unstimulated cells in each of the four platforms. The ratio is calculated as -log10(p-value) in a 501-base window. The data shown is the combined result of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. These data are available at NCBI GEO as GSE2714, which also provides additional information about the experimental protocols. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. 
The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other. After normalization, replicates were condensed to a single value. Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window (including replicates). Using the same procedure, a -log10(p-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(p-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
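The pseudomedian (median of pairwise averages) used for the signal map can be sketched as follows. This is a minimal illustration rather than the Yale pipeline: the function names are hypothetical, and pairing each value with itself when forming the pairwise averages (the usual Hodges-Lehmann convention) is an assumption, since the source does not say which pairs are included.

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Median of all pairwise averages (each value is also paired with itself)."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)

def window_signal(positions, log_ratios, center, width=501):
    """Pseudomedian of the log2(Cy5/Cy3) ratios of probes within a window around `center`."""
    half = width // 2
    in_window = [r for p, r in zip(positions, log_ratios) if abs(p - center) <= half]
    return pseudomedian(in_window) if in_window else None
```

Compared with a plain median, the pseudomedian uses more of the information in the window while remaining robust to a single outlying probe.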
For the first maskless array (50-mer every 38 bp): log2(Cy5/Cy3) >= 1.25, -log10(p-value) >=8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp): log2(Cy5/Cy3) >= 0.25, -log10(p-value) >=4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp): log2(Cy5/Cy3) >= 0.25, -log10(p-value) >=4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits This data was generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. et al. 
CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J. Microarray data normalization and transformation. Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChipSuper Yale ChIP Yale ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Yale ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for multiple transcription factors (such as STAT1) and histones in multiple cell lines, such as HeLa S3 (cervix epithelial adenocarcinoma). Data are displayed as signals, p-values and site predictions, as well as Regulatory Factor Binding Regions (RFBR) predictions. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. The RFBR data set was made available by the Transcriptional Regulation Group of the ENCODE Project Consortium. The RFBR cluster and desert tracks were generated by Zhengdong Zhang from Mark Gerstein's group at Yale University. 
References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Luscombe NM, Royce TE, Bertone P, Echols N, Horak CE, Chang JT, Snyder M, Gerstein M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 2003 Jul 1;31(13):3477-82. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation. Nat Genet. 2002 Dec;32 Suppl:496-501. Efron B. Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. J Am Stat Assoc. 2004;99(465):96-104. Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein M. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 2007 Jun;17(6):787-97. 
encodeYaleChIPSTAT1HeLaBingRenPval Yale LI PVal Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpPval Yale 50-50 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpPval Yale 50-38 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpPval Yale 36-36 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1Sig Yale STAT1 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Signal Pilot ENCODE Chromatin Immunoprecipitation Description Each of these four tracks shows the map of signal intensity (estimating the fold enrichment [log2 scale] of ChIP DNA vs unstimulated DNA) for STAT1 ChIP-chip using Human Hela S3 cells hybridized to four different array designs/platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. Each track shows the combined results of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. 
These data are available at NCBI GEO as GSE2714, which also provides additional information about the experimental protocols. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(P-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
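The site-calling rule just described (threshold on both fold enrichment and -log10(P-value), then apply a maximum gap and a minimum run) can be sketched as follows. The interpretation is an assumption of this sketch: passing probes are merged while the distance between consecutive passing probes is at most MaxGap, and a merged run is kept when the distance from its first to its last probe is at least MinRun. The function name `call_sites` is hypothetical; the default values mirror the thresholds the source lists for the first maskless array (50-mer every 38 bp).

```python
def call_sites(probes, min_log2=1.25, min_neglog10p=8.0, max_gap=100, min_run=180):
    """Merge passing probes into binding sites.

    probes: iterable of (position, log2_ratio, neg_log10_p), sorted by position.
    Returns a list of (start, end) probe positions for each accepted run.
    """
    sites = []
    start = end = None
    for pos, ratio, score in probes:
        if ratio < min_log2 or score < min_neglog10p:
            continue  # probe fails one of the two thresholds
        if start is not None and pos - end <= max_gap:
            end = pos  # extend the current run
        else:
            if start is not None and end - start >= min_run:
                sites.append((start, end))
            start = end = pos  # open a new run
    if start is not None and end - start >= min_run:
        sites.append((start, end))
    return sites
```

With `min_run=0`, as listed for the second and third maskless arrays, even an isolated passing probe is reported as a site.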
For the first maskless array (50-mer every 38 bp): log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. et al. 
CREB binds to multiple loci on human chromosome 22, Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J.. Microarray data normalization and transformation, Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChIPSTAT1HeLaBingRenSig Yale LI Sig Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSig Yale 50-50 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSig Yale 50-38 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSig Yale 36-36 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1Sites Yale STAT1 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Binding Sites Pilot ENCODE Chromatin Immunoprecipitation Description Each of these four tracks shows the binding sites for STAT1 ChIP-chip using Human Hela S3 cells hybridized to four different array designs/platforms. 
The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. Each track shows the combined results of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. See NCBI GEO GSE2714 for details of the experimental protocols. Methods Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(P-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
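The -log10(P-value) score for one window can be illustrated with a paired signed-rank computation on the Cy5 and Cy3 intensities of the probes in that window. This is a rough sketch only: the source does not state whether the test was one- or two-sided or how the P-value was computed, so the one-sided alternative (Cy5 greater than Cy3), the normal approximation without tie correction, and the function name `neg_log10_signed_rank_p` are all assumptions.

```python
import math

def neg_log10_signed_rank_p(cy5, cy3):
    """-log10 of a one-sided Wilcoxon signed-rank P-value (normal approximation)."""
    diffs = [a - b for a, b in zip(cy5, cy3) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return -math.log10(max(p, 1e-300))       # guard against log10(0)
```

Windows in which Cy5 consistently exceeds Cy3 receive high scores, while windows with no consistent enrichment score near zero. For small windows an exact test would be preferable to the normal approximation used here.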
For the first maskless array (50-mer every 38 bp): log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits This data was generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. et al. 
CREB binds to multiple loci on human chromosome 22, Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J.. Microarray data normalization and transformation, Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChIPSTAT1HeLaBingRenSites Yale LI Sites Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSite Yale 50-50 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSite Yale 50-38 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSite Yale 36-36 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPval Yale ChIP pVal Yale ChIP-chip P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of -log10(P-value) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HelaS3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions. 
Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is comprised of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1. Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155). 
INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form, recognizing the C-terminal heptapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. 
H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. 
Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions with -log10(P-value) (>= 4), extending qualified positions upstream and downstream 250 bp, and requiring 1000 bp space between two sites. Top 400 sites are retained. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. 
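The Methods above can be illustrated with a minimal sketch, which is not the Yale group's actual pipeline. It assumes the pseudomedian is the Hodges-Lehmann estimator (median of pairwise averages, including each value paired with itself), that the "1000 bp space" rule means extended positions closer than 1000 bp are merged into one site, and that the top sites are ranked by their highest -log10(P-value). The function names pseudomedian, signal_map and call_sites, and the input layouts, are hypothetical. The Wilcoxon test that produces the P-values is not shown.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages (Walsh averages, pairs with i <= j)."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def signal_map(probes, window=1000):
    """Pseudomedian of log2(Cy5/Cy3) ratios in a window centered on each probe.

    probes: list of (center_bp, [log2 ratios, one per replicate]).
    Returns a list of (center_bp, windowed pseudomedian).
    """
    half = window // 2
    result = []
    for center, _ in probes:
        # Pool ratios from every probe (and replicate) inside the window.
        pooled = [r for pos, ratios in probes
                  if abs(pos - center) <= half
                  for r in ratios]
        result.append((center, pseudomedian(pooled)))
    return result


def call_sites(scores, threshold=4.0, extend=250, min_gap=1000, top_n=400):
    """Call binding sites from (position_bp, -log10 P) pairs.

    Positions scoring >= threshold are extended by `extend` bp on each side.
    Assumption: extended intervals separated by less than `min_gap` bp are
    merged, and the merged site keeps the highest score.
    Returns up to `top_n` (start, end, score) tuples, best score first.
    """
    sites = []
    for pos, score in sorted(scores):
        if score < threshold:
            continue
        start, end = max(0, pos - extend), pos + extend
        if sites and start - sites[-1][1] < min_gap:
            prev_start, prev_end, prev_score = sites[-1]
            sites[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            sites.append((start, end, score))
    sites.sort(key=lambda s: s[2], reverse=True)
    return sites[:top_n]
```

In this sketch the window scan is quadratic in the number of probes, which is acceptable for illustration but would need an indexed or two-pointer scan on real tiling-array data.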
Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation Nat Genet. 2002 Dec;32(Suppl):496-501. encodeYaleChipPvalBaf47K562 YU BAF47 K562 P Yale ChIP-chip (BAF47 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf170K562 YU BAF170 K562 P Yale ChIP-chip (BAF170 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf155K562 YU BAF155 K562 P Yale ChIP-chip (BAF155 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH4kac4Gm06990 YU H4Kac4 GM P Yale ChIP-chip (H4Kac4 ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2nGm06990 YU Pol2N GM P Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2Gm06990 YU Pol2 GM P Yale ChIP-chip (Pol2 ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalNrsfHela YU NRSF HeLa P Yale ChIP-chip (NRSF, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalSmarca6Hela YU SMARCA6 HeLa P 
Yale ChIP-chip (SMARCA6, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalSmarca4Hela YU SMARCA4 HeLa P Yale ChIP-chip (SMARCA4, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalP65cHelaTnfa YU P65-C HeLa TNF P Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalP65nHelaTnfa YU P65-N HeLa TNF P Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalStat1HelaIfna YU STAT1 HeLa IF P Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH3k27me3Hela YU H3K27me3 HeLa P Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH4kac4Hela YU H4Kac4 HeLa P Yale ChIP-chip (H4Kac4 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2nHela YU Pol2N HeLa P Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2Hela YU Pol2 HeLa P Yale ChIP-chip (Pol2 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalTaf YU TAF1 HeLa P Yale ChIP-chip (TAF1 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalJun YU c-Jun HeLa P Yale ChIP-chip (c-Jun ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalFos YU c-Fos HeLa P Yale ChIP-chip (c-Fos ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf170 YU BAF170 HeLa P Yale ChIP-chip (BAF170 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf155 YU BAF155 HeLa P Yale ChIP-chip (BAF155 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSig Yale ChIP 
Signal Yale ChIP-chip Signal Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of signal intensity (estimating the fold enrichment [log2 scale] of chromatin immunoprecipitated DNA vs. input DNA) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HelaS3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions. Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is composed of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1. 
Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155). INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. 
RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form, recognizing the C-terminal heptapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. 
NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. 
Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions with -log10(P-value) (>= 4), extending qualified positions upstream and downstream 250 bp, and requiring 1000 bp space between two sites. Top 400 sites are retained. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation Nat Genet. 
2002 Dec;32(Suppl):496-501. encodeYaleChipSignalBaf47K562 YU BAF47 K562 S Yale ChIP-chip (BAF47 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf170K562 YU BAF170 K562 S Yale ChIP-chip (BAF170 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf155K562 YU BAF155 K562 S Yale ChIP-chip (BAF155 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH4kac4Gm06990 YU H4Kac4 GM S Yale ChIP-chip (H4Kac4 ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2nGm06990 YU Pol2N GM S Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2Gm06990 YU Pol2 GM S Yale ChIP-chip (Pol2 ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalNrsfHela YU NRSF HeLa S Yale ChIP-chip (NRSF, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalSmarca6Hela YU SMARCA6 HeLa S Yale ChIP-chip (SMARCA6, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalSmarca4Hela YU SMARCA4 HeLa S Yale ChIP-chip (SMARCA4, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalP65cHelaTnfa YU P65-C HeLa TNF S Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalP65nHelaTnfa YU P65-N HeLa TNF S Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalStat1HelaIfna YU STAT1 HeLa IF S Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH3k27me3Hela YU H3K27me3 HeLa S Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH4kac4Hela YU H4Kac4 HeLa S Yale 
ChIP-chip (H4Kac4 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2nHela YU Pol2N HeLa S Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2Hela YU Pol2 HeLa S Yale ChIP-chip (Pol2 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalTaf YU TAF1 HeLa S Yale ChIP-chip (TAF1 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalJun YU c-Jun HeLa S Yale ChIP-chip (c-Jun ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalFos YU c-Fos HeLa S Yale ChIP-chip (c-Fos ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf170 YU BAF170 HeLa S Yale ChIP-chip (BAF170 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf155 YU BAF155 HeLa S Yale ChIP-chip (BAF155 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSites Yale ChIP Sites Yale ChIP-chip Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of -log10(P-value) of binding sites (as determined in the Methods below) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HelaS3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions. Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. 
Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is composed of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1. Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155). INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. 
c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form, recognizing the C-terminal heptapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. 
STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. Data may be thresholded by score and/or the user can specify the display of only the top N scoring items (default is 200) for all the subtracks. The score for each item is indicated in grayscale, with darker shades corresponding to higher scores. 
The details page for an item (displayed after clicking on an item in the track) shows the top 20 highest scoring items displayed in the current window. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions with -log10(P-value) (>= 4), extending qualified positions upstream and downstream 250 bp, and requiring 1000 bp space between two sites. The top 400 sites are retained for the ENCODE Oct 2005 Freeze experiments; for the other datasets, sites found using 1%, 5% and 10% false discovery rates (FDR) are displayed. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Sites for data from Nov. 2006, Jan. 2007, Apr. 2007 and Jun. 2007 were determined with false discovery rates (FDR) of 1%, 5% and 10%. The lowest FDR which includes each "Site" is displayed on that site's details page. 
For the ENCODE Oct 2005 Freeze data (BAF155, BAF170, Fos, Jun and TAF1 in HeLa S3 cells), the top 400 sites are shown. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation Nat Genet. 2002 Dec;32(Suppl):496-501. 
encodeYaleChipSitesBaf47K562 YU BAF47 K562 Yale ChIP-chip (BAF47 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf170K562 YU BAF170 K562 Yale ChIP-chip (BAF170 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf155K562 YU BAF155 K562 Yale ChIP-chip (BAF155 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH4kac4Gm06990 YU H4Kac4 GM Yale ChIP-chip (H4Kac4 ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2nGm06990 YU Pol2N GM Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2Gm06990 YU Pol2 GM Yale ChIP-chip (Pol2 ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesNrsfHela YU NRSF HeLa Yale ChIP-chip (NRSF, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesSmarca6Hela YU SMARCA6 HeLa Yale ChIP-chip (SMARCA6, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesSmarca4Hela YU SMARCA4 HeLa Yale ChIP-chip (SMARCA4, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesP65cHelaTnfa YU P65-C HeLa TNF Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesP65nHelaTnfa YU P65-N HeLa TNF Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesStat1HelaIfna YU STAT1 HeLa IF Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH3k27me3Hela YU H3K27me3 HeLa Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH4kac4Hela YU H4Kac4 HeLa Yale ChIP-chip (H4Kac4 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation 
encodeYaleChipSitesPol2nHela YU Pol2N HeLa Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2Hela YU Pol2 HeLa Yale ChIP-chip (Pol2 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesTaf YU TAF1 HeLa Yale ChIP-chip (TAF1 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesJun YU c-Jun HeLa Yale ChIP-chip (c-Jun ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesFos YU c-Fos HeLa Yale ChIP-chip (c-Fos ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf170 YU BAF170 HeLa Yale ChIP-chip (BAF170 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf155 YU BAF155 HeLa Yale ChIP-chip (BAF155 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipRfbr Yale ChIP RFBR Yale ChIP-chip Regulatory Factor Binding Regions Analysis Pilot ENCODE Chromatin Immunoprecipitation Description Regulatory Factor Binding Regions (RFBRs) were identified from ChIP-Chip experimental data; they are non-randomly distributed in the ENCODE regions with local enrichment and depletion. By mapping the full set of RFBRs onto the human genome sequence, we identified 689 genomic subregions with RFBR enrichment and 726 subregions with RFBR depletion (the RFBR clusters and deserts, respectively) in the ENCODE regions. Methods The data set analyzed in this study consists of 105 lists of transcriptional regulatory elements (TREs) in the ENCODE regions. It was released on December 13, 2005 by the Transcriptional Regulation Group. TRE lists made available after this data freeze were not included in this study. 
A total of 29 transcription factors (BAF155, BAF170, Brg1, CEBPe, CTCF, E2F1, E2F4, H3ac, H4ac, H3K27me3, H3K27me3, H3K4me1, H3K4me2, H3K4me3, H3K9K14me2, HisH4, c-Jun, c-Myc, P300, P63, Pol2, PU1, RARecA, SIRT1, Sp1, Sp3, STAT1, Suz12, and TAF1) were assayed by seven laboratories (Affymetrix, Sanger, Stanford, UCD, UCSD, UT, Yale) using ChIP-chip experiments on three different microarray platforms (Affymetrix tiling array, NimbleGen tiling array, and traditional PCR array) in nine cell lines (HL-60, HeLa, GM06990, K562, IMR90, HCT116, THP1, Jurkat, and fibroblasts) or at two different experimental time points (P0, before addition of gamma-interferon, and P30, 30 minutes after the addition of gamma-interferon). The raw data from these 105 ChIP-chip experiments was uniformly processed using a method based on the false discovery rate (Efron, 2004). Three sets of TRE lists were generated at 1%, 5%, and 10% false discovery rates respectively, and the list generated at the lowest (1%) false discovery rate was used in this study. The non-redundant factor-specific RFBR lists were mapped onto the ENCODE regions. Uninterrupted genomic regions that are covered by one or more RFBRs were identified as RFBR groups. Neighboring groups that are less than 1 kb apart were collected into RFBR clusters. Un-clustered groups that are covered by more than three RFBRs were promoted into clusters. Further details of the method may be found in Zhang et al. (2007). Credits The data set was made available by the Transcriptional Regulation Group of the ENCODE Project Consortium. The RFBR cluster and desert tracks were generated by Zhengdong Zhang from Mark Gerstein's group at Yale University. References Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. Journal of the American Statistical Association. 2004;99(465):96-104. Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein M. 
Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 2007 Jun;17(6):787-97. encodeYaleChipRfbrDeserts Yale RFBR Deserts Yale ChIP-chip Regulatory Factor Binding Regions (RFBR) Deserts Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipRfbrClusters Yale RFBR Clusters Yale ChIP-chip Regulatory Factor Binding Regions (RFBR) Clusters Pilot ENCODE Chromatin Immunoprecipitation encodeUWRegulomeBase UW DNase-QCP UW DNaseI Sensitivity by QCP Pilot ENCODE Chromatin Structure Description This track shows DNaseI sensitivity measured across ENCODE regions using the Quantitative Chromatin Profiling (QCP) method (Dorschner et al., 2004). DNaseI has long been used to map general chromatin accessibility and the DNaseI "hyperaccessibility" or "hypersensitivity" that is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. QCP provides a quantitative high-throughput method for mapping DNaseI sensitivity as a continuous function of genome position. The moving baseline of mean DNaseI sensitivity is computed using a locally-weighted least squares (LOWESS)-based algorithm.
DNaseI-treated and untreated chromatin samples from the following cell lines/phenotypes were studied:
Cell Line   Description                                      Source
CD4         CD4+ lymphoid                                    Primary
CaCo2       intestinal cancer                                ATCC
CaLU3       lung cancer                                      ATCC
EryAdult    CD34-derived primary adult erythroblasts         Primary
EryFetal    CD34-derived primary fetal erythroblasts         Primary
GM06990     EBV-transformed lymphoblastoid                   Coriell
HMEC        mammary epithelium                               Cambrex
HRE         renal epithelial                                 Cambrex
HeLa        cervical cancer                                  ATCC
HepG2       hepatic                                          ATCC
Huh7        hepatic                                          JCRB
K562        erythroid                                        ATCC
NHBE        bronchial epithelial                             Cambrex
PANC        pancreatic                                       ATCC
SAEC        small airway epithelial                          Cambrex
SKnSH       neural                                           ATCC
Key for Source entry in table: ATCC: American Type Culture Collection; Cambrex: Cambrex Corporation; JCRB: Japanese Collection of Research Bioresources.
Display Conventions and Configuration DNaseI sensitivity is expressed in standard units, where each increment of 1 unit corresponds to an increase of 1 standard deviation from the baseline. The displayed values are calculated as copies in DNaseI-untreated / copies in DNaseI-treated. Thus, increasing values represent increasing sensitivity. Major DNaseI hypersensitive sites are readily identified as peaks in the signal that exceed 2 standard deviations (corresponding to the ~95% confidence bound on outliers). This is reflected in the default viewing parameters, which apply a lower y-axis threshold of 2 (i.e., showing only sites that exceed the 95% confidence bound). The subtracks within this composite annotation track correspond to data from different tissues, and may be configured in a variety of ways to highlight different aspects of the displayed data. Four tissue types are present throughout all ENCODE regions: GM06990, CaCo2, HeLa, and SKnSH. Additional relevant tissues were also studied for several ENCODE regions that contain tissue-specific genes. These include the alpha- and beta-globin loci (ENm008 and ENm009); the apolipoprotein A1/C3 loci (ENm003); and the Th2 cytokine locus (ENm002).
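The standard-unit convention described above can be illustrated with a minimal sketch. It assumes the per-amplicon untreated/treated copy ratios, the moving baseline, and a single standard deviation are already available as plain numbers; the function names and inputs are hypothetical and are not taken from the QCP software.

```python
def to_standard_units(ratios, baseline, sd):
    """Express each untreated/treated copy ratio as standard deviations above the baseline.

    ratios   : copies in DNaseI-untreated / copies in DNaseI-treated, one per amplicon
    baseline : moving baseline value at each amplicon (assumed precomputed, e.g. by LOWESS)
    sd       : standard deviation used for scaling (assumed precomputed)
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return [(r - b) / sd for r, b in zip(ratios, baseline)]


def hypersensitive_indices(units, threshold=2.0):
    """Return indices of amplicons whose signal exceeds the threshold (default 2 standard units)."""
    return [i for i, u in enumerate(units) if u > threshold]


if __name__ == "__main__":
    units = to_standard_units([1.0, 1.2, 3.0], [1.0, 1.0, 1.0], 0.5)
    print(units)
    print(hypersensitive_indices(units))
```

With the default threshold of 2, only amplicons lying more than two standard deviations above the baseline are reported, which mirrors the default lower y-axis threshold of the track.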
Color differences among the subtracks are arbitrary; they provide a visual cue for distinguishing the different cell lines/phenotypes. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods QCP was performed as described in Dorschner et al. Data were obtained from a tiling path across ENCODE that comprises 102,008 distinct amplicons (mean length = 243 +/- 13 bp). The amplicon tiling path is available through UniSTS. The tiling path covers approximately 86% of ENCODE regions, including many repetitive regions. The Dorschner et al. article describes the methods used for chromatin preparation, DNaseI digestion, and DNA purification. DNaseI-treated and -untreated control samples were prepared from each tissue. For each tissue, 6-10 biological replicates (defined as replicate cultures grown from seed and harvested on different days) were pooled together to create a master sample. The relative number of intact copies of the genomic DNA sequence was quantified over the entire tiling path by real-time PCR for both DNaseI-treated and -untreated samples. Four to eight technical replicates were performed for each measurement from each amplicon in each tissue. Data shown are the means of these technical replicates. The results were analyzed as described in Dorschner et al. to compute the moving baseline of mean DNaseI sensitivity and to identify outliers that correspond with DNaseI hypersensitive sites. The standard deviation of trimmed mean measurements was used to convert data to standard units. Verification Biological replicate samples were pooled as described above.
Results were extensively validated by conventional DNaseI hypersensitivity assays using the end-labeling/Southern blotting method (Navas et al., in preparation). Credits Data generation, analysis, and validation were performed by the following members of the ENCODE group at the University of Washington (UW) in Seattle. UW Medical Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka Johnson, Tristan Frum, and George Stamatoyannopoulos. UW Genome Sciences: Michael O. Dorschner, Richard Humbert, Peter J. Sabo, Scott Kuehn, Robert Thurman, Anthony Shafer, Jeff Goldy, Molly Weaver, Andrew Haydock, Kristin Lee, Fidencio Neri, Richard Sandstrom, Shane Neff, Brendan Henry, Michael Hawrylycz, Janelle Kawamoto, Paul Tittel, Jim Wallace, William S. Noble, and John A. Stamatoyannopoulos. References Dorschner MO, Hawrylycz M, Humbert R, Wallace JC, Shafer A, Kawamoto J, Mack J, Hall R, Goldy J, Sabo PJ et al. High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods. 2004 Dec;1(3):219-25. encodeUwDnaseSuper UW DNase UW DNaseI Hypersensitivity Pilot ENCODE Chromatin Structure Overview This super-track combines related tracks of DNaseI sensitivity data from the University of Washington (UW). DNaseI has long been used to map general chromatin accessibility and the DNaseI "hyperaccessibility" or "hypersensitivity" that is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome, and is a universal feature of active cis-regulatory sequences in vivo. These tracks contain DNaseI analysis of multiple cell lines using the QCP method or DNaseI-chip.
Credits Data generation, analysis, and validation were performed by the following members of the ENCODE group at UW in Seattle. UW Medical Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka Johnson, Tristan Frum, and George Stamatoyannopoulos. UW Genome Sciences: Michael O. Dorschner, Richard Humbert, Peter J. Sabo, Scott Kuehn, Robert Thurman, Anthony Shafer, Jeff Goldy, Molly Weaver, Andrew Haydock, Kristin Lee, Fidencio Neri, Richard Sandstrom, Shane Neff, Brendan Henry, Michael Hawrylycz, Janelle Kawamoto, Paul Tittel, Jim Wallace, William S. Noble, and John A. Stamatoyannopoulos. References Dorschner MO, Hawrylycz M, Humbert R, Wallace JC, Shafer A, Kawamoto J, Mack J, Hall R, Goldy J, Sabo PJ et al. High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods. 2004 Dec;1(3):219-25. Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods. 2006 Jul;3(7):511-8. 
encodeUWRegulomeBaseSKnSH SKnSH SKnSH DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseSAEC SAEC SAEC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBasePANC PANC PANC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseNHBE NHBE NHBE DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseK562 K562 K562 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHuh7 Huh7 Huh7 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHepG2 HepG2 HepG2 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHeLa HeLa HeLa DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHRE HRE HRE DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHMEC HMEC HMEC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseGM GM GM DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseEryFetal EryFetal EryFetal DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseEryAdult EryAdult EryAdult DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCaLU3 CaLU3 CaLU3 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCaCo2 CaCo2 CaCo2 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCD4 CD4 CD4 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeRegulomeDnaseArray UW DNase-array UW DNaseI Hypersensitivity by DNase-array Pilot ENCODE Chromatin Structure Description This track displays DNaseI sensitivity/hypersensitivity mapped over ENCODE regions in lymphoblastoid cells (ENCODE common cell line GM06990) using the DNase-array methodology described in Sabo et al. (2006). DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome, and is a universal feature of active cis-regulatory sequences in vivo. 
Peaks in DNaseI sensitivity signal measured using DNase-array represent DNaseI hypersensitive sites. Methods DNase-array comprises the following steps: (1) treatment of nuclear chromatin with DNaseI; (2) isolation of short (avg. length ~450 bp) DNA segments released by two DNaseI "hits" occurring in close proximity on the same nuclear chromatin template; (3) differential labeling of fragments and a control (DNaseI-treated naked DNA); (4) hybridization to a tiling DNA microarray (Nimblegen ENCODE array), without amplification. Signal peaks correspond to DNaseI hypersensitive sites. Validation The data have been extensively validated by conventional DNaseI hypersensitivity assays (indirect end-label + Southern blotting method). The data have an overall sensitivity of 91.7%, and specificity of >99.5% for DNaseI hypersensitive sites. Note that the tiling array covers only non-repetitive regions. Credits These data were generated by the UW ENCODE group. References Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A, Weaver M, Shafer A, Lee K, Neri F, Humbert R, Singer MA, Richmond TA, Dorschner MO, McArthur M, Hawrylycz M, Green RD, Navas PA, Noble WS, Stamatoyannopoulos JA. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nature Methods 3:511-18 (2006). encodeRegulomeDnaseGM06990Sites DnaseI HSs UW DNase-array GM06990 HSs Pilot ENCODE Chromatin Structure encodeRegulomeDnaseGM06990Sens DnaseI Sens UW DNase-array GM06990 Sensitivity Pilot ENCODE Chromatin Structure encodeMsaTbaDec07 36-Way TBA TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation Description This track displays human-centric multiple sequence alignments and conserved elements in the ENCODE regions for the 36 vertebrates included in the December 2007 ENCODE MSA freeze. The alignments in this track were generated using the Threaded Blockset Aligner (TBA).
The conservation subtracks display conserved elements generated by two methods: BinCons, a binomial-based method that calculates a conservation score in sliding windows with normalization for phylogenetic bias, and Chai Cons, a DNA structure-informed constraint detection algorithm that uses hydroxyl radical cleavage patterns as a measure of DNA structure. The multiple alignments are based on comparative sequence data generated for the ENCODE project from the NIH Intramural Sequencing Center (NISC) as well as whole-genome assemblies residing at UCSC, as listed:
Organism          Species                             Version
Human             Homo sapiens                        UCSC hg18
Armadillo         Dasypus novemcinctus                NISC
Baboon            Papio anubis                        NISC
Bat (rfbat)       Rhinolophus ferrumequinum           NISC
Bat (sbbat)       Myotis lucifugus                    NISC
Cat               Felis catus                         NISC
Chicken           Gallus gallus                       UCSC galGal3
Chimpanzee        Pan troglodytes                     UCSC panTro2
Colobus Monkey    Colobus guereza                     NISC
Cow               Bos taurus                          UCSC bosTau3
Dog               Canis familiaris                    UCSC canFam2
Dusky titi        Callicebus moloch                   NISC
Elephant          Loxodonta africana                  NISC
Flying Fox        Pteropus vampyrus                   NISC
Galago            Otolemur garnettii                  NISC
Gibbon            Nomascus leucogenys leucogenys      NISC
Guinea pig        Cavia porcellus                     NISC
Hedgehog          Atelerix albiventris                NISC
Horse             Equus caballus                      NISC
Macaque           Macaca mulatta                      UCSC rheMac2
Marmoset          Callithrix jacchus                  NISC
Mouse             Mus musculus                        UCSC mm9
Mouse Lemur       Microcebus murinus                  NISC
Opossum           Monodelphis domestica               UCSC monDom4
Orangutan         Pongo abelii                        UCSC ponAbe2
Owl Monkey        Aotus nancymaae                     NISC
Platypus          Ornithorhynchus anatinus            NISC
Rabbit            Oryctolagus cuniculus               NISC
Rat               Rattus norvegicus                   UCSC rn4
Rock hyrax        Procavia capensis                   NISC
Shrew             Sorex araneus                       NISC
Squirrel monkey   Saimiri boliviensis boliviensis     NISC
Squirrel          Spermophilus tridecemlineatus       NISC
Tenrec            Echinops telfairi                   NISC
Tree shrew        Tupaia belangeri                    NISC
Vervet monkey     Chlorocebus aethiops                NISC
Display Conventions and Configuration In full display mode, this track shows pairwise alignments of each species aligned to the human genome.
In dense mode, the alignments are depicted using a gray-scale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. 
The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: the annotations from the genome displayed in the Default species for translation pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen.
Species listed in the row labeled "None" do not have species-specific reading frames for gene translation.
Gene Track       Species
Gencode Genes    human
UCSC Genes       mouse
Known Genes      rat
RefSeq Genes     chimp
Ensembl Genes    rhesus, opossum
None             the remaining 30 species
Methods TBA TBA was used to align sequences in the December 2007 ENCODE sequence data freeze. Multiple alignments were seeded from a series of combinatorial pairwise blastz alignments (not referenced to any one species). The specific combinations were determined by the species guide tree. The resulting multiple alignments were projected onto the human reference sequence. BinCons The binCons score is based on the cumulative binomial probability of detecting the observed number of identical bases (or greater) in sliding 25 bp windows (moving one bp at a time) between the reference sequence and each other species, given the neutral rate at four-fold degenerate sites. Neutral rates are calculated separately at each targeted region. For targets with no gene annotations, the average percent identity across all alignable sequence was instead used to weight the individual species binomial scores; this latter weighting scheme was found to closely match 4D weights. Clusters of bases that exceeded the given conservation score threshold were designated as conserved elements. The minimum length of a conserved element is 25 bases. Strict cutoffs were used: if even one base fell below the conservation score threshold, it separated an element into two distinct regions. Regions reported here exceed a 5% False Discovery Rate threshold, using a window size of 7 bases. More details on binCons can be found in Margulies et al. (2003), cited below. Chai Chai is a DNA structure-informed evolutionary conservation algorithm that works in a manner analogous to the primary sequence-based binCons.
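As a point of reference for the comparison with Chai, the binCons window statistic described above (the cumulative binomial probability of seeing at least the observed number of identical bases in a sliding 25 bp window) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `p_identity` stands in for the per-base identity probability implied by the neutral rate, gaps are not handled, and the per-species weighting and score thresholding of the real method are omitted.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_tail_probs(ref, other, p_identity, window=25):
    """Slide a window one base at a time over two aligned, equal-length sequences.

    For each window, count identical bases and return the binomial tail
    probability of observing at least that many identities by chance.
    p_identity is an assumed neutral per-base identity probability.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    probs = []
    for start in range(len(ref) - window + 1):
        a = ref[start:start + window]
        b = other[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        probs.append(binomial_tail(matches, window, p_identity))
    return probs


if __name__ == "__main__":
    seq = "ACGT" * 7  # 28 bases, so four 25 bp windows
    print(window_tail_probs(seq, seq, p_identity=0.5))
```

Smaller tail probabilities indicate windows that are more conserved than expected under the assumed neutral rate.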
Instead of computing the binomial probability of observed base substitutions between species, Chai calculates the difference between DNA structural profiles as a measure of similarity. Single nucleotide resolution structure profiles for genomic DNA are predicted using the algorithm described in Greenbaum et al. (2007), below. Regions reported here exceed a 5% False Discovery Rate threshold. Credits The TBA multiple alignments were created by Gayle McEwen & Elliott Margulies of NHGRI. BinCons was developed by Elliott Margulies (Margulies et al. 2003). Chai was developed by Steve Parker & Tom Tullius (Boston University), Elliott Margulies (NHGRI) and Loren Hansen (NCBI). The programs Blastz and TBA, which were used to generate the alignments, were provided by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. The phylogenetic tree is based on Murphy et al. (2001). References Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Greenbaum JA, Pang B, Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007 Jun;17(6):947-53. Margulies EH, Blanchette M, NISC Comparative Sequencing Program, Haussler D, Green ED. Identification and characterization of multi-species conserved sequences. Genome Res. 2003 Dec;13(12):2507-18. Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res.
2003 Jan;13(1):103-7. encodeMsaTbaDec07Viewcons Conservation TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation encodeTbaChaiConsDec07 Chai Cons Conserved Elements in TBA 36-Way Alignments in the ENCODE Regions, Chai Method Pilot ENCODE Comparative Genomics and Variation encodeTbaBinConsDec07 BinCons Conserved Elements in TBA 36-Way Alignments in the ENCODE Regions, BinCons Method Pilot ENCODE Comparative Genomics and Variation encodeMsaTbaDec07Viewalign Alignments TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation encodeTbaAlignDec07 TBA Align TBA Alignments of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation ntSssTop5p 5% Lowest S Selective Sweep Scan (S): 5% Smallest S scores Neandertal Assembly and Analysis Description This track shows regions of the human genome with a strong signal for depletion of Neandertal-derived alleles (regions from the Sel Swp Scan (S) track with S scores in the lowest 5%), which may indicate an episode of positive selection in early humans. Display Conventions and Configuration Grayscale shading is used as a rough indicator of the strength of the score; the darker the item, the stronger its negative score. The strongest negative score (-8.7011) is shaded black, and the shading lightens from dark to light gray as the negative score weakens (weakest score is -4.3202). Methods Green et al. identified single-base sites that are polymorphic among five modern human genomes of diverse ancestry (in the Modern Human Seq track) plus the human reference genome, and determined ancestral or derived state of each single nucleotide polymorphism (SNP) by comparison with the chimpanzee genome. The SNPs are displayed in the S SNPs track. 
The human allele states were used to estimate an expected number of derived alleles in Neandertal in the 100,000-base window around each SNP, and a measure called the S score was developed, displayed in the Sel Swp Scan (S) track, to compare the observed number of Neandertal alleles in each window to the expected number. An S score significantly less than zero indicates a reduction of Neandertal-derived alleles (or an increase of human-derived alleles not found in Neandertal), consistent with the scenario of positive selection in the human lineage since divergence from Neandertals. Genomic regions of 25,000 or more bases in which all polymorphic sites were at least 2 standard deviations below the expected value were identified, and S was recomputed on each such region. Regions with S scores in the lowest 5% (strongest negative scores) were prioritized for further analysis as described in Green et al. Credits This track was produced at UCSC using data generated by Ed Green. References Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178

using a Hidden Markov Model (HMM). In total, fifteen states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements.
GM12878 - lymphoblastoid cells
H1-ESC - embryonic stem cells
HepG2 - hepatocellular carcinoma
HUVEC - Human Umbilical Vein Endothelial Cell
HMEC - Human Mammary Epithelial Cells
HSMM - Normal Human Skeletal Muscle Myoblasts
K562 - erythroleukemia cells
NHEK - Normal Human Epidermal Keratinocytes
NHLF - Normal Human Lung Fibroblasts
Display Conventions and Configuration This track is a composite track that contains multiple subtracks. Each subtrack represents data for a different cell type and displays individually on the browser. Instructions for configuring tracks with multiple subtracks are here. The fifteen states of the HMM, their associated segment color, and the candidate annotations are as follows:
State 1 - Bright Red - Active Promoter
State 2 - Light Red - Weak Promoter
State 3 - Purple - Inactive/poised Promoter
State 4 - Orange - Strong enhancer
State 5 - Orange - Strong enhancer
State 6 - Yellow - Weak/poised enhancer
State 7 - Yellow - Weak/poised enhancer
State 8 - Blue - Insulator
State 9 - Dark Green - Transcriptional transition
State 10 - Dark Green - Transcriptional elongation
State 11 - Light Green - Weak transcribed
State 12 - Gray - Polycomb-repressed
State 13 - Light Gray - Heterochromatin; low signal
State 14 - Light Gray - Repetitive/Copy Number Variation
State 15 - Light Gray - Repetitive/Copy Number Variation
Methods ChIP-seq data from the Broad Histone track were used to generate this track. Data for nine factors plus input and nine cell types were binarized separately at a 200 base pair resolution based on a Poisson background model. The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010).
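The binarization step mentioned above can be sketched roughly as follows. This is an illustrative simplification, not the authors' pipeline: it assumes a single chromosome, 0-based read start positions, a background rate equal to the mean count per bin, and a hypothetical p-value cutoff of 1e-4; the function names are invented for this example.

```python
from math import exp


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # advance from P(X = i) to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def binarize(read_starts, chrom_length, bin_size=200, p_cutoff=1e-4):
    """Mark each bin 1 if its read count is unlikely under a Poisson background, else 0.

    read_starts : 0-based read start positions on one chromosome
    p_cutoff    : assumed significance cutoff (hypothetical value for illustration)
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    lam = sum(counts) / n_bins  # background rate: mean reads per bin
    return [1 if c > 0 and poisson_tail(c, lam) < p_cutoff else 0 for c in counts]


if __name__ == "__main__":
    reads = [50] + list(range(600, 620))  # one stray read, plus 20 reads in bin 3
    print(binarize(reads, chrom_length=2000))
```

Each mark and cell type would be binarized separately in this way, giving one presence/absence vector per 200 bp interval as input to the HMM.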
To learn a common set of states across the nine cell types, the genomes were first concatenated across the cell types. For each of the nine cell types, each 200 base pair interval was then assigned to its most likely state under the model. Detailed information about the model parameters and state enrichments can be found in Ernst et al. (2011). Release Notes This is release 1 (Jun 2011) of this track, and it is based on the NCBI36/hg18 release of the Broad Histone track. This track has also been lifted over to GRCh37/hg19. It is anticipated that the HMM methods will be run on the newer GRCh37/hg19 Broad Histone data and will replace the lifted version. Credits The ChIP-seq data were generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School, and the chromatin state segmentation was produced in Manolis Kellis's Computational Biology group at the Massachusetts Institute of Technology. Contact: Jason Ernst. Data generation and analysis were supported by funds from the NHGRI (ENCODE), the Burroughs Wellcome Fund, Howard Hughes Medical Institute, NSF, Sloan Foundation, Massachusetts General Hospital and the Broad Institute. References Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010 Aug;28(8):817-25. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011 May 5;473(7345):43-9. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
There is no restriction on the use of segmentation data. wgEncodeBroadHmmNhlfHMM NHLF ChromHMM NHLF Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 792 Bernstein Broad ChromHMM_ENCODEDynamicPaper wgEncodeBroadHmmNhlfHMM HMM lung fibroblasts Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in NHLF cells) Regulation wgEncodeBroadHmmNhekHMM NHEK ChromHMM NHEK Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 791 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmNhekHMM HMM epidermal keratinocytes Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in NHEK cells) Regulation wgEncodeBroadHmmHsmmHMM HSMM ChromHMM HSMM Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 787 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHsmmHMM HMM skeletal muscle myoblasts Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HSMM cells) Regulation wgEncodeBroadHmmHmecHMM HMEC ChromHMM HMEC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 786 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHmecHMM HMM mammary epithelial cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HMEC cells) Regulation wgEncodeBroadHmmHuvecHMM HUVEC ChromHMM HUVEC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 788 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHuvecHMM HMM umbilical vein endothelial cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HUVEC cells) Regulation wgEncodeBroadHmmHepg2HMM HepG2 ChromHMM HepG2 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 789 Bernstein Broad ChromHMM_ENCODEDynamicsPaper 
wgEncodeBroadHmmHepg2HMM HMM hepatocellular carcinoma Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HepG2 cells) Regulation wgEncodeBroadHmmK562HMM K562 ChromHMM K562 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 790 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmK562HMM HMM leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in K562 cells) Regulation wgEncodeBroadHmmH1hescHMM H1-hESC ChromHMM H1-hESC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 785 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmH1hescHMM HMM embryonic stem cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in H1-hESC cells) Regulation wgEncodeBroadHmmGm12878HMM GM12878 ChromHMM GM12878 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 784 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmGm12878HMM HMM B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in GM12878 cells) Regulation wgEncodeBroadChipSeq Broad Histone ENCODE Histone Modifications by Broad Institute ChIP-seq Regulation Description This track displays maps of chromatin state generated by the Broad/MGH ENCODE group using ChIP-seq. Chemical modifications (methylation, acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription.
The ChIP-seq method involves cross-linking histones and other DNA associated proteins to genomic DNA within cells using formaldehyde. The cross-linked chromatin is subsequently extracted, mechanically sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody-target (epitope) across the genome is inferred from the density of mapped fragments.
Display Conventions and Configuration
This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. ENCODE tracks typically contain one or more of the following views:
Peaks - Regions of signal enrichment based on processed data (usually normalized data from pooled replicates). ENCODE Peaks tables contain fields for statistical significance. Peaks for this track include a signalValue and pValue. The signalValue represents the fold enrichment of reads across the length of the interval, relative to random expectation. The pValue reflects the likelihood of observing an interval of the given length and signalValue at random. A long interval with a moderate signalValue and a short interval with a high signalValue can therefore have the same pValue.
Signal - Density graph (wiggle) of signal enrichment based on processed data.
Additional data that were used to generate these tracks are located in the ENCODE Mappability track:
Alignability - The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence.
Methods
Cells were grown according to the approved ENCODE cell culture protocols. Chromatin immunoprecipitation was performed with each of the histone antibodies listed above. Isolated DNA was then end-repaired, adapter-ligated and sequenced using Illumina Genome Analyzers.
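The relationship between interval length, signalValue and pValue described under Display Conventions can be illustrated with a small sketch. It is a deliberate simplification and not the method used for this track: it treats reads as arriving at a uniform Poisson background rate and scores each interval on its own, without the multiple-window correction of a true scan statistic. The function names and the background rate of 0.01 reads per base pair are hypothetical.

```python
import math


def log_poisson_pmf(k, lam):
    """Natural log of P(X = k) for X ~ Poisson(lam)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def interval_stats(read_count, length_bp, background_rate):
    """Return (fold enrichment, upper-tail p-value) for one interval.

    Assumption: under the null, reads in the interval follow a Poisson
    distribution with mean background_rate * length_bp.
    """
    expected = background_rate * length_bp
    signal_value = read_count / expected
    # Sum P(X = k) from the observed count far enough into the tail
    # that the remaining terms are negligible.
    upper = read_count + int(10 * math.sqrt(expected + read_count)) + 50
    p_value = sum(
        math.exp(log_poisson_pmf(k, expected))
        for k in range(read_count, upper)
    )
    return signal_value, min(1.0, p_value)


rate = 0.01  # hypothetical background: reads per base pair
short_fold, short_p = interval_stats(read_count=12, length_bp=200, background_rate=rate)
long_fold, long_p = interval_stats(read_count=40, length_bp=2000, background_rate=rate)
print(short_fold, short_p)
print(long_fold, long_p)
```

In this toy setting the short interval has a higher fold enrichment than the long one, yet both fall well below conventional significance cutoffs, which is the qualitative point made above: significance depends on interval length as well as on signalValue.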
Sequence reads from each IP experiment were aligned to the human reference genome (hg18) using MAQ. Discrete intervals of ChIP-seq fragment enrichment were identified using a scan statistics approach, assuming a uniform background signal. More details of the experimental protocol and analysis are available here. Release Notes Release 3 (Mar 2010) of this track adds the HSMM cell line and includes new experiments for H1-hESC and NHLF. No previously released data has been replaced in this release. Update to Release 3 (Jun 2010) of this track consists of a display change to the Signal subtracks. This update provides a better display of the data when zoomed in to a range spanning less than 16,500 base pairs. Release 2 did contain newer versions of previously released data, however. All versioned data are marked with "submittedDataVersion=V2" in the metadata, along with the reason for the change. Previous versions of these files are available for download from the FTP site. Please note that an antibody previously labeled "Pol2 (b)" is, in fact, Covance antibody MMS-128P with the target POLR2A. Credits The ChIP-seq data were generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School. Contact: Noam Shoresh. Data generation and analysis was supported by funds from the NHGRI, the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute. References Bernstein BE, Kamal M, Lindblad-Toh K, Bekiranov S, Bailey DK, Huebert DJ, McMahon S, Karlsson EK, Kulbokas EJ 3rd, Gingeras TR et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell. 2005 Jan 28;120(2):169-81. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006 Apr 21;125(2):315-26. 
Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007 Aug 2;448(7153):553-60. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeBroadChipSeqViewSignal Signal ENCODE Histone Modifications by Broad Institute ChIP-seq Regulation wgEncodeBroadChipSeqSignalNhlfControl NHLF Control S Input NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 105 Bernstein Broad input wgEncodeBroadChipSeqSignalNhlfControl Signal lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (NHLF control) Regulation wgEncodeBroadChipSeqSignalNhlfH4k20me1 NHLF H4K20me1 S H4K20me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 104 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k36me3 NHLF H3K36me3 S H3K36me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 99 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts.
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k27me3 NHLF H3K27me3 S H3K27me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 98 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k27ac NHLF H3K27ac S H3K27ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 97 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k9ac NHLF H3K9ac S H3K9ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 103 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy.
Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me3 NHLF H3K4me3 S H3K4me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 102 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me2 NHLF H3K4me2 S H3K4me2 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 101 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me1 NHLF H3K4me1 S H3K4me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 100 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfCtcf NHLF CTCF Sig CTCF NHLF ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 120 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, NHLF) Regulation wgEncodeBroadChipSeqSignalNhekControl NHEK Control S Input NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 72 Bernstein Broad input wgEncodeBroadChipSeqSignalNhekControl Signal epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (NHEK Control) Regulation wgEncodeBroadChipSeqSignalNhekPol2b NHEK Pol2 S Pol2(b) NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 73 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekPol2b Signal RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH4k20me1 NHEK H4K20me1 S H4K20me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 71 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k36me3 NHEK H3K36me3 S H3K36me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 66 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2.
Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k27me3 NHEK H3K27me3 S H3K27me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 65 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k27ac NHEK H3K27ac S H3K27ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 64 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k9me1 NHEK H3K9me1 S H3K9me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 70 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k9me1 Signal Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state.
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k9ac NHEK H3K9ac S H3K9ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 69 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me3 NHEK H3K4me3 S H3K4me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 68 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me2 NHEK H3K4me2 S H3K4me2 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 67 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me1 NHEK H3K4me1 S H3K4me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 62 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekCtcf NHEK CTCF S CTCF NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 63 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, NHEK) Regulation wgEncodeBroadChipSeqSignalK562Control K562 Control S Input K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 52 Bernstein Broad input wgEncodeBroadChipSeqSignalK562Control Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (K562 control) Regulation wgEncodeBroadChipSeqSignalK562Pol2b K562 Pol2 S Pol2(b) K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 53 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562Pol2b Signal RNA polymerase II. Is responsible for RNA transcription. 
It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, K562) Regulation wgEncodeBroadChipSeqSignalK562H4k20me1 K562 H4K20me1 S H4K20me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 51 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k36me3 K562 H3K36me3 S H3K36me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 45 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k27me3 K562 H3K27me3 S H3K27me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 44 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k27ac K562 H3K27ac S H3K27ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 43 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k9me1 K562 H3K9me1 S H3K9me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 50 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k9me1 Signal Histone H3 (mono-methyl K9).
Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k9ac K562 H3K9ac S H3K9ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 49 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me3 K562 H3K4me3 S H3K4me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 48 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me2 K562 H3K4me2 S H3K4me2 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 47 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me1 K562 H3K4me1 S H3K4me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 46 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, K562) Regulation wgEncodeBroadChipSeqSignalK562Ctcf K562 CTCF S CTCF K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 42 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, K562) Regulation wgEncodeBroadChipSeqSignalHuvecControl HUVEC Control S Input HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 60 Bernstein Broad input wgEncodeBroadChipSeqSignalHuvecControl Signal umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HUVEC control) Regulation wgEncodeBroadChipSeqSignalHuvecPol2b HUVEC Pol2 S Pol2(b) HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 61 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecPol2b Signal RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH4k20me1 HUVEC H4K20me1 S H4K20me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 59 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair.
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k36me3 HUVEC H3K36me3 S H3K36me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 56 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k27me3 HUVEC H3K27me3 S H3K27me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 38 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k27ac HUVEC H3K27ac S H3K27ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 55 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k9me1 HUVEC H3K9me1 S H3K9me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 58 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k9me1 Signal Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k9ac HUVEC H3K9ac S H3K9ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 57 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me3 HUVEC H3K4me3 S H3K4me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 41 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me2 HUVEC H3K4me2 S H3K4me2 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 40 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me1 HUVEC H3K4me1 S H3K4me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 39 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecCtcf HUVEC CTCF S CTCF HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 54 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HUVEC) Regulation wgEncodeBroadChipSeqSignalHsmmControl HSMM Control S Input HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 119 Bernstein Broad input wgEncodeBroadChipSeqSignalHsmmControl Signal skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Control, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH4k20me1 HSMM H4K20me1 S H4K20me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 118 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k36me3 HSMM H3K36me3 S H3K36me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 113 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k27me3 HSMM H3K27me3 S H3K27me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 112 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k27ac HSMM H3K27ac S H3K27ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 111 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k9ac HSMM H3K9ac S H3K9ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 117 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me3 HSMM H3K4me3 S H3K4me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 116 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me2 HSMM H3K4me2 S H3K4me2 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 115 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me1 HSMM H3K4me1 S H3K4me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 114 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmCtcf HSMM CTCF S CTCF HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 110 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HSMM) Regulation wgEncodeBroadChipSeqSignalHmecControl HMEC Control S Input HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 93 Bernstein Broad input wgEncodeBroadChipSeqSignalHmecControl Signal mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HMEC control) Regulation wgEncodeBroadChipSeqSignalHmecH4k20me1 HMEC H4K20me1 S H4K20me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 92 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k36me3 HMEC H3K36me3 S H3K36me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 78 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k27me3 HMEC H3K27me3 S H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 77 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k27ac HMEC H3K27ac S H3K27ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 76 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k9ac HMEC H3K9ac S H3K9ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 79 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me3 HMEC H3K4me3 S H3K4me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 91 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me2 HMEC H3K4me2 S H3K4me2 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 90 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me1 HMEC H3K4me1 S H3K4me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 89 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecCtcf HMEC CTCF S CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-27 75 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HMEC) Regulation wgEncodeBroadChipSeqSignalHepg2Control HepG2 Control S Input HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 84 Bernstein Broad input wgEncodeBroadChipSeqSignalHepg2Control Signal hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HepG2 control) Regulation wgEncodeBroadChipSeqSignalHepg2H4k20me1 HepG2 H4K20me1 S H4K20me1 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 96 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k36me3 HepG2 H3K36me3 S H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 81 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k27ac HepG2 H3K27ac S H3K27ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 94 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k9ac HepG2 H3K9ac S H3K9ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 83 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k4me3 HepG2 H3K4me3 S H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 95 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k4me2 HepG2 H3K4me2 S H3K4me2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 82 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. 
May be associated also with poised promoters. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2Ctcf HepG2 CTCF S CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 80 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HepG2) Regulation wgEncodeBroadChipSeqSignalH1hescControl H1ES Control S Input H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 88 Bernstein Broad input wgEncodeBroadChipSeqSignalH1hescControl Signal embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H1-hESC control) Regulation wgEncodeBroadChipSeqSignalH1hescH4k20me1 H1ES H4K20me1 S H4K20me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 87 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k36me3 H1ES H3K36me3 S H3K36me3 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 107 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k27me3 H1ES H3K27me3 S H3K27me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 74 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k9ac H1ES H3K9ac S H3K9ac H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 109 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me3 H1ES H3K4me3 S H3K4me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 86 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me2 H1ES H3K4me2 S H3K4me2 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 108 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me1 H1ES H3K4me1 S H3K4me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 106 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescCtcf H1ES CTCF S CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 85 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, H1-hESC) Regulation wgEncodeBroadChipSeqSignalGm12878Control GM128 Control S Input GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 37 Bernstein Broad input wgEncodeBroadChipSeqSignalGm12878Control Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (GM12878 control) Regulation wgEncodeBroadChipSeqSignalGm12878H4k20me1 GM128 H4K20me1 S H4K20me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 36 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k36me3 GM128 H3K36me3 S H3K36me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 32 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k27me3 GM128 H3K27me3 S H3K27me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 31 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k27ac GM128 H3K27ac S H3K27ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 30 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k9ac GM12878 H3K9ac S H3K9ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 35 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k9ac Signal Histone H3 (acetyl K9). 
As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me3 GM128 H3K4me3 S H3K4me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-04 2009-10-04 28 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me2 GM128 H3K4me2 S H3K4me2 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 34 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me1 GM128 H3K4me1 S H3K4me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 33 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878Ctcf GM12878 CTCF S CTCF GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 29 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, GM12878) Regulation wgEncodeBroadChipSeqViewPeaks Peaks ENCODE Histone Modifications by Broad Institute ChIP-seq Regulation wgEncodeBroadChipSeqPeaksNhlfH4k20me1 NHLF H4K20me1 P H4K20me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 104 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. 
NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k36me3 NHLF H3K36me3 P H3K36me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 99 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k27me3 NHLF H3K27me3 P H3K27me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 98 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k27ac NHLF H3K27ac P H3K27ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 97 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k9ac NHLF H3K9ac P H3K9ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 103 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me3 NHLF H3K4me3 P H3K4me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 102 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me2 NHLF H3K4me2 P H3K4me2 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 101 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me1 NHLF H3K4me1 P H3K4me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 100 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfCtcf NHLF CTCF Pk CTCF NHLF ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 120 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhekPol2b NHEK Pol2 P Pol2(b) NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 73 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekPol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH4k20me1 NHEK H4K20me1 P H4K20me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 71 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k36me3 NHEK H3K36me3 P H3K36me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 66 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k27me3 NHEK H3K27me3 P H3K27me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 65 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci.
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k27ac NHEK H3K27ac P H3K27ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 64 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k9me1 NHEK H3K9me1 P H3K9me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 70 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k9ac NHEK H3K9ac P H3K9ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 69 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted.
In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me3 NHEK H3K4me3 P H3K4me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 68 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me2 NHEK H3K4me2 P H3K4me2 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 67 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me1 NHEK H3K4me1 P H3K4me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 62 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekCtcf NHEK CTCF P CTCF NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 63 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, NHEK) Regulation wgEncodeBroadChipSeqPeaksK562Pol2b K562 Pol2 P Pol2(b) K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 53 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562Pol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, K562) Regulation wgEncodeBroadChipSeqPeaksK562H4k20me1 K562 H4K20me1 P H4K20me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 51 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair.
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k36me3 K562 H3K36me3 P H3K36me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 45 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k27me3 K562 H3K27me3 P H3K27me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 44 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k27ac K562 H3K27ac P H3K27ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 43 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k9me1 K562 H3K9me1 P H3K9me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 50 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k9ac K562 H3K9ac P H3K9ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 49 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me3 K562 H3K4me3 P H3K4me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 48 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me2 K562 H3K4me2 P H3K4me2 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 47 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me1 K562 H3K4me1 P H3K4me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 46 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562Ctcf K562 CTCF P CTCF K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 42 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, K562) Regulation wgEncodeBroadChipSeqPeaksHuvecPol2b HUVEC Pol2 P Pol2(b) HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 61 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecPol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH4k20me1 HUVEC H4K20me1 P H4K20me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 59 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k36me3 HUVEC H3K36me3 P H3K36me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 56 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts.
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k27me3 HUVEC H3K27me3 P H3K27me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 38 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k27ac HUVEC H3K27ac P H3K27ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 55 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k9me1 HUVEC H3K9me1 P H3K9me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 58 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state.
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k9ac HUVEC H3K9ac P H3K9ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 57 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me3 HUVEC H3K4me3 P H3K4me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 41 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me2 HUVEC H3K4me2 P H3K4me2 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 40 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me1 HUVEC H3K4me1 P H3K4me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 39 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecCtcf HUVEC CTCF P CTCF HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 54 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHsmmH4k20me1 HSMM H4K20me1 P H4K20me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 118 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair.
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k36me3 HSMM H3K36me3 P H3K36me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 113 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k27me3 HSMM H3K27me3 P H3K27me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 112 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k27ac HSMM H3K27ac P H3K27ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 111 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k9ac HSMM H3K9ac P H3K9ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 117 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me3 HSMM H3K4me3 P H3K4me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 116 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me2 HSMM H3K4me2 P H3K4me2 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 115 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me1 HSMM H3K4me1 P H3K4me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 114 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmCtcf HSMM CTCF P CTCF HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 110 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HSMM) Regulation wgEncodeBroadChipSeqPeaksHmecH4k20me1 HMEC H4K20me1 P H4K20me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 92 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair.
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k36me3 HMEC H3K36me3 P H3K36me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 78 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k27me3 HMEC H3K27me3 P H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 77 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k27ac HMEC H3K27ac P H3K27ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 76 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k9ac HMEC H3K9ac P H3K9ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 79 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me3 HMEC H3K4me3 P H3K4me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 91 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me2 HMEC H3K4me2 P H3K4me2 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 90 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me1 HMEC H3K4me1 P H3K4me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 89 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecCtcf HMEC CTCF P CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-27 75 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HMEC) Regulation wgEncodeBroadChipSeqPeaksHepg2H4k20me1 HepG2 H4K20me1 P H4K20me1 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 96 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. Note the contrast with H4K20me3, which is associated with heterochromatin and DNA repair. 
hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k36me3 HepG2 H3K36me3 P H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 81 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k27ac HepG2 H3K27ac P H3K27ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 94 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k9ac HepG2 H3K9ac P H3K9ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 83 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k4me3 HepG2 H3K4me3 P H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 95 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k4me2 HepG2 H3K4me2 P H3K4me2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 82 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2Ctcf HepG2 CTCF P CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 80 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HepG2) Regulation wgEncodeBroadChipSeqPeaksH1hescH4k20me1 H1ES H4K20me1 P H4K20me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 87 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. Note the contrast with H4K20me3, which is associated with heterochromatin and DNA repair. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k36me3 H1ES H3K36me3 P H3K36me3 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 107 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k27me3 H1ES H3K27me3 P H3K27me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 74 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k9ac H1ES H3K9ac P H3K9ac H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 109 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me3 H1ES H3K4me3 P H3K4me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 86 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me2 H1ES H3K4me2 P H3K4me2 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 108 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me1 H1ES H3K4me1 P H3K4me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 106 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescCtcf H1ES CTCF P CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 85 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksGm12878H4k20me1 GM128 H4K20me1 P H4K20me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 36 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. Note the contrast with H4K20me3, which is associated with heterochromatin and DNA repair. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k36me3 GM128 H3K36me3 P H3K36me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 32 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k27me3 GM128 H3K27me3 P H3K27me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 31 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k27ac GM128 H3K27ac P H3K27ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 30 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k9ac GM12878 H3K9ac P H3K9ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 35 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me3 GM128 H3K4me3 P H3K4me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-04 2009-10-04 28 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me2 GM128 H3K4me2 P H3K4me2 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 34 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me1 GM128 H3K4me1 P H3K4me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 33 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878Ctcf GM12878 CTCF P CTCF GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 29 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, GM12878) Regulation encodeBuFirstExon BU First Exon Boston University First Exon Activity Pilot ENCODE Transcription Description This track displays expression levels of computationally identified first exons and a constitutive exon of genes in ENCODE regions, based on the real competitive Polymerase Chain Reaction (rcPCR) technique described in Ding et al. (2003). Expression levels are indicated by color, ranging from black (no expression) to red (high expression). Experiments were performed on total RNA samples of ten normal human tissues purchased from Clontech (Palo Alto, CA): cerebral cortex, colon, heart, kidney, liver, lung, skeletal muscle, spleen, stomach, and testis. The name for each alternative transcript starts with the gene name, followed by an identifier for the alternative first exon or the constitutive exon. For example, for gene CAV1, there are three alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the third exon is chosen as the constitutively expressed exon (CAV1-E3). Methods Alternative transcription start sites (TSS) for 20 ENCODE genes were predicted using PromoSer, an in-house computational tool. PromoSer computationally identifies the TSS by considering alignments of a large number of partial and full-length mRNA sequences and ESTs to genomic DNA, with provision for alternative promoters. 
In PromoSer, the treatment of alternative first exons (or the resulting TSSs) is as follows: all transcripts (mRNA, full-length mRNA and EST) from the same gene cluster are examined; individual ESTs are not considered for alternative TSSs, and only the 5'-most positions from all ESTs in the cluster are considered as potential TSSs; if multiple 5'-end positions are more than 20 bp apart, they are reported as alternative TSSs. For each gene, all alternative first exons were identified based on manual selection of PromoSer predictions. An exon that is shared by all transcripts (called the constitutive exon) was also selected. The selection process involved visually examining the structure of the cluster, preferably using the latest data available on UCSC, to identify distinct first exons that were well formed (having multiple supporting sequences) and had no evidence (especially from newer sequences) of additional sequence that made them internal exons. After the first exon was identified, a subsequence of 100-300 bases was selected for use in the experiment. The selection process avoided repeat sequences as much as possible and, if two first exons partially overlapped, the non-overlapping region was selected. If those conditions caused the remaining sequence to be too short (or the first exon itself was too short), a junction with the second exon was used. A constitutive exon was also selected that was included in all (or most) of the alternative transcripts, and suitable sequences were then extracted as above (no exon junctions were used). The absolute expression levels of all exons were individually quantified by rcPCR by designing four assays with PCR amplicons corresponding to each exon. Amplicons were designed according to transcript sequences and can span a large distance on the genomic sequence. 
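The 20 bp rule for alternative TSSs described earlier in this Methods section can be illustrated with a minimal sketch. This is not PromoSer code: the function name alternative_tss, the plus-strand coordinates, and the choice to compare each candidate against the last reported TSS are all assumptions made for illustration.

```python
def alternative_tss(five_prime_ends, min_gap=20):
    """Report candidate TSS positions that lie more than min_gap bp apart.

    Assumptions (not taken from PromoSer itself): positions are plus-strand
    coordinates, and each candidate is compared with the last reported TSS.
    """
    reported = []
    for pos in sorted(set(five_prime_ends)):
        # Keep a position only if it is more than min_gap bp past the last one kept.
        if not reported or pos - reported[-1] > min_gap:
            reported.append(pos)
    return reported


# Hypothetical 5'-end positions collected from one gene cluster.
ends = [1000, 1005, 1018, 1100, 1121, 1500]
print(alternative_tss(ends))  # [1000, 1100, 1121, 1500]
```

Positions within 20 bp of an already reported TSS are treated as the same start site; a different grouping rule (for example, comparing neighboring positions only) would give different results for closely spaced 5' ends.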
In addition, some amplicons were designed across the junctions between first exons and the constitutive second exons, and thus these amplicons may overlap with the amplicons that correspond to the constitutive second exons. The rcPCR technique combined competitive PCR and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) for gene expression analysis. To measure the expression level of a gene, an oligonucleotide standard (60-80 bases) of known concentration, complementary to the target sequence with a single base mismatch in the middle, was added as the competitor for PCR. The gene of interest and the oligonucleotide standard resembled two alleles of a heterozygous locus in an allele frequency analysis experiment, and thus could be quantified by the high-throughput MALDI-TOF MS based MassARRAY system (Sequenom Inc.). After PCR, a base extension reaction was carried out with an extension primer, ThermoSequenase and a mixture of ddNTPs/dNTP (for example, a mixture of ddA, ddC, ddT, and dG). The extension primer annealed to the sequence immediately 5' upstream of the mismatch position. Depending on the nature of the mismatch and the composition of the ddNTPs/dNTP mixture, one or two bases were added to the extension primer, producing two extension products that differ in length by one base. These two extension products were then detected and quantified by MALDI-TOF MS. Expression ratios (e.g. CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3, CAV1-E1C/CAV1-E3) indicate the relative abundance of alternative first exons. 18S rRNA was used to normalize absolute exon expression across tissues. Values shown on this track represent the relative abundance of the alternative first exons with respect to the 18S rRNA. The raw values have been log10 transformed and scaled to show graded colors on the browser. Verification One biological replicate was performed for each gene. 
Two to four competitor concentrations were used to detect the expression level of each exon. Two to six technical replicates were performed for each competitor concentration. One more biological replicate will be performed in the future. Credits Data generation and analysis for this track were performed by ZLAB at Boston University. The following people contributed: Shengnan Jin, Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming Ding, Charles R. Cantor, and Zhiping Weng. References Ding, C. and Cantor, C.R. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci U S A 100(6), 3059-64 (2003). Ding, C. and Cantor, C.R. Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc Natl Acad Sci U S A 100(13), 7449-53 (2003). Halees, A.S., Leyfer, D. and Weng, Z. PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res. 31(13), 3554-9 (2003). Halees, A.S. and Weng, Z. PromoSer: improvements to the algorithm, visualization and accessibility. Nucleic Acids Res., 32, W191-W194 (2004). encodeBuFirstExonTestis BU Testis Boston University First Exon Activity in Testis Pilot ENCODE Transcription encodeBuFirstExonStomach BU Stomach Boston University First Exon Activity in Stomach Pilot ENCODE Transcription encodeBuFirstExonSpleen BU Spleen Boston University First Exon Activity in Spleen Pilot ENCODE Transcription encodeBuFirstExonSkMuscle BU Skel. 
Muscle Boston University First Exon Activity in Skeletal Muscle Pilot ENCODE Transcription encodeBuFirstExonLung BU Lung Boston University First Exon Activity in Lung Pilot ENCODE Transcription encodeBuFirstExonLiver BU Liver Boston University First Exon Activity in Liver Pilot ENCODE Transcription encodeBuFirstExonKidney BU Kidney Boston University First Exon Activity in Kidney Pilot ENCODE Transcription encodeBuFirstExonHeart BU Heart Boston University First Exon Activity in Heart Pilot ENCODE Transcription encodeBuFirstExonColon BU Colon Boston University First Exon Activity in Colon Pilot ENCODE Transcription encodeBuFirstExonCerebrum BU Cere. Cortex Boston University First Exon Activity in Cerebral Cortex Pilot ENCODE Transcription wgEncodeBuOrchid BU ORChID ENCODE Boston Univ (Tullius Lab) ORChID Predicted DNA Cleavage Sites Mapping and Sequencing Description This set of tracks displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the genome. Because the hydroxyl radical cleavage intensity is proportional to the solvent accessible surface area of the deoxyribose hydrogen atoms (Balasubramanian et al., 1998), these tracks represent a structural profile of the DNA in the genome. For additional details, please visit the Tullius lab website. Display Conventions and Configuration These tracks may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link. In the full and pack display modes, positive intensity values are shown in red and negative intensity values are shown in tan. In the squish and dense display modes, intensity is represented in grayscale (the darker the shading, the higher the intensity). To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. 
Methods Hydroxyl radical cleavage intensity predictions were performed using an in-house sliding tetramer window (STW) algorithm. This algorithm draws data from the ·OH Radical Cleavage Intensity Database (ORChID), which contains more than 150 experimentally determined cleavage patterns. The ORChID Version 1 predictions are performed on the + strand of the DNA sequence. These predictions have a Pearson coefficient of 0.88 between the predicted and experimentally determined cleavage intensities. For ORChID Version 2, two predictions are performed, one on the + strand and the other on the - strand, and then the average of the predicted cleavage intensity for nucleotides in close proximity across the minor groove is presented. For more details on the hydroxyl radical cleavage method, see Greenbaum et al. (2007). Verification The STW algorithm has been cross-validated by removing each test sequence from the training set and performing a prediction. The mean correlation coefficient (between predicted and experimental cleavage patterns) from this study was 0.88. Credits These data were generated at Boston University and NHGRI. Contact: Tom Tullius. These data are the result of the combined efforts of Bo Pang (now at MIT), Jason Greenbaum (now at The La Jolla Institute for Allergy and Immunology), Steve Parker and Elliott Margulies at The National Human Genome Research Institute, National Institutes of Health, and Eric Bishop and Tom Tullius at Boston University. References Balasubramanian B, Pogozelski WK, and Tullius TD. DNA strand breaking by the hydroxyl radical is governed by the accessible surface areas of the hydrogen atoms of the DNA backbone. Proc. Natl. Acad. Sci. USA. 1998 Aug 18;95(17):9738-43. Price MA, and Tullius TD. Using the Hydroxyl Radical to Probe DNA Structure. Meth. Enzymol. 1992;212:194-219. Tullius TD. Probing DNA Structure with Hydroxyl Radicals. Curr Protoc Nucleic Acid Chem. 2002 Feb;Chapter 6:Unit 6.7. 
Review. Greenbaum JA, Pang B, and Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007 Jun;17(6):947-53. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeBuOrchidSignalView Signal ENCODE Boston Univ (Tullius Lab) ORChID Predicted DNA Cleavage Sites Mapping and Sequencing wgEncodeBuOrchidSignalRep2Gm12878 ORChID V2 Orchid ENCODE Jan 2010 Freeze 2010-01-24 2010-10-24 1 Tullius BU 2 wgEncodeBuOrchidSignalRep2Gm12878 Signal ORChID DNA Cleavage Tullius Tullius - Boston University Signal ENCODE Boston Univ. OH Radical Cleavage Intensity Database (ORChID) V2 Mapping and Sequencing wgEncodeBuOrchidSignalRep1Gm12878 ORChID V1 Orchid ENCODE Jan 2010 Freeze 2010-01-24 2010-10-24 1 Tullius BU 1 wgEncodeBuOrchidSignalRep1Gm12878 Signal ORChID DNA Cleavage Tullius Tullius - Boston University Signal ENCODE Boston Univ. OH Radical Cleavage Intensity Database (ORChID) V1 Mapping and Sequencing burgeRnaSeqGemMapperAlign Burge RNA-seq Burge Lab RNA-seq Aligned by GEM Mapper Expression Description RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq was performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing on an Illumina Genome Analyser. This track shows the RNA-seq data published by Chris Burge's lab (Wang et al.,2008) mapped to the genome using GEM Mapper by the Guigó lab at the Center for Genomic Regulation (CRG). 
The subtracks display RNA-seq data from various tissues/cell lines: Brain; Liver; Heart; Muscle; Colon; Adipose; Testes; Lymph Node; Breast; BT474 - Breast Tumour Cell Line; HME - Human Mammary Epithelial Cell Line; MCF7 - Breast Adenocarcinoma Cell Line; MB-435 - Breast Ductal Adenocarcinoma Cell Line*; T-47D - Breast Ductal Carcinoma Cell Line. Tissues were obtained from unrelated anonymous donors. HME is a mammary epithelial cell line immortalized with telomerase reverse transcriptase (TERT). The other cell lines are breast cancer cell lines produced from invasive ductal carcinomas (ATCC). *NOTE: studies have shown that the MDA-MB-435 cell line appears to have been contaminated with the M14 melanoma cell line. See this entry on the American Type Culture Collection (ATCC) website for more details. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: Raw Signal: Density graph (bedGraph) of signal enrichment based on a normalized aligned read density (counts per million mapped reads for each subtrack). This normalized measure assists in visualizing the relative amount of a given transcript across multiple samples. Alignments: The Alignments view shows reads mapped to the genome. Methods The group at CRG obtained RNA-seq reads, generated by Wang et al. (2008), from the Short Read Archive section of GEO at NCBI under accession number GSE12946. Using their GEM mapper program, CRG mapped the RNA-seq reads to the genome and transcriptome (GENCODE Release 2b, February 2009 Freeze). GEM mapper was run using default parameters and allowing up to two mismatches in the read alignments. 
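The Raw Signal normalization described under Display Conventions (counts per million mapped reads for each subtrack) can be sketched as follows. This is only an illustration of the arithmetic, not the pipeline used to build the track; the function name counts_per_million and the example numbers are hypothetical.

```python
def counts_per_million(read_counts, total_mapped_reads):
    """Scale raw per-position read counts to counts per million mapped reads.

    read_counts: raw aligned-read counts for one subtrack (hypothetical input).
    total_mapped_reads: total number of mapped reads in that subtrack.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in read_counts]


# Hypothetical subtrack with 2,000,000 mapped reads.
print(counts_per_million([0, 5, 20], 2_000_000))  # [0.0, 2.5, 10.0]
```

Dividing by each subtrack's own mapped-read total is what makes the signal comparable across samples sequenced to different depths.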
Since mapping to the transcriptome depends on length of the reads mapped, reads were only mapped for the 14 tissues or cell lines where reads were of length 32 bp. This excluded reads from MAQC human cell lines (mixed human brain) and MAQC UHR (mixed human cell lines). Credits These data were generated by Chris Burge's lab at the Massachusetts Institute of Technology and by Roderic Guigó's lab at the Center for Genomic Regulation (CRG) in Barcelona, Spain. GTF files of the mapped data were provided by Thomas Derrien and Paolo Ribeca from CRG. GEM mapper software can be obtained here. References Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008 Nov 27;456(7221):470-6. burgeRnaSeqGemMapperAlignViewRawSignal All Raw Signal Burge Lab RNA-seq Aligned by GEM Mapper Expression burgeRnaSeqGemMapperAlignTestesAllRawSignal RNA-seq Testes Sig Burge Lab RNA-seq 32mer Reads from Testes, Raw Signal Expression burgeRnaSeqGemMapperAlignSkelMuscleAllRawSignal RNA-seq Muscle Sig Burge Lab RNA-seq 32mer Reads from Skeletal Muscle, Raw Signal Expression burgeRnaSeqGemMapperAlignLymphNodeAllRawSignal RNA-seq Lymph Node Sig Burge Lab RNA-seq 32mer Reads from Lymph Node, Raw Signal Expression burgeRnaSeqGemMapperAlignLiverAllRawSignal RNA-seq Liver Sig Burge Lab RNA-seq 32mer Reads from Liver, Raw Signal Expression burgeRnaSeqGemMapperAlignHeartAllRawSignal RNA-seq Heart Sig Burge Lab RNA-seq 32mer Reads from Heart, Raw Signal Expression burgeRnaSeqGemMapperAlignColonAllRawSignal RNA-seq Colon Sig Burge Lab RNA-seq 32mer Reads from Colon, Raw Signal Expression burgeRnaSeqGemMapperAlignBreastAllRawSignal RNA-seq Breast Sig Burge Lab RNA-seq 32mer Reads from Breast, Raw Signal Expression burgeRnaSeqGemMapperAlignBrainAllRawSignal RNA-seq Brain Sig Burge Lab RNA-seq 32mer Reads from Brain, Raw Signal Expression burgeRnaSeqGemMapperAlignAdiposeAllRawSignal RNA-seq Adipose 
Sig Burge Lab RNA-seq 32mer Reads from Adipose, Raw Signal Expression burgeRnaSeqGemMapperAlignT47DAllRawSignal RNA-seq T47D Sig Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignMCF7AllRawSignal RNA-seq MCF7 Sig Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignMB435AllRawSignal RNA-seq MB435 Sig Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignHMEAllRawSignal RNA-seq HME Sig Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignBT474AllRawSignal RNA-seq BT474 Sig Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumour Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignViewAlignments Alignments Burge Lab RNA-seq Aligned by GEM Mapper Expression burgeRnaSeqGemMapperAlignTestes RNA-seq Testes Burge Lab RNA-seq 32mer Reads from Testes Expression burgeRnaSeqGemMapperAlignSkelMuscle RNA-seq Muscle Burge Lab RNA-seq 32mer Reads from Skeletal Muscle Expression burgeRnaSeqGemMapperAlignLymphNode RNA-seq Lymph Node Burge Lab RNA-seq 32mer Reads from Lymph Node Expression burgeRnaSeqGemMapperAlignLiver RNA-seq Liver Burge Lab RNA-seq 32mer Reads from Liver Expression burgeRnaSeqGemMapperAlignHeart RNA-seq Heart Burge Lab RNA-seq 32mer Reads from Heart Expression burgeRnaSeqGemMapperAlignColon RNA-seq Colon Burge Lab RNA-seq 32mer Reads from Colon Expression burgeRnaSeqGemMapperAlignBreast RNA-seq Breast Burge Lab RNA-seq 32mer Reads from Breast Expression burgeRnaSeqGemMapperAlignBrain RNA-seq Brain Burge Lab RNA-seq 32mer Reads from Brain Expression burgeRnaSeqGemMapperAlignAdipose RNA-seq Adipose Burge Lab RNA-seq 32mer Reads from Adipose Expression burgeRnaSeqGemMapperAlignT47D RNA-seq T47D Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line Expression burgeRnaSeqGemMapperAlignMCF7 
RNA-seq MCF7 Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line Expression burgeRnaSeqGemMapperAlignMB435 RNA-seq MB435 Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line Expression burgeRnaSeqGemMapperAlignHME RNA-seq HME Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line Expression burgeRnaSeqGemMapperAlignBT474 RNA-seq BT474 Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumor Cell Line Expression wgEncodeCaltechRnaSeq Caltech RNA-seq GSE23316 ENCODE Caltech RNA-seq Expression Description This track is produced as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high-throughput DNA sequencing, which was done here on an Illumina Genome Analyzer (GA2) (Mortazavi et al., 2008). The transcriptome measurements shown on these tracks were performed on polyA-selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Those that don't map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations that lack an existing transcript model are also identified informatically and quantified. RNA-Seq is especially suited for giving information about RNA splicing patterns and for determining unequivocally the presence or absence of lower-abundance RNA classes. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls.
This RNA-Seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The "randomly primed" reverse transcription is apparently not fully random. This is inferred from a sequence bias in the first residues of the read population, and this bias likely contributes to the observed unevenness in sequence coverage across transcripts. These tracks show 1x32 n.t., 2x75 n.t., or 1x75 n.t. directed sequence reads of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The 32 n.t. sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs. The 1x75D n.t. reads are strand-specific reads. The 2x75 n.t. reads were mapped serially, first with the Bowtie program (Langmead et al., 2009) against the genome and UCSC known-gene splice junctions (Splice Sites). Bowtie-unmapped reads were then mapped using BLAT to find evidence of novel splicing, requiring at least 10 bp on the short side of the splice. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: Plus Raw Signal: Density graph (wiggle) of signal enrichment on the positive strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. Minus Raw Signal: Density graph (wiggle) of signal enrichment on the negative strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. 
Raw Signal: Density graph (wiggle) of signal enrichment based on a normalized aligned read density (RPKM) for non-strand-specific reads. The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. Splice Sites: RNA-seq tags aligning to mRNA splice sites. Alignments: The Alignments view shows reads mapped to the genome. Alignments are colored by cell type. Methods Cells were grown according to the approved ENCODE cell culture protocols. The cells (either 2 × 10^7 or 4 × 10^7 cells for GM12878 and K562, and 8 × 10^7 cells for HepG2) were lysed in either 4 ml (GM12878 and K562) or 12 ml (HepG2) of RLT buffer (Qiagen RNeasy kit), and processed on either 2 (GM12878 and K562) or 3 (HepG2) RNeasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNase digestion step to remove residual genomic DNA. 75 µg of total RNA was selected twice with oligo(dT) beads (Dynal) according to the manufacturer's protocol to isolate mRNA from each of the preparations. 100 ng of mRNA was then processed according to the protocol in Mortazavi et al. (2008), and prepared for sequencing on the Genome Analyzer flow cell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina). Following alignment of the sequence reads to the genome assembly as described above, the sequence reads were further analyzed using the ERANGE 3.0 software package, which quantifies the number of reads falling within the mapped boundaries of known transcripts from the Gencode annotations. ERANGE assigns both genomically unique reads and reads that occur in 2-10 genomic locations for quantification. ERANGE also contains a subroutine (RNAFAR) which allows the consolidation of reads that align close to, but outside, the mapped borders of known transcripts, and the identification of novel transcribed regions of the genome using either a 20 kb radius for the 1x32 datasets or paired-end information for 2x75 datasets.
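The RPKM measure used by the signal views follows the definition in Mortazavi et al. (2008): reads per kilobase of exon model per million mapped reads. The sketch below illustrates that formula only. The function name and example values are hypothetical, and it does not reproduce how ERANGE apportions reads that map to 2-10 locations or how RNAFAR extends transcript boundaries.

```python
def rpkm(read_count, exon_model_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count: reads assigned to the transcript's exon model.
    exon_model_length_bp: summed exon length of the model, in base pairs.
    total_mapped_reads: total mapped reads in the sample.
    """
    if exon_model_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total_mapped_reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (exon_model_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical transcript: 500 reads on a 2 kb exon model
    # in a sample with 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Dividing by both transcript length and sequencing depth is what lets values be compared between transcripts of different sizes and between samples sequenced to different depths.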
For 2x75 datasets, raw Illumina reads (RawData files on the download page, fasta format) are run through bowtie 0.9.8.1 with up to 2 mismatches, and the resulting mappings are stored (RawData2 files, bowtie format) for up to ten matches per read to the genome, spiked controls and UCSC knownGene splice junctions. Reads that were not mapped by bowtie (RawData3 files, fasta format) are then mapped onto the genome using blat and filtered using pslReps (RawData4 files, psl format). The bowtie and blat mappings are then analyzed by ERANGE 3.0.2 to generate wiggles (RawSignal view, wiggle format), bed files of all reads and splices (Alignments and Paired Alignments views, bed format), all bowtie and blat splices (Splice Sites view, bed format) and blat-only splices (Splice Sites view, bed format), as well as RPKM expression level measurements at the gene level (RawData5 files, rpkm format), at the exon level (RawData6 files, rpkm format), and for candidate novel exons (RawData7 files, rpkm format). Fasta files for splice sites (hg18splice75.fa.gz) and spikes (spikes.fa.gz) can be found on the downloads page. Verification Known exon maps as displayed on the genome browser are confirmed by the alignment of sequence reads. Known spliced exons are detected at the expected frequency for transcripts of given abundance. Spiked-in RNA transcripts from Arabidopsis and phage lambda are detected in a linear range over 5 orders of magnitude. Endpoint RT-PCR confirms the presence of selected RNAFAR 3′UTR extensions. Correlation to published microarray data: r = 0.62. Release Notes This is release 2 of the Caltech RNA-seq track. This release adds five new cell types: H1-hESC, HeLa-S3, HepG2, HUVEC, and NHEK. Also, stranded 75 nt reads are now provided for each cell type. Credits Wold Group: Ali Mortazavi, Brian Williams, Diane Trout, Brandon King, Ken McCue, Lorian Schaeffer. Myers Group: Norma Neff, Florencia Pauli, Fan Zhang, Tim Reddy, Rami Rauch. 
Illumina gene expression group: Gary Schroth, Shujun Luo, Eric Vermaas. Contacts: Diane Trout (informatics) and Brian Williams (experimental). References Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008 Jul;5(7):621-628. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009 Mar;10:R25. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCaltechRnaSeqViewSplices Splice Sites ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R2x75 NHEK 2x75 SW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R2x75 Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R1x75d NHEK 1x75D SW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R1x75d Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech 
RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBlat34R1x75d NHEK 1x75D SL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBlat34R1x75d Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R2x75 HUVEC 2x75 SW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R2x75 Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R2x75 HUVEC 2x75 SW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R2x75 Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R1x75d HUVEC 1x75D SW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers 
Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBlat34R1x75d HUVEC 1x75D SL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBlat34R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R1x75d HUVEC 1x75D SW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBlat34R1x75d HUVEC 1x75D SL1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBlat34R1x75d Splices umbilical vein endothelial cells Sequencing 
analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R2x75 HepG2 2x75 SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R2x75 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R2x75 HepG2 2x75 SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R2x75 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x75d HepG2 1x75D SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech 
RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBlat34R1x75d HepG2 1x75D SL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBlat34R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x75d HepG2 1x75D SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBlat34R1x75d HepG2 1x75D SL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBlat34R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x32 HepG2 1x32 SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 
GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x32 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x32 HepG2 1x32 SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x32 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R2x75 HeLa3 2x75 SW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R2x75 Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R2x75 HeLa3 2x75 SW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R2x75 Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology 
Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R1x75d HeLa3 1x75D SW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBlat34R1x75d HeLa3 1x75D SL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBlat34R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R1x75d HeLa3 1x75D SW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded 
Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBlat34R1x75d HeLa3 1x75D SL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBlat34R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBb12x75 K562 2x75 SW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBb12x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBlat342x75 K562 2x75 SL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/blat34 cell blat34 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBlat342x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBb12x75 K562 2x75 SW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBb12x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBlat342x75 K562 2x75 SL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/blat34 cell blat34 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBlat342x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBb2R1x75d K562 1x75D SW2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBb2R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBlat34R1x75d K562 1x75D SL2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBlat34R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBb2R1x75d K562 1x75D SW1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBb2R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBlat34R1x75d K562 1x75D SL1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBlat34R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBow0981x32 K562 1x32 SW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBow0981x32 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBow0981x32 K562 1x32 SW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBow0981x32 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep4H1hescCellPapBb2R2x75 H1ESC 2x75 SW4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 4 polyA wgEncodeCaltechRnaSeqSplicesRep4H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep3H1hescCellPapBb2R2x75 H1ESC 2x75 SW3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 3 polyA wgEncodeCaltechRnaSeqSplicesRep3H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R2x75 H1ESC 2x75 SW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ 
H1-hESC Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75 H1ESC 2x75 SW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75Il400 H1ESC 2x75 S41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75Il400 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R1x75d H1ESC 1x75D SW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBlat34R1x75d H1ESC 1x75D SL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 
Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBlat34R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R1x75d H1ESC 1x75D SW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBlat34R1x75d H1ESC 1x75D SL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBlat34R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 SW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBb12x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - 
CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBlat342x75 GM128 2x75 SL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/blat34 cell blat34 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBlat342x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R2x75Il400 GM128 2x75 S42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R2x75Il400 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 SW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA
wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBb12x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBlat342x75 GM128 2x75 SL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/blat34 cell blat34 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBlat342x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R1x75d GM128 1x75D SW2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBlat34R1x75d GM128 1x75D SL2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125
GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBlat34R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBb2R1x75d GM128 1x75D SW1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBb2R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBlat34R1x75d GM128 1x75D SL1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBlat34R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 BLAT Stranded Splice Aligns Expression
wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBow0981x32 GM128 1x32 SW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBow0981x32 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBow0981x32 GM128 1x32 SW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBow0981x32 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqViewRawSignal Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqRawSignalRep1NhekCellPapBb2R2x75 NHEK 2x75 RW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1NhekCellPapBb2R2x75 RawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format)
ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2HuvecCellPapBb2R2x75 HUVEC 2x75 RW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2HuvecCellPapBb2R2x75 RawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1HuvecCellPapBb2R2x75 HUVEC 2x75 RW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1HuvecCellPapBb2R2x75 RawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R2x75 HepG2 2x75 RW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R2x75 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R2x75 HepG2 2x75 
RW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R2x75 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R1x32 HepG2 1x32 RW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R1x32 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R1x32 HepG2 1x32 RW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R1x32 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Helas3CellPapBb2R2x75 HeLa3 2x75 RW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA 
wgEncodeCaltechRnaSeqRawSignalRep2Helas3CellPapBb2R2x75 RawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Helas3CellPapBb2R2x75 HeLa3 2x75 RW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Helas3CellPapBb2R2x75 RawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaBb12x75 K562 2x75 RW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaBb12x75 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaBb12x75 K562 2x75 RW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaBb12x75 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaErng3b1x32 K562 1x32 RW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaErng3b1x32 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaErng3b1x32 K562 1x32 RW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaErng3b1x32 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep4H1hescCellPapBb2R2x75 H1ESC 2x75 RW4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 4 polyA wgEncodeCaltechRnaSeqRawSignalRep4H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep3H1hescCellPapBb2R2x75 H1ESC 2x75 RW3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 3 polyA wgEncodeCaltechRnaSeqRawSignalRep3H1hescCellPapBb2R2x75 RawSignal embryonic stem cells 
Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2H1hescCellPapBb2R2x75 H1ESC 2x75 RW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75 H1ESC 2x75 RW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75Il400 H1ESC 2x75 R41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75Il400 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 
Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 RW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaBb12x75 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellPapBb2R2x75Il400 GM128 2x75 R42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellPapBb2R2x75Il400 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 RW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaBb12x75 RawSignal B-lymphocyte, lymphoblastoid, International
HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaErng3b1x32 GM128 1x32 RW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaErng3b1x32 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaErng3b1x32 GM128 1x32 RW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaErng3b1x32 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqViewPlusRawSignal Plus Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1NhekCellPapBb2R1x75d NHEK 1x75D +S1 NHEK RnaSeq ENCODE Jan 2010
Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1NhekCellPapBb2R1x75d PlusRawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2HuvecCellPapBb2R1x75d HUVEC 1x75D +S2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2HuvecCellPapBb2R1x75d PlusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1HuvecCellPapBb2R1x75d HUVEC 1x75D +S1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1HuvecCellPapBb2R1x75d PlusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Hepg2CellPapBb2R1x75d HepG2 1x75D +S2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha 
cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Hepg2CellPapBb2R1x75d PlusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Hepg2CellPapBb2R1x75d HepG2 1x75D +S1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Hepg2CellPapBb2R1x75d PlusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Helas3CellPapBb2R1x75d HeLa3 1x75D +S2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Helas3CellPapBb2R1x75d PlusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Helas3CellPapBb2R1x75d HeLa3 1x75D +S1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Helas3CellPapBb2R1x75d 
PlusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2K562CellPapBb2R1x75d K562 1x75D +S2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2K562CellPapBb2R1x75d PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1K562CellPapBb2R1x75d K562 1x75D +S1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1K562CellPapBb2R1x75d PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2H1hescCellPapBb2R1x75d H1ESC 1x75D +S2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2H1hescCellPapBb2R1x75d PlusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1H1hescCellPapBb2R1x75d H1ESC 1x75D +S1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1H1hescCellPapBb2R1x75d PlusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Gm12878CellPapBb2R1x75d GM128 1x75D +S2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Gm12878CellPapBb2R1x75d PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr 
Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Gm12878CellPapBb2R1x75d GM128 1x75D +S1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Gm12878CellPapBb2R1x75d PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqViewMinusRawSignal Minus Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1NhekCellPapBb2R1x75d NHEK 1x75D -S1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1NhekCellPapBb2R1x75d MinusRawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2HuvecCellPapBb2R1x75d HUVEC 1x75D -S2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA 
wgEncodeCaltechRnaSeqMinusRawSignalRep2HuvecCellPapBb2R1x75d MinusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1HuvecCellPapBb2R1x75d HUVEC 1x75D -S1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1HuvecCellPapBb2R1x75d MinusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Hepg2CellPapBb2R1x75d HepG2 1x75D -S2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Hepg2CellPapBb2R1x75d MinusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Hepg2CellPapBb2R1x75d HepG2 1x75D -S1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Hepg2CellPapBb2R1x75d 
MinusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Helas3CellPapBb2R1x75d HeLa3 1x75D -S2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Helas3CellPapBb2R1x75d MinusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Helas3CellPapBb2R1x75d HeLa3 1x75D -S1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Helas3CellPapBb2R1x75d MinusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2K562CellPapBb2R1x75d K562 1x75D -S2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2K562CellPapBb2R1x75d MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and 
Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1K562CellPapBb2R1x75d K562 1x75D -S1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1K562CellPapBb2R1x75d MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2H1hescCellPapBb2R1x75d H1ESC 1x75D -S2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2H1hescCellPapBb2R1x75d MinusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1H1hescCellPapBb2R1x75d H1ESC 1x75D -S1 H1-hESC RnaSeq ENCODE 
Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1H1hescCellPapBb2R1x75d MinusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Gm12878CellPapBb2R1x75d GM128 1x75D -S2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Gm12878CellPapBb2R1x75d MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Gm12878CellPapBb2R1x75d GM128 1x75D -S1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Gm12878CellPapBb2R1x75d MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Minus Raw Signal 
Expression wgEncodeCaltechRnaSeqViewAligns Alignments ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75 NHEK 2x75 AL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75 Alignments epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1NhekCellPapBb2R1x75d NHEK 1x75D AL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1NhekCellPapBb2R1x75d Alignments epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2HuvecCellPapErng32aR2x75 HUVEC 2x75 AL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2HuvecCellPapErng32aR2x75 Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75 HUVEC 2x75 AL1 
HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75 Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2HuvecCellPapBb2R1x75d HUVEC 1x75D AL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2HuvecCellPapBb2R1x75d Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1HuvecCellPapBb2R1x75d HUVEC 1x75D AL1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1HuvecCellPapBb2R1x75d Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75 HepG2 2x75 AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech 
ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Paired Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR2x75 HepG2 2x75 AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR2x75 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Hepg2CellPapBb2R1x75d HepG2 1x75D AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Hepg2CellPapBb2R1x75d Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Hepg2CellPapBb2R1x75d HepG2 1x75D AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Hepg2CellPapBb2R1x75d Alignments hepatocellular carcinoma Sequencing analysis of 
RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR1x32 HepG2 1x32 AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 1x32 2 polyA wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR1x32 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR1x32 HepG2 1x32 AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 1x32 1 polyA wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR1x32 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75 HeLa3 2x75 AL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75 Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual 
reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Helas3CellPapErng32aR2x75 HeLa3 2x75 AL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Helas3CellPapErng32aR2x75 Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Helas3CellPapBb2R1x75d HeLa3 1x75D AL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Helas3CellPapBb2R1x75d Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Helas3CellPapBb2R1x75d HeLa3 1x75D AL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Helas3CellPapBb2R1x75d Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Aligns 
Expression wgEncodeCaltechRnaSeqPairedRep2K562CellLongpolyaBb12x75 K562 2x75 AL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2K562CellLongpolyaBb12x75 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1K562CellLongpolyaBb12x75 K562 2x75 AL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1K562CellLongpolyaBb12x75 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2K562CellPapBb2R1x75d K562 1x75D AL2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2K562CellPapBb2R1x75d Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1K562CellPapBb2R1x75d K562 1x75D AL1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1K562CellPapBb2R1x75d Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2K562CellLongpolyaBow0981x32 K562 1x32 AL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqAlignsRep2K562CellLongpolyaBow0981x32 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1K562CellLongpolyaBow0981x32 K562 1x32 AL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqAlignsRep1K562CellLongpolyaBow0981x32 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75 H1ESC 2x75 AL4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 4 polyA wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep3H1hescCellPapErng32aR2x75 H1ESC 2x75 AL3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 3 polyA wgEncodeCaltechRnaSeqPairedRep3H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2H1hescCellPapErng32aR2x75 H1ESC 2x75 AL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads 
mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75 H1ESC 2x75 AL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75Il400 H1ESC 2x75 A41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75Il400 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2H1hescCellPapBb2R1x75d H1ESC 1x75D AL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2H1hescCellPapBb2R1x75d Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Aligns Expression 
wgEncodeCaltechRnaSeqAlignsRep1H1hescCellPapBb2R1x75d H1ESC 1x75D AL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1H1hescCellPapBb2R1x75d Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 AL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Gm12878CellLongpolyaBb12x75 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Gm12878CellPapErng32aR2x75Il400 GM128 2x75 A42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Gm12878CellPapErng32aR2x75Il400 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may 
mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 AL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Gm12878CellLongpolyaBb12x75 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellPapBb2R1x75d GM128 1x75D AL2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellPapBb2R1x75d Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellPapBb2R1x75d GM128 1x75D AL1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellPapBb2R1x75d Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of 
Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellLongpolyaBow0981x32 GM128 1x32 AL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellLongpolyaBow0981x32 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellLongpolyaBow0981x32 GM128 1x32 AL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellLongpolyaBow0981x32 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Aligns Expression ntOoaHaplo Cand. Gene Flow Candidate Regions for Gene Flow from Neandertal to Non-African Modern Humans Neandertal Assembly and Analysis Description This track shows 13 regions of the human genome in which there is considerably more haplotype diversity among non-African genomes than within African genomes. 
A prediction of Neandertal-to-modern human gene flow is that these deeply divergent haplotypes which exist only in non-African populations entered the human gene pool from Neandertals. Of the 12 candidate gene flow regions with tag SNP data, there are 10 regions in which Neandertals match the deep haplotype clade unique to non-Africans (out of Africa, OOA) instead of the cosmopolitan haplotype clade shared by Africans and non-Africans (cosmopolitan, COS). The table below was copied from Table 5, "Non-African haplotypes match Neandertal at an unexpected rate", from Green et al.:
Region | Genomic Size | ST | Average Frequency in OOA | AM | DM | AN | DN | Qualitative Assessment
chr1:168,110,001-168,220,000 | 110,000 | 2.9 | 6.3% | 5 | 10 | 1 | 0 | OOA
chr1:223,760,001-223,910,000 | 150,000 | 2.8 | 6.3% | 1 | 4 | 0 | 0 | OOA
chr4:171,180,001-171,280,000 | 100,000 | 1.9 | 5.2% | 1 | 2 | 0 | 0 | OOA
chr5:28,950,001-29,070,000 | 120,000 | 3.8 | 3.1% | 16 | 16 | 6 | 0 | OOA
chr6:66,160,001-66,260,000 | 100,000 | 5.7 | 28.1% | 6 | 6 | 0 | 0 | OOA
chr9:32,940,001-33,040,000 | 100,000 | 2.8 | 4.2% | 7 | 14 | 0 | 0 | OOA
chr10:4,820,001-4,920,000 | 100,000 | 2.6 | 9.4% | 9 | 5 | 0 | 0 | OOA
chr10:38,000,001-38,160,000 | 160,000 | 3.5 | 8.3% | 5 | 9 | 2 | 0 | OOA
chr10:69,630,001-69,740,000 | 110,000 | 4.2 | 19.8% | 2 | 2 | 0 | 1 | OOA
chr15:45,250,001-45,350,000 | 100,000 | 2.5 | 1.1% | 5 | 6 | 1 | 0 | OOA
chr17:35,500,001-35,600,000 | 100,000 | 2.9 | (no tags) | n/a | n/a | n/a | n/a | n/a
chr20:20,030,001-20,140,000 | 110,000 | 5.1 | 64.6% | 0 | 0 | 10 | 5 | COS
chr22:30,690,001-30,820,000 | 130,000 | 3.5 | 4.2% | 0 | 2 | 5 | 2 | COS
ST = estimated ratio of OOA/African gene tree depth. Average Frequency in OOA = average (across tag SNPs in the region) of the population frequency in the 48 OOA individuals of the OOA-only allele for each tag SNP. AM = Neandertal has ancestral allele and matches OOA-specific clade. DM = Neandertal has derived allele and matches OOA-specific clade. AN = Neandertal has ancestral allele and does not match OOA-specific clade. DN = Neandertal has derived allele and does not match OOA-specific clade. Display Conventions and Configuration A region is colored green if its qualitative assessment is OOA, blue if COS, and gray if unknown (no tag SNPs in region). Methods Green et al.
used 1,263,750 Perlegen Class A SNPs, identified in 71 individuals of diverse ancestry (see Hinds et al.), to identify 13 candidate gene flow regions (Supplemental Online Materials Text 17). 24 individuals of European ancestry and 24 individuals of Han Chinese ancestry were used to represent the non-African population, and the remaining 23 individuals, of African American ancestry, were used to represent the African population. From the 1,263,750 Perlegen Class A SNPs, they identified 166 tag SNPs that separate (see below) 12 of the haplotype clades in non-Africans (OOA) from the cosmopolitan haplotype clades shared between Africans and non-Africans (COS) and for which they had data from the Neandertals. Of the 13 regions, one had no tag SNPs so could not be assessed, two were COS, and 10 were OOA (see final column Table 1). Overall, the Neandertals match the deep clade unique to non-Africans (OOA) at 133 of the 166 tag SNPs. They assessed the rate at which Neandertal matches each of these clades by further subdividing the 133 tag SNPs based on their ancestral or derived status in Neandertal and whether they matched the OOA-specific clade or not. Candidate regions were qualitatively assessed to be OOA matches for Neandertal when the proportion of tag SNPs matching the OOA-specific clade is much more than 50%. Credits This track was produced at UCSC using data generated by Ed Green. References Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178 Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005 Feb 18;307(5712):1072-9. 
PMID: 15718463 ccdsGene CCDS Consensus CDS Genes and Gene Predictions Description This track shows human genome high-confidence gene annotations from the Consensus Coding Sequence (CCDS) project. This project is a collaborative effort to identify a core set of human protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations on the human genome. Collaborators include:
European Bioinformatics Institute (EBI)
National Center for Biotechnology Information (NCBI)
University of California, Santa Cruz (UCSC)
Wellcome Trust Sanger Institute (WTSI)
For more information on the different gene tracks, see our Genes FAQ. Methods CDS annotations of the human genome were obtained from two sources: NCBI RefSeq and a union of the gene annotations from Ensembl and Vega, collectively known as Hinxton. Genes with identical CDS genomic coordinates in both sets become CCDS candidates. The genes undergo a quality evaluation, which must be approved by all collaborators. The following criteria are currently used to assess each gene:
an initiating ATG (Exception: a non-ATG translation start codon is annotated if it has sufficient experimental support), a valid stop codon, and no in-frame stop codons (Exception: selenoproteins, which contain a TGA codon that is known to be translated to a selenocysteine instead of functioning as a stop codon)
ability to be translated from the genome reference sequence without frameshifts
recognizable splicing sites
no intersection with putative pseudogene predictions
supporting transcripts and protein homology
conservation evidence with other species
A unique CCDS ID is assigned to the CCDS, which links together all gene annotations with the same CDS. CCDS gene annotations are under continuous review, with periodic updates to this track. Credits This track was produced at UCSC from data downloaded from the CCDS project web site.
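The sequence-level criteria listed under Methods above (an initiating ATG, a valid stop codon, no in-frame stop codons, and translation without frameshifts) can be illustrated with a small Python sketch. This is not the CCDS project's evaluation pipeline; the function name check_cds and its two exception flags are hypothetical, and the remaining criteria (splice sites, pseudogene overlap, supporting evidence, conservation) are not modeled here.

```python
# Illustrative sketch only: a simplified check of the sequence-level CCDS
# criteria described above. Names and flags are assumptions, not CCDS tooling.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds, allow_non_atg_start=False, selenoprotein=False):
    """Return a list of failed criteria for a CDS string; empty means pass."""
    seq = cds.upper()
    problems = []

    # A CDS that translates without frameshifts has a length divisible by 3.
    if len(seq) < 6 or len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
        return problems

    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    # Initiating ATG, unless a documented exception is flagged by the caller.
    if codons[0] != "ATG" and not allow_non_atg_start:
        problems.append("no initiating ATG")

    # The final codon must be a stop codon.
    if codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")

    # Internal stops are not allowed, except TGA in flagged selenoproteins.
    tolerated = {"TGA"} if selenoprotein else set()
    for index, codon in enumerate(codons[1:-1], start=1):
        if codon in STOP_CODONS and codon not in tolerated:
            problems.append(f"in-frame stop codon {codon} at codon {index}")

    return problems
```

For example, check_cds("ATGAAATAA") returns an empty list, while the same call on "ATGTGAAAATAA" reports the internal TGA unless selenoprotein=True is passed.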
References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009 Jul;19(7):1316-23. PMID: 19498102; PMC: PMC2704439 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 cgapSage CGAP SAGE CGAP Long SAGE mRNA and EST Description This track displays genomic mappings for human LongSAGE tags from The Cancer Genome Anatomy Project. SAGE (Serial Analysis of Gene Expression) [Velculescu 1995] is a quantitative technique for measuring gene expression. For a brief overview of SAGE, see the CGAP SAGE information page. Display Conventions and Configuration Genomic mappings of 17-base LongSAGE tags are displayed. Tag counts are normalized to tags per million (TPM) in each tissue or library. Tags with higher TPM are more darkly shaded. The CATG restriction site before the start of the tag is rendered as a thick line; the 17 bases of the tag are drawn as a thinner line. Thus the thin end of the tag points in the direction of transcription. The track display modes are: dense - Draws locations of mapped tags on a single line. squish - Draws one item per tag per library without labels. pack - Draws one item per tag per tissue with labels. The label includes the number of libraries of each tissue type containing the tag. Clicking on an item lists the libraries containing the tag, with the libraries from the selected tissue in bold.
Clicking on a library in the list displays detailed information about that library. full - Draws one item per tag per library. Clicking on an item displays information about the library, along with other libraries containing the tag. The track can be configured to display only tags from a selected tissue. Methods Tag and library data, along with genomic mappers, were obtained from The Cancer Genome Anatomy Project. Information about the various SAGE libraries, data downloads and other tools for exploring and analyzing these data is available from the CGAP SAGE Genie web site. Mapping SAGE tags to the human genome The goal of the SAGE tag mapping is to identify the genomic loci of the associated mRNAs. Since it is impossible to disambiguate tags that map to multiple loci, only unique genomic mappings are kept. To compensate for polymorphisms between the reference genome and the mRNA libraries, SNPs are considered by the mapping algorithm. For each position in the genome on both strands, all possible 21-mers, given all combinations of SNPs, were considered. The 21-mers beginning with CATG were generated for use in mapping. Only 21-mers that were unique across the genome were used in placing SAGE tags. Only SNPs from dbSNP with the following characteristics were used:
single-base
maps to a single genomic location
reference allele matches reference genome
does not occur in a tandem repeat
Human embryonic stem cell (ESC) library construction Detailed information regarding the human ESC lines used in this study can be found at https://stemcells.nih.gov and in Hirst et al. 2007. The ESC tags were generated from RNA purified from human ESCs maintained under conditions that promote their maintenance in an undifferentiated state. A complete set of embryonic stem cell LongSAGE tags is available through the CGAP web portal. Credits Many thanks to Martin Hirst of Canada's Michael Smith Genome Sciences Centre for his assistance in developing this track.
The LongSAGE data and genomic mappings were provided by The Cancer Genome Anatomy Project of the National Cancer Institute, U.S. National Institutes of Health. The human embryonic stem cell library was supported by funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-C0-12400 and by grants from Genome Canada, Genome British Columbia and the Canadian Stem Cell Network. References Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, De Souza SJ et al. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci U S A. 2002 Aug 20;99(17):11287-92. PMID: 12119410; PMC: PMC123249 Hirst M, Delaney A, Rogers SA, Schnerch A, Persaud DR, O'Connor MD, Zeng T, Moksa M, Fichter K, Mah D et al. LongSAGE profiling of nine human embryonic stem cell lines. Genome Biol. 2007;8(6):R113. PMID: 17570852; PMC: PMC2394759 Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K et al. Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res. 2007 Jan;17(1):108-16. PMID: 17135571; PMC: PMC1716260 Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K et al. A public database for gene expression in human cancers. Cancer Res. 1999 Nov 1;59(21):5403-7. PMID: 10554005 Liang P. SAGE Genie: a suite with panoramic view of gene expression. Proc Natl Acad Sci U S A. 2002 Sep 3;99(18):11547-8. PMID: 12195021; PMC: PMC129301 Riggins GJ, Strausberg RL. Genome and genetic resources from the Cancer Genome Anatomy Project. Hum Mol Genet. 2001 Apr;10(7):663-7. PMID: 11257097 Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE. Using the transcriptome to annotate the genome. Nat Biotechnol. 2002 May;20(5):508-12.
PMID: 11981567 Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S et al. A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci U S A. 2005 Dec 20;102(51):18485-90. PMID: 16352711; PMC: PMC1311911 Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7. PMID: 7570003 cytoBand Chromosome Band Chromosome Bands Localized by FISH Mapping Clones Mapping and Sequencing Description The chromosome band track represents the approximate location of bands seen on Giemsa-stained chromosomes. Chromosomes are displayed in the browser with the short arm first. Cytologically identified bands on the chromosome are numbered outward from the centromere on the short (p) and long (q) arms. At low resolution, bands are classified using the nomenclature [chromosome][arm][band], where band is a single digit. Examples of bands on chromosome 3 include 3p2, 3p1, cen, 3q1, and 3q2. At a finer resolution, some of the bands are subdivided into sub-bands, adding a second digit to the band number, e.g. 3p26. This resolution produces about 500 bands. A final subdivision into a total of 862 sub-bands is made by adding a period and another digit to the band, resulting in 3p26.3, 3p26.2, etc. Methods A full description of the method by which the chromosome band locations are estimated can be found in Furey and Haussler, 2003. Barbara Trask, Vivian Cheung, Norma Nowak and others in the BAC Resource Consortium used fluorescent in-situ hybridization (FISH) to determine a cytogenetic location for large genomic clones on the chromosomes. The results from these experiments are the primary source of information used in estimating the chromosome band locations. For more information about the process, see the paper, Cheung, et al., 2001,
and the accompanying web site, Human BAC Resource. BAC clone placements in the human sequence are determined at UCSC using a combination of full BAC clone sequence, BAC end sequence, and STS marker information. Credits We would like to thank all the labs that have contributed to this resource: Fred Hutchinson Cancer Research Center (FHCRC) National Cancer Institute (NCI) Roswell Park Cancer Institute (RPCI) The Wellcome Trust Sanger Institute (SC) Cedars-Sinai Medical Center (CSMC) Los Alamos National Laboratory (LANL) UC San Francisco Cancer Center (UCSF) References Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, Olivier M et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001 Feb 15;409(6822):953-8. PMID: 11237021 Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003 May 1;12(9):1037-44. PMID: 12700172 cytoBandIdeo Chromosome Band (Ideogram) Chromosome Bands Localized by FISH Mapping Clones (for Ideogram) Mapping and Sequencing iscaComposite ClinGen CNVs Clinical Genome Resource (ClinGen) CNVs Phenotype and Disease Associations The ClinGen CNVs track is no longer being updated. These data, along with updates, can be found in the ClinVar Copy Number Variants (ClinVar CNVs) track. See our news archive for more information. Description NOTE: These data are for research purposes only. While the ClinGen data are open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal medical questions. UCSC presents these data for use by qualified professionals, and even such professionals should use caution in interpreting the significance of information found here. No single data point should be taken at face value and such data should always be used in conjunction with as much corroborating data as possible.
No treatment protocols should be developed or patient advice given on the basis of these data without careful consideration of all possible sources of information. No attempt to identify individual patients should be undertaken. No one is authorized to attempt to identify patients by any means. The Clinical Genome Resource (ClinGen) is a National Institutes of Health (NIH)-funded program dedicated to building a genomic knowledge base to improve patient care. This will be accomplished by harnessing the data from both research efforts and clinical genetic testing, and using it to propel expert and machine-driven curation activities. By facilitating collaboration within the genomics community, we will all better understand the relationship between genomic variation and human health. ClinGen will work closely with the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM), which will distribute this information through its ClinVar database. The ClinGen dataset displays clinical microarray data submitted to dbGaP/dbVar at NCBI by ClinGen member laboratories (dbVar study nstd37), as well as clinical data reported in Kaminsky et al., 2011 (dbVar study nstd101) (see reference below). This track shows copy number variants (CNVs) found in patients referred for genetic testing for indications such as intellectual disability, developmental delay, autism and congenital anomalies. Additionally, the ClinGen "Curated Pathogenic" and "Curated Benign" tracks represent genes/genomic regions reviewed for dosage sensitivity in an evidence-based manner by the ClinGen Structural Variation Working Group (dbVar study nstd45). The CNVs in this study have been reviewed for their clinical significance by the submitting ClinGen laboratory. Some of the deletions and duplications in the track have been reported as causative for a phenotype by the submitting clinical laboratory; this information was based on current knowledge at the time of submission.
However, it should be noted that phenotype information is often vague and imprecise and should be used with caution. While all samples were submitted because of a phenotype in a patient, only 15% of patients had variants determined to be causal, and most patients will have additional variants that are not causal. CNVs are separated into subtracks and are labeled as:
Pathogenic
Uncertain: Likely Pathogenic
Uncertain
Uncertain: Likely Benign
Benign
The user should be aware that some of the data were submitted using a 3-class system, with the two "Likely" categories omitted. Two subtracks, "Path Gain" and "Path Loss", are aggregate tracks showing graphically the accumulated level of gains and losses in the Pathogenic subtrack across the genome. Similarly, "Benign Gain" and "Benign Loss" show the accumulated level of gains and losses in the Benign subtrack. These tracks are collectively called "Coverage" tracks. Many samples have multiple variants, not all of which are causative of the phenotype. The CNVs in these samples have been decoupled, so it is not possible to connect multiple imbalances as coming from a single patient. It is therefore not possible to identify individuals via their genotype. Methods and Color Convention Samples from patients referred for cytogenetic testing due to clinical phenotypes were analyzed by arrays with a probe spacing of 20-75 kb. The minimum CNV breakpoints are shown; if available, the maximum CNV breakpoints are provided in the details page, but are not shown graphically on the Browser image. Data were submitted to dbGaP at NCBI and thence decoupled as described into dbVar for unrestricted release. The entries are colored red for loss and blue for gain. The names of items use the ClinVar convention of appending "_inheritance" indicating the mechanism of inheritance, if known: "_pat, _mat, _dnovo, _unk" as paternal, maternal, de novo and unknown, respectively.
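The naming and coloring conventions just described can be illustrated with a short Python sketch. The function name split_inheritance, the two lookup tables, and the example item name nssv0000001_dnovo are assumptions made for illustration; they are not part of the ClinGen, dbVar, or UCSC tooling.

```python
# Illustrative sketch only: decode the "_inheritance" suffix on an item name
# and look up the display color for a loss or gain, as described above.
INHERITANCE_SUFFIXES = {
    "pat": "paternal",
    "mat": "maternal",
    "dnovo": "de novo",
    "unk": "unknown",
}

# Entries are colored red for loss and blue for gain.
ITEM_COLORS = {"loss": "red", "gain": "blue"}


def split_inheritance(item_name):
    """Split an item name into (base name, inheritance) using the suffix.

    Returns (item_name, None) when no recognized suffix is present, since
    the suffix is only appended when the inheritance is known.
    """
    base, separator, suffix = item_name.rpartition("_")
    if separator and suffix in INHERITANCE_SUFFIXES:
        return base, INHERITANCE_SUFFIXES[suffix]
    return item_name, None


if __name__ == "__main__":
    # Hypothetical item name, used only to show the call.
    print(split_inheritance("nssv0000001_dnovo"))
    print(ITEM_COLORS["loss"])
```

Splitting on the last underscore keeps any underscores in the base name intact, and an unrecognized suffix leaves the name unchanged.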
Verification Most data were validated by the submitting laboratory using various methods, including FISH, G-banded karyotype, MLPA and qPCR. Credits Thank you to ClinGen and NCBI for technical coordination and consultation, and to the UCSC Genome Browser staff for engineering the track display. References Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010 May 14;86(5):749-64. PMID: 20466091; PMC: PMC2869000 Kaminsky EB, Kaul V, Paschall J, Church DM, Bunke B, Kunig D, Moreno-De-Luca D, Moreno-De-Luca A, Mulle JG, Warren ST et al. An evidence-based approach to establish the functional and clinical significance of copy number variants in intellectual and developmental disabilities. Genet Med. 2011 Sep;13(9):777-84. PMID: 21844811; PMC: PMC3661946 iscaViewDetail CNVs Clinical Genome Resource (ClinGen) CNVs Phenotype and Disease Associations iscaUncertain Uncertain ClinGen CNVs: Uncertain Phenotype and Disease Associations iscaPathogenic Pathogenic ClinGen CNVs: Pathogenic Phenotype and Disease Associations iscaCuratedPathogenic Curated Path ClinGen CNVs: Curated Pathogenic Phenotype and Disease Associations iscaLikelyPathogenic Uncert Path ClinGen CNVs: Uncertain: Likely Pathogenic Phenotype and Disease Associations iscaLikelyBenign Uncert Ben ClinGen CNVs: Uncertain: Likely Benign Phenotype and Disease Associations iscaBenign Benign ClinGen CNVs: Benign Phenotype and Disease Associations iscaCuratedBenign Curated Ben ClinGen CNVs: Curated Benign Phenotype and Disease Associations wgEncodeHudsonalphaCnv Common Cell CNV ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats Description This track shows copy number variation (CNV) in the ENCODE Tier 1 and Tier 2 human cell lines GM12878, 
HepG2, and K562 as determined by Illumina's Human 1M-Duo Infinium HD BeadChip assay and CNV analysis by circular binary segmentation (CBS). Two biological replicates were generated for each cell line. Because biological replicates gave very similar results, the replicates were averaged to provide a single genotyping dataset in order to apply these data to other ENCODE experiments. Possible uses of these data include correction of copy number in peak-calling for interactome, transcriptome, DNase hypersensitivity, and methylome determinations. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. Regions Regions of the genome where copy number variation has been assessed. CNV regions are colored by type:
blue = amplified
black = normal
orange = heterozygous deletion
red = homozygous deletion
Signal Mean log R ratio for each region. See Methods below. Signals are colored by cell type, not by copy number variation. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Methods Cells were grown according to the approved ENCODE cell culture protocols. Isolation of genomic DNA and hybridization Genomic DNA was extracted using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. For each biological replicate of each cell line, DNA concentrations and a level of quality were determined by UV absorbance. Genotypes were determined from 400 nanograms of each sample at 1 million loci using Illumina Human 1M-Duo arrays and standard Illumina protocols. Processing and Analysis Genotypes were ascertained from the 1M-Duo Arrays with BeadStudio using default settings and formatting with the A/B genotype designation for each SNP (see 1M-Duo manifest file for specific nucleotide).
Copy Number Variation (CNV) analysis was performed using circular binary segmentation (DNAcopy) of the log R ratio values at each probe (Olshen et al., 2004). The parameters used were alpha=0.001, nperm=5000, sd.undo=1. Copy number segments are reported with the mean log R ratio for each chromosomal segment called by CBS. Log ratios of approximately -0.2 to -1.5 can be considered heterozygous deletions, and log ratios above approximately 0.2 can be considered amplifications. The coordinates for the genotypes and copy number calls are from Human Genome Build 36. Release Notes Release 2 (April 2011) of this track updates the colors used in the Regions view subtracks (the data remains unchanged). The colors now adhere to the color standards determined at the first annual International Standards for Cytogenomic Arrays (ISCA) Scientific Conference. Credits Tim Reddy, Rebekka Sprouse, Richard Myers, Devin Absher from HudsonAlpha Institute. Contact: Flo Pauli. References Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5(4):557-72. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeHudsonalphaCnvViewSignal Signal ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats wgEncodeHudsonalphaCnvSignalK562 K562 Signal K562 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 275 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalK562 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (K562 cells) Variation and Repeats wgEncodeHudsonalphaCnvSignalHepG2 HepG2 Signal HepG2 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 274 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalHepG2 Signal hepatocellular carcinoma Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (HepG2 cells) Variation and Repeats wgEncodeHudsonalphaCnvSignalGM12878 GM12878 Signal GM12878 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 273 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalGM12878 Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (GM12878 cells) Variation and Repeats wgEncodeHudsonalphaCnvViewRegions Regions ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats wgEncodeHudsonalphaCnvRegionsK562V2 K562 Regions K562 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 275 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsK562V2 Regions leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (K562 cells) Variation and Repeats wgEncodeHudsonalphaCnvRegionsHepG2V2 HepG2 Regions HepG2 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 274 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsHepG2V2 Regions hepatocellular carcinoma Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (HepG2 cells) Variation and Repeats wgEncodeHudsonalphaCnvRegionsGM12878V2 GM12878 Regions GM12878 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 273 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsGM12878V2 Regions B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (GM12878 cells) Variation and Repeats contrastGene CONTRAST CONTRAST Gene Predictions Genes and Gene Predictions Description This track shows protein-coding gene predictions generated by CONTRAST. Each predicted exon is colored according to confidence level: green (high confidence), orange (medium confidence), or red (low confidence). Methods CONTRAST predicts protein-coding genes from a multiple genomic alignment using a combination of discriminative machine learning techniques. A two-stage approach is used, in which output from local classifiers is combined with a global model of gene structure. CONTRAST is trained using a novel procedure designed to maximize expected coding region boundary detection accuracy. Please see the CONTRAST web site for details on how these predictions were generated and an estimate of accuracy. Credits Thanks to Samuel Gross of the Batzoglou lab at Stanford University for providing these predictions. References Gross SS, Do CB, Sirota M, Batzoglou S. 
CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 2007;8(12):R269. PMID: 18096039; PMC: PMC2246271 clonePos Coverage Clone Coverage Mapping and Sequencing Description In dense display mode, this track shows the coverage level of the genome. Finished regions are depicted in black. Draft regions are shown in various shades of gray that correspond to the level of coverage. In full display mode, this track shows the position of each contig inside each draft or finished clone ("fragment") in the assembly. For some assemblies, clones in the sequencing center tiling path are displayed with blue rather than gray backgrounds. wgEncodeCshlLongRnaSeq CSHL Long RNA-seq GSE26284 ENCODE Cold Spring Harbor Labs Long RNA-seq Expression Description This track depicts high-throughput sequencing of long RNAs (>200 nt) from RNA samples from tissues or subcellular compartments from ENCODE cell lines. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. Display Conventions and Configuration This track is a multi-view composite track that contains the following views: Alignments The Alignments view shows reads mapped to the genome. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription could not be determined are shown in black. Raw Signals The Raw Signal views show the density of aligned tags on the plus strand, on the minus strand, and on both strands combined. Methods Cells were grown according to the approved ENCODE cell culture protocols. Sample preparation and sequencing K562 and GM12878 total cell, total RNA Libraries were prepared with the standard Illumina paired-end kit, with the sole exception that a "tagged" random hexamer was used to prime first-strand synthesis: 5′-ACTGTAGGN6-3′.
The addition of this tag is what permits us to make strand assignments for the reads. The sequence of the tag is reported at the 5′ end of the read. Asymmetric PCR can place the tag on either the 1st or 2nd read, depending on which strand it used as a template. Strand assignments are made by looking for the tag at the 5′ end of either read 1 or read 2. Read 1 is physically linked to read 2. Therefore, if a tag is present on one end, strand assignments are made for both ends. We noted during analysis that the tags are generally 5′ truncated. We only "strand" reads that begin with ACTGTAGG, CTGTAGG, TGTAGG, or GTAGG. Between 63% and 68% of reads could be stranded in these libraries. It is possible to cull additional stranded reads that contain non-templated TAGG, AGG, GG, or G sequences at their 5′ end. The peak in the insert size distribution is between 200 and 250 nucleotides. K562 cytosol, polyA+ RNA Oligo-dT selected poly-A+ RNA was RiboMinus-treated according to the manufacturer's protocol (Invitrogen). The RNA was treated with tobacco alkaline pyrophosphatase to eliminate any 5′ cap structures and hydrolyzed to ~200 bases via alkaline hydrolysis. The 3′ end was repaired using calf intestinal alkaline phosphatase, and poly-A polymerase was used to catalyze the addition of Cs to the 3′ end. The 5′ end was phosphorylated using T4 PNK, and an RNA linker was ligated onto the 5′ end. Reverse transcription was carried out using a poly-G oligo with a defined 5′ extension. The inserts were then amplified using oligos targeting the 5′ linker and poly-G extension. This cloning protocol generated stranded reads that were read from the 5′ ends of the inserts. The library was sequenced on a Solexa platform for a total of 36 cycles; however, the reads underwent post-processing, resulting in trimming of their 3′ ends. Consequently, the mapped read lengths are variable.
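The tag-based strand assignment described above for the total RNA libraries can be illustrated with a minimal Python sketch. It only detects which mate of a pair begins with the tag (or one of the 5′-truncated forms listed above) and trims it. The function names are hypothetical, checking read 1 before read 2 is an assumption for the case where both mates carry a tag, and the mapping from tagged mate to genomic strand is not specified here and so is left out.

```python
# Full tag and the 5'-truncated forms listed in the text.
STRAND_TAGS = ("ACTGTAGG", "CTGTAGG", "TGTAGG", "GTAGG")


def find_tag(read):
    """Return the tag found at the 5' end of `read`, or None if there is none."""
    for tag in STRAND_TAGS:
        if read.startswith(tag):
            return tag
    return None


def assign_tagged_mate(read1, read2):
    """Report which mate of a read pair carries the tag and trim the tag off.

    Returns (tagged_mate, trimmed_read1, trimmed_read2). tagged_mate is 1 or 2,
    or None when neither mate starts with a recognised tag (pair stays unstranded).
    Assumption: read 1 is checked first if both mates happen to carry a tag.
    """
    tag = find_tag(read1)
    if tag is not None:
        return 1, read1[len(tag):], read2
    tag = find_tag(read2)
    if tag is not None:
        return 2, read1, read2[len(tag):]
    return None, read1, read2
```

Because the two mates are physically linked, a tag found on either mate is enough to strand the whole pair, which is why the function returns a single result per pair.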
Analysis K562 and GM12878 total cell, total RNA Tags were removed from the 5′ ends of the reads according to their lengths, and strand assignments were made. Subsequently, the reads were trimmed from their 3′ ends to a final length of 50 nucleotides and were mapped using Nexalign, a program developed by Timo Lassmann, RIKEN. We allowed up to 2 mismatches across the entire length and only report reads that mapped to a single, unique locus in the assembled hg18 genome. K562 cytosol, polyA+ RNA Reads were mapped to the human (hg18, March 2006) assembly using Nexalign, with only uniquely mapping (one locus), exactly matching (no mismatches) aligned reads reported in the processed files, as follows:
1. Collect the read sequences from Illumina non-filtered output files.
2. Filter out all reads that contain undefined nucleotides ('N').
3. Perform the iterative alignment/C-tail chopping algorithm (below).
On each alignment step, the reads are aligned to the genome with 100% identity. All reads that align to a single locus are withdrawn from the alignment pool, and only the reads that could not be aligned continue to the next step.
a. Align to the hg18 genome using Nexalign 1.3.3 (© Timo Lassmann) without chopping off any nucleotides.
b. Chop off any C-blocks (until the first non-C) at the ends of the reads.
c. Align to the genome -> remove and save those that align.
d. Chop off any non-Cs until the next C.
e. Chop off the C-block until the next non-C.
f. Align to the genome -> remove and save those that align.
g. Repeat steps d, e, and f until the reads align to the genome, or chopping reduces the reads' lengths to below 16 (default), or there are no non-Cs left.
Verification Verification was done by comparison of referential data generated from 8 individual sequencing lanes (Illumina technology). Release Notes This is Release 2 (Nov 2009) of this track. It includes data from additional experiments, and changes in formatting for the existing data described below.
The K562 cytosol alignments are exactly the same data as in Release 1, but the alignments are now formatted in the bed14 format described below. These data have the string submittedDataVersion="V2 - file format change" in their metadata, and the table names are appended with the string "V2". The alignments in this track are provided in bigBed format. Each record is in bed 14 format, with the first 12 fields described here. The final two fields are the two paired sequences or, in the case of single alignments, the 13th field is the sequence and the 14th field is a single N. Credits K562 cytosol, polyA+ RNA These data were generated and analyzed by the transcriptome group at Cold Spring Harbor Laboratory and the Center for Genomic Regulation (Barcelona), who are participants in the ENCODE Transcriptome Group. K562 and GM12878 total cell, total RNA Credits: Carrie A. Davis, Jorg Drenkow, Huaien Wang, Alex Dobin and Tom Gingeras Contacts: Carrie Davis and Tom Gingeras (CSHL). Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCshlLongRnaSeqView1PlusRawSignal Plus Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ +S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CytosolLongpolyaV2 PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep2K562CellTotal K562 cell to +S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqPlusRawSigRep2K562CellTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CellTotal K562 cell to +S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CellTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep2Gm12878CellTotal GM12 cell to +S2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqPlusRawSigRep2Gm12878CellTotal PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1Gm12878CellTotal GM12 cell to +S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqPlusRawSigRep1Gm12878CellTotal PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView2MinusRawSignal Minus Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ -S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CytosolLongpolyaV2 MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion 
of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep2K562CellTotal K562 cell to -S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqMinusRawSigRep2K562CellTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CellTotal K562 cell to -S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CellTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep2Gm12878CellTotal GM12 cell to -S2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqMinusRawSigRep2Gm12878CellTotal MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1Gm12878CellTotal GM12 cell to -S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqMinusRawSigRep1Gm12878CellTotal MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView3AllRawSignal All Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqAllRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ AS1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqAllRawSigRep1K562CytosolLongpolyaV2 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion 
of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqAllRawSigRep2K562CellTotal K562 cell to AS2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAllRawSigRep2K562CellTotal RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep1K562CellTotal K562 cell to AS1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAllRawSigRep1K562CellTotal RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep2Gm12878CellTotal GM12 cell to AS2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAllRawSigRep2Gm12878CellTotal RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep1Gm12878CellTotal GM12 cell to AS1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAllRawSigRep1Gm12878CellTotal RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView4Alignments Alignments ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqAlignmentsRep1K562CytosolLongpolyaV2 K562 cyto A+ Al1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 GSM646524 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqAlignmentsRep1K562CytosolLongpolyaV2 Alignments leukemia, "The continuous cell line K-562 was 
established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqAlignmentsRep2K562CellTotal K562 cell to Al2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 GSM646523 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAlignmentsRep2K562CellTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep1K562CellTotal K562 cell to Al1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 GSM646523 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAlignmentsRep1K562CellTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep2Gm12878CellTotal GM12 cell to Al2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 GSM646522 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAlignmentsRep2Gm12878CellTotal Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep1Gm12878CellTotal GM12 cell to Al1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 GSM646522 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAlignmentsRep1Gm12878CellTotal Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (GM12878 whole cell) Expression wgEncodeCshlShortRnaSeq CSHL Sm RNA-seq GSE24565 ENCODE Cold Spring Harbor Labs Small RNA-seq Expression Description This track depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or sub cellular compartments from ENCODE cell lines. 
The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. This cloning protocol generates directional libraries that are read from the 5′ ends of the inserts, which should largely correspond to the 5′ ends of the mature RNAs. The libraries were sequenced on a Solexa platform for a total of 36, 50, or 76 cycles; however, the reads undergo post-processing that trims their 3′ ends. Consequently, the mapped read lengths are variable. Display Conventions and Configuration To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments. Transfrags Identical reads were collapsed while maintaining their multiplicity information and reported as "transfrags". "Y" means that the transfrag underwent clipping prior to mapping. "N" indicates that the transfrag did not undergo clipping. The Transfrags view includes all transfrags before filtering. Raw Signals The Raw Signal views show the density of aligned tags on the plus and minus strands. Alignments The Alignments view shows reads mapped to the genome and indicates where bases may mismatch. Every mapped read is displayed, i.e. uncollapsed. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription could not be determined are shown in black. The score of each alignment is the number of genomic locations to which the read aligned; for example, a score of two means that the read aligned to the genome at two different locations.
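The collapsing of identical reads into transfrags, described under Transfrags above, can be pictured with a short Python sketch. This is an illustration only, not the pipeline's actual code: the function name is hypothetical, and the sort order of the output is an arbitrary choice made so that the result is deterministic.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical read sequences, keeping each sequence's multiplicity.

    Returns a list of (sequence, count) pairs sorted by descending count and
    then by sequence. The ordering is an assumption made for reproducibility.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    lane_reads = ["ACGTAC", "TTGACA", "ACGTAC", "ACGTAC", "GGCATT"]
    for sequence, multiplicity in collapse_reads(lane_reads):
        print(sequence, multiplicity)
```

Each distinct sequence appears once in the output, and its count carries the multiplicity information that would otherwise be lost by collapsing.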
Methods Small RNAs of 20-200 nt were RiboMinus-treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with tobacco alkaline pyrophosphatase to eliminate any 5′ cap structures. Poly-A polymerase was used to catalyze the addition of Cs to the 3′ end. The 5′ ends were phosphorylated using T4 PNK, and an RNA linker was ligated onto the 5′ end. Reverse transcription was carried out using a poly-G oligo with a defined 5′ extension. The inserts were then amplified using oligos that target the 5′ linker and poly-G extension and contain sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50, or 76 cycles. Initially, 1 lane is run. If an appreciable number of mappable reads are obtained, additional lanes are run. Sequence reads underwent quality filtration using the standard Illumina pipeline (GERALD). The read lengths may exceed the insert sizes and consequently introduce 3′ adaptor sequence into the 3′ end of the reads. The 3′ sequencing adaptor was removed from the reads using a custom clipper program, which aligned the adaptor sequence to the short reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. Identical trimmed reads were collapsed, their counts were noted, and the collapsed reads were aligned to the human genome (NCBI build 36, hg18 unmasked) using Nexalign (Lassmann et al., not published). The alignment parameters are tuned to tolerate up to 2 mismatches with no indels and allow trimmed reads as short as 5 nucleotides to be mapped. We report reads that mapped 10 or fewer times. Note: Data obtained from each lane are processed and mapped independently. The processed/mapped data from each lane are then compiled as a single track without additional processing and submitted to UCSC.
Consequently, identical reads within a lane were collapsed, and their count is reported as the "transfrag" signal value. However, the redundancy between lanes has not been eliminated, so the same transfrag may appear multiple times within a track. Verification Verification was done by comparison of referential data generated from 8 individual sequencing lanes (Illumina technology). Credits Hannon lab members: Katalin Fejes-Toth, Vihra Sotirova, Gordon Assaf, Jon Preall, and members of the Gingeras and Guigo labs. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCshlShortRnaSeqView1Transfrags Transfrags ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqTransfragsProstateCellShort pros cell tot TF prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 GSM605626 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsProstateCellShort Transfrags prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleolusShort K562 nlos tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 GSM605628 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleolusShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqTransfragsK562ChromatinShort K562 chrm tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 GSM605632 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqTransfragsK562ChromatinShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleoplasmShort K562 nplm tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 GSM605634 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleoplasmShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleusShort K562 nucl tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 GSM605635 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleusShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqTransfragsK562CytosolShort K562 cyto tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 GSM605629 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqTransfragsK562CytosolShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqTransfragsK562PolysomeShort K562 psom tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 GSM605631 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqTransfragsK562PolysomeShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqTransfragsK562CellShort K562 cell tot TF K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 GSM605630 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsK562CellShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-Seq Transfrags (short in K562 cell) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878NucleusShort GM12 nucl tot TF GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 GSM605633 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878NucleusShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878CytosolShort GM12 cyto tot TF GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 GSM605627 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878CytosolShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878CellShort GM12 cell tot TF GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 GSM605625 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878CellShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus
Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Transcribed fragments ENCODE CSHL RNA-Seq Transfrags (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView2PlusRawSignal Plus Raw Signal ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqPlusRawSignalProstateCellShort pros cell tot +S prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalProstateCellShort PlusRawSignal prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleolusShort K562 nlos tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleolusShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562ChromatinShort K562 chrm tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562ChromatinShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleoplasmShort K562 nplm tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleoplasmShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleusShort K562 nucl tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleusShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562CytosolShort K562 cyto tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562CytosolShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562PolysomeShort K562 psom tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562PolysomeShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562CellShort K562 cell tot +S K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562CellShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-Seq Plus Strand Raw Signal (short in K562 cell) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878NucleusShort GM12 nucl tot +S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalGm12878NucleusShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CytosolShort GM12 cyto tot +S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CytosolShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CellShort GM12 cell tot +S GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 Gingeras CSHL cell shortTotal
wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CellShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-Seq Plus Strand Raw Signal (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView3MinusRawSignal Minus Raw Signal ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqMinusRawSignalProstateCellShort pros cell tot -S prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalProstateCellShort MinusRawSignal prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleolusShort K562 nlos tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleolusShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562ChromatinShort K562 chrm tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562ChromatinShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleoplasmShort K562 nplm tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleoplasmShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleusShort K562 nucl tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleusShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562CytosolShort K562 cyto tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562CytosolShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562PolysomeShort K562 psom tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562PolysomeShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562CellShort K562 cell tot -S K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562CellShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-Seq Minus Strand Raw Signal (short in K562 cell) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878NucleusShort GM12 nucl tot -S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalGm12878NucleusShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CytosolShort GM12 cyto tot -S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CytosolShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CellShort GM12 cell tot -S GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 Gingeras CSHL cell shortTotal
wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CellShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-Seq Minus Strand Raw Signal (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView4Alignments Alignments ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqAlignmentsProstateCellShort pros cell tot AL prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 GSM605626 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqAlignmentsProstateCellShort Alignments prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleolusShort K562 nlos tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 GSM605628 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleolusShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqAlignmentsK562ChromatinShort K562 chrm tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 GSM605632 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562ChromatinShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleoplasmShort K562 nplm tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 GSM605634 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleoplasmShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleusShort K562 nucl tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 GSM605635 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleusShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqAlignmentsK562CytosolShort K562 cyto tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 GSM605629 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562CytosolShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqAlignmentsK562PolysomeShort K562 psom tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 GSM605631 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562PolysomeShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqAlignmentsK562CellShort K562 cell tot AL K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 GSM605630 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562CellShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-Seq Tags (short in K562 cell) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878NucleusShort GM12 nucl tot AL GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 GSM605633 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqAlignmentsGm12878NucleusShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878CytosolShort GM12 cyto tot AL GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 GSM605627 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqAlignmentsGm12878CytosolShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878CellShort GM12 cell tot AL GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 GSM605625 Gingeras CSHL cell shortTotal
wgEncodeCshlShortRnaSeqAlignmentsGm12878CellShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-Seq Tags (short in GM12878 cell) Expression decodeRmap deCODE Recomb deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing Description The deCODE recombination rate track represents calculated rates of recombination based on the deCODE recombination maps in 10 Kb bins from October 2010. Sex-averaged, female-specific and male-specific recombination rates can be displayed by choosing the appropriate options on the track visibility controls. Corresponding to each of these tracks are separate tracks for carriers and non-carriers of the PRDM9 14/15 composite allele, which can be displayed as well. There are also tracks depicting the difference between male and female recombination rates, and a track showing recombination hotspots (i.e., bins with standardized recombination rates higher than 10). In addition to the deCODE display, three data tracks from the HapMap project are included. CEU, YRI and combined maps from release #24 can be turned on with the track visibility controls. Methods The deCODE genetic map was created at deCODE Genetics and is based on 289,658 and 8,411 SNPs on the autosomal and X chromosomes, respectively, for 15,257 parent-offspring pairs. For more information on this map, see Kong et al., 2010. Each base is assigned the recombination rate calculated by assuming a linear genetic distance across the immediately flanking genetic markers. The recombination rate assigned to each 10 Kb window is the average recombination rate of the bases contained within the window.
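The per-base assignment and window averaging described in these Methods can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build the track: the function names `window_rates` and `standardize`, the marker tuple layout, the cM/Mb unit, and the choice to average over only the bases that lie between two markers are all hypothetical. The `standardize` helper simply divides by the mean so that the average over the supplied bins is 1, which is one way to obtain standardized rates; which bins the track actually used for standardization is not stated here.

```python
from collections import defaultdict


def window_rates(markers, window=10_000):
    """Average per-base recombination rate in fixed-size windows.

    markers: list of (physical_position_bp, genetic_position_cM) tuples,
             sorted by physical position (hypothetical input layout).
    Returns {window_start_bp: mean rate in cM/Mb} over the bases in each
    window that fall between two flanking markers.
    """
    covered = defaultdict(int)     # number of bases with a rate, per window
    rate_sum = defaultdict(float)  # sum of per-base rates, per window
    for (p0, g0), (p1, g1) in zip(markers, markers[1:]):
        if p1 <= p0:
            raise ValueError("marker positions must be strictly increasing")
        # Linear genetic distance between flanking markers gives a
        # constant rate for every base in [p0, p1).
        rate = (g1 - g0) / (p1 - p0) * 1e6
        pos = p0
        while pos < p1:
            start = (pos // window) * window
            end = min(start + window, p1)
            n = end - pos
            covered[start] += n
            rate_sum[start] += rate * n
            pos = end
    return {s: rate_sum[s] / covered[s] for s in sorted(covered)}


def standardize(rates):
    """Rescale window rates so their mean is 1 (assumed standardization)."""
    mean = sum(rates.values()) / len(rates)
    return {s: r / mean for s, r in rates.items()}


if __name__ == "__main__":
    demo = [(0, 0.0), (10_000, 0.01), (30_000, 0.05)]  # made-up markers
    raw = window_rates(demo)
    print(raw)
    print(standardize(raw))
```

Accumulating `rate * n` per overlap is equivalent to averaging a per-base array, but avoids allocating one value per base on a whole chromosome.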
The recombination rates are standardized, bringing the average to 1 for all bins used for the standardization. Credits This track was produced at UCSC using data that are freely available for the deCODE genetic maps. Thanks to all who played a part in the creation of these maps. References Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G et al. A high-resolution recombination map of the human genome. Nat Genet. 2002 Jul;31(3):241-7. PMID: 12053178 Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, Walters GB, Jonasdottir A, Gylfason A, Kristinsson KT et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature. 2010 Oct 28;467(7319):1099-103. PMID: 20981099 avgView Sex Avg deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeSexAveragedNonCarrier Sex Avg Non-carry deCODE recombination map, sex-average non-carrier Mapping and Sequencing decodeSexAveragedCarrier Sex Avg Carry deCODE recombination map, sex-average carrier Mapping and Sequencing decodeSexAveraged Sex Avg deCODE recombination map, sex-average Mapping and Sequencing diffView Male-Female deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeMaleFemaleDifference Sex Difference deCODE recombination map, male minus female difference Mapping and Sequencing maleView Male deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeMaleNonCarrier Male Non-carry deCODE recombination map, male non-carrier Mapping and Sequencing decodeMaleCarrier Male Carry deCODE recombination map, male carrier Mapping and Sequencing decodeMale Male deCODE recombination map, male Mapping and Sequencing hotView Hot Spots deCODE recombination map, Female and Male hot spots, >= 10.0 Mapping and Sequencing decodeHotSpotFemale Hot Spot Female deCODE recombination map, female >= 10.0 Mapping and Sequencing 
decodeHotSpotMale Hot Spot Male deCODE recombination map, male >= 10.0 Mapping and Sequencing otherMaps HapMap HapMap Release 24 recombination maps Mapping and Sequencing hapMapRelease24YRIRecombMap HapMap YRI HapMap Release 24 YRI recombination map Mapping and Sequencing hapMapRelease24CEURecombMap HapMap CEU HapMap Release 24 CEU recombination map Mapping and Sequencing hapMapRelease24CombinedRecombMap HapMap HapMap Release 24 combined recombination map Mapping and Sequencing femaleView Female deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeFemaleNonCarrier Female Non-carry deCODE recombination map, female non-carrier Mapping and Sequencing decodeFemaleCarrier Female Carry deCODE recombination map, female carrier Mapping and Sequencing decodeFemale Female deCODE recombination map, female Mapping and Sequencing bamSLDenisova Denisova Denisova Sequence Reads Denisova Assembly and Analysis Denisova cave entrance in the Altai Mountains of Siberia, Russia where the bones were found from which DNA was sequenced (Copyright (C) 2010, Johannes Krause) Description The Denisova track shows Denisova sequence reads mapped to the human genome. The Denisova sequence was generated from a phalanx bone excavated from Denisova Cave in the Altai Mountains in southern Siberia. Methods Denisova sequence libraries were prepared by treating DNA extracted from a single phalanx bone with two enzymes: uracil-DNA-glycosylase, which removes uracil residues from DNA to leave abasic sites, and endonuclease VIII, which cuts DNA at the 5′ and 3′ sides of abasic sites. Subsequent incubation with T4 polynucleotide kinase and T4 DNA polymerase was used to generate phosphorylated blunt ends that are amenable to adaptor ligation.
Because the great majority of uracil residues occur close to the ends of ancient DNA molecules, this procedure leads to only a moderate reduction in average length of the molecules in the library, but a several-fold reduction in uracil-derived nucleotide misincorporation. Reads were aligned to human sequence Mar. 2006 (NCBI36/hg18) using the Burrows-Wheeler Aligner. Download the Denisova track data sets from the Genome Browser downloads server. References Briggs A.W., Stenzel U., Meyer M., Krause J., Kircher M., Pääbo S. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 2009 Dec 22:38(6) e87. Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010 Dec 23;468:1053-1060. Credits This track was produced at UCSC using data generated by the Max Planck Institute for Evolutionary Anthropology. dgvPlus DGV Struct Var Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) Variation and Repeats Description This track displays copy number variants (CNVs), insertions/deletions (InDels), inversions and inversion breakpoints annotated by the Database of Genomic Variants (DGV), which contains genomic variations observed in healthy individuals. DGV focuses on structural variation, defined as genomic alterations that involve segments of DNA that are larger than 1000 bp. Insertions/deletions of 50 bp or larger are also included. Display Conventions This track contains three subtracks: Structural Variant Regions: annotations that have been generated from one or more reported structural variants at the same location. Supporting Structural Variants: the sample-level reported structural variants. Gold Standard Variants: curated variants from a selected number of studies in DGV. 
Color is used in both subtracks to indicate the type of variation: Inversions and inversion breakpoints are purple. CNVs and InDels are blue if there is a gain in size relative to the reference. CNVs and InDels are red if there is a loss in size relative to the reference. CNVs and InDels are brown if there are reports of both a loss and a gain in size relative to the reference. The DGV Gold Standard subtrack utilizes a boxplot-like display to represent the merging of records as explained in the Methods section below. In this track, the middle box (where applicable), represents the high confidence location of the CNV, while the thin lines and end boxes represent the possible range of the CNV. Clicking on a variant leads to a page with detailed information about the variant, such as the study reference and PubMed abstract link, the study's method and any genes overlapping the variant. Also listed, if available, are the sequencing or array platform used for the study, a sample cohort description, sample size, sample ID(s) in which the variant was observed, observed gains and observed losses. If the particular variant is a merged variant, links to genome browser views of the supporting variants are listed. If the particular variant is a supporting variant, a link to the genome browser view of its merged variant is displayed. A link to DGV's Variant Details page for each variant is also provided. For most variants, DGV uses accessions from peer archives of structural variation (dbVar at NCBI or DGVa at EBI). These accessions begin with either "essv", "esv", "nssv", or "nsv", followed by a number. Variant submissions processed by EBI begin with "e" and those processed by NCBI begin with "n". Accessions with ssv are for variant calls on a particular sample, and if they are copy number variants, they generally indicate whether the change is a gain or loss. In a few studies the ssv represents the variant called by a single algorithm. 
If multiple algorithms were used, overlapping ssv's from the same individual would be combined to generate a sample level sv. If there are many samples analyzed in a study, and if there are many samples which have the same variant, there will be multiple ssv's with the same start and end coordinates. These sample level variants are then merged and combined to form a representative variant that highlights the common variant found in that study. The result is called a structural variant (sv) record. Accessions with sv are for regions asserted by submitters to contain structural variants, and often span ssv elements for both losses and gains. dbVar and DGVa do not record numbers of losses and gains encompassed within sv regions. DGV merges clusters of variants that share at least 70% reciprocal overlap in size/location, and assigns an accession beginning with "dgv", followed by an internal variant serial number, followed by an abbreviated study id. For example, the first merged variant from the Shaikh et al. 2009 study (study accession=nstd21) would be dgv1n21. The second merged variant would be dgv2n21 and so forth. Since in this case there is an additional level of clustering, it is possible for an "sv" variant to be both a merged variant and a supporting variant. For most sv and dgv variants, DGV displays the total number of sample-level gains and/or losses at the bottom of their variant detail page. Since each ssv variant is for one sample, its total is 1. Methods Published structural variants are imported from peer archives dbVar and DGVa. DGV then applies quality filters and merges overlapping variants. For data sets where the variation calls are reported at a sample-by-sample level, DGV merges calls with similar boundaries across the sample set. Only variants of the same type (i.e. CNVs, Indels, inversions) are merged, and gains and losses are merged separately. Sample level calls that overlap by ≥ 70% are merged in this process. 
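The 70% overlap rule for merging sample-level calls can be illustrated with a short Python sketch. It is a simplified stand-in, not DGV's actual pipeline: the names reciprocal_overlap and merge_calls, the half-open coordinates, and the greedy strategy of comparing each call against the first member of an existing cluster are assumptions made for the example. Calls are only merged with others of the same kind, so gains and losses stay separate.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions (0.0 if the intervals are disjoint).

    Coordinates are treated as half-open [start, end); this is an assumption.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def merge_calls(calls, threshold=0.7):
    """Greedily cluster (start, end, kind) calls and return merged regions.

    A call joins the first cluster of the same kind whose seed call it
    overlaps reciprocally by at least `threshold`; otherwise it seeds a
    new cluster. Each result is (min_start, max_end, kind, n_calls).
    """
    clusters = []  # each cluster is a list of calls; element 0 is the seed
    for call in sorted(calls):
        start, end, kind = call
        for cluster in clusters:
            s0, e0, k0 = cluster[0]
            if k0 == kind and reciprocal_overlap(s0, e0, start, end) >= threshold:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [
        (min(c[0] for c in cl), max(c[1] for c in cl), cl[0][2], len(cl))
        for cl in clusters
    ]


if __name__ == "__main__":
    # Hypothetical sample-level calls, not real DGV records.
    demo = [(100, 200, "gain"), (110, 205, "gain"),
            (100, 200, "loss"), (500, 900, "gain")]
    for region in merge_calls(demo):
        print(region)
```

Using the reciprocal fraction rather than a one-sided fraction keeps a small variant nested inside a much larger one from being merged with it.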
The initial criteria for the Gold Standard set require that a variant is found in at least two different studies and found in at least two different samples. After filtering out low-quality variants, the remaining variants are clustered according to 50% minimum overlap, and then merged into a single record. Gains and losses are merged separately. The highest ranking variant in the cluster defines the inner box, while the outer lines define the maximum possible start and stop coordinates of the CNV. In this way, the inner box forms a high-confidence CNV location and the thin connecting lines indicate confidence intervals for the location of CNV. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example: bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/hg38/dgv/dgvMerged.bb -chrom=chr6 -start=0 -end=1000000 stdout Credits Thanks to the Database of Genomic Variants for providing these data. In citing the Database of Genomic Variants please refer to MacDonald et al. References Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51. PMID: 15286789 MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW. 
The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2014 Jan;42(Database issue):D986-92. PMID: 24174537; PMC: PMC3965079 Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res. 2006;115(3-4):205-14. PMID: 17124402 dgvSupporting DGV Supp Var Database of Genomic Variants: Supporting Structural Var (CNV, Inversion, In/del) Variation and Repeats dgvMerged DGV Struct Var Database of Genomic Variants: Structural Var Regions (CNV, Inversion, In/del) Variation and Repeats wgEncodeDukeAffyExonArray Duke Affy Exon ENCODE Duke Affy All-Exon Arrays Expression Description This track displays human tissue microarray data using Affymetrix Human Exon 1.0 ST expression arrays. This RNA expression track was produced as part of the ENCODE Project. RNA was extracted from cells that were also analyzed by DNaseI hypersensitivity, FAIRE, and ChIP (Open Chromatin track). Display Convention and Configuration The display for this track shows probe location and signal value as grayscale-colored items where higher signal values correspond to darker-colored blocks. Items with scores between 900-1000 are in the highest 10% quantile for signal value of that particular cell type. Similarly, items scoring 800-900 are the next 10% quantile and at the bottom of scale, items scoring 100-200 are in the lowest 20% quantile for signal value. The subtracks within this composite annotation track correspond to data from different cell types and tissues. The configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
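The score convention described above can be read as a lookup from item score to signal quantile. The Python sketch below is only an interpretation: the text states the 900-1000, 800-900 and 100-200 ranges explicitly, while the intermediate 10% steps, the handling of scores that fall exactly on a boundary, and the function name score_to_quantile_range are assumptions.

```python
def score_to_quantile_range(score):
    """Map a track item score (100-1000) to a (low %, high %) quantile range.

    Stated in the track description: 900-1000 is the top 10%, 800-900 the
    next 10%, and 100-200 the lowest 20%.
    Assumed here: the ranges in between follow the same 10% steps, and a
    score exactly on a boundary belongs to the higher range.
    """
    if not 100 <= score <= 1000:
        raise ValueError("score must be between 100 and 1000")
    if score < 200:
        return (0, 20)  # lowest 20% of signal values
    top = min(score // 100 + 1, 10) * 10
    return (top - 10, top)


if __name__ == "__main__":
    for s in (150, 250, 850, 950, 1000):
        print(s, score_to_quantile_range(s))
```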
For information regarding specific microarray probes, turn on the Affy Exon Probes track, which can be found inside the Affy Exon supertrack in the Expression track group. Methods Cells were grown according to the approved ENCODE cell culture protocols. Total RNA was isolated from these cells using TRIzol extraction followed by cleanup on an RNeasy column (Qiagen) that included a DNase step. The RNA was checked for quality using a NanoDrop and an Agilent Bioanalyzer. RNA (1 µg) deemed to be of good quality was then processed according to the standard Affymetrix Whole Transcript Sense Target labeling protocol that included a riboreduction step. The fragmented biotin-labeled cDNA was hybridized over 16 h to Affymetrix Exon 1.0 ST arrays and scanned on an Affymetrix Scanner 3000 7G using AGCC software. Exon expression analyses were carried out using Affymetrix Expression Console 1.1 software tools. Samples were quantile normalized for background correction and summarized using Probe Logarithmic Intensity Error (PLIER). Only values for the CORE probes were calculated, as these seem to be the most robust. Verification Data were verified by sequencing biological replicates displaying a Pearson correlation coefficient >0.9. Release Notes This is Release 2 (June 2011) of this track, which excludes the LHSR cell line (treated and untreated). The data have been withdrawn by the submitting lab for DNase, FAIRE and exon array. Previous versions of these files are available for download from the FTP site. Credits RNA was extracted from each cell type by Greg Crawford's group at Duke University. RNA was purified and hybridized to Affymetrix Exon arrays by Sridar Chittur and Scott Tenenbaum at the University of Albany-SUNY. Data analyses were performed by Holly Dressman, Darin London, and Zhancheng Zhang at Duke University.
Contact: Terry Furey Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeDukeAffyExonArraySimpleSignalRep2Progfib ProgFib 2 ProgFib AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 241 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Progfib None fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in ProgFib cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Progfib ProgFib 1 ProgFib AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 241 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Progfib None fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in ProgFib cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Osteobl Osteobl 2 Osteobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 240 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Osteobl None osteoblasts (NHOst) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Osteobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Osteobl Osteobl 1 Osteobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 240 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Osteobl None osteoblasts (NHOst) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Osteobl cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep2Nhek NHEK 2 NHEK AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 239 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Nhek None epidermal keratinocytes Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in NHEK cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Nhek NHEK 1 NHEK AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 239 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Nhek None epidermal keratinocytes Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in NHEK cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Medullo Medullo 2 Medullo AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 238 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Medullo None medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Medullo cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Medullo Medullo 1 Medullo AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 238 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Medullo None medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Medullo cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Vehicle MCF7 2 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 237 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Vehicle vehicle mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University Charcoal stripped hormone-free FBS for 72 hours (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Vehicle MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 237 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Vehicle vehicle mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University Charcoal stripped hormone-free FBS for 72 hours (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Estro MCF7 2 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 236 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Estro estrogen mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University 45 min with 100 nM Estradiol (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Estro MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 236 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Estro estrogen mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University 45 min with 100 nM Estradiol (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7 MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 235 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7 None mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2LncapAndro LNCaP 2 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 234 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2LncapAndro androgen prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) Affymetrix Exon Microarray Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1LncapAndro LNCaP 1 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 234 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1LncapAndro androgen prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) 
Affymetrix Exon Microarray Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Lncap LNCaP 2 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 233 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Lncap None prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Lncap LNCaP 1 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 233 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Lncap None prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) 
Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Huvec HUVEC 2 HUVEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 226 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Huvec None umbilical vein endothelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HUVEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Huvec HUVEC 1 HUVEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 226 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Huvec None umbilical vein endothelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HUVEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Hmec HMEC 2 HMEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 225 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Hmec None mammary epithelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HMEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Hmec HMEC 1 HMEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 225 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Hmec None mammary epithelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HMEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Hepg2 HepG2 3 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in HepG2 cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep2Hepg2 HepG2 2 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HepG2 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Hepg2 HepG2 1 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HepG2 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifng4h HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 229 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifng4h IFNg4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifng4h HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 229 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifng4h IFNg4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifna4h HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 228 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifna4h IFNa4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 
(in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifna4h HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 228 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifna4h IFNa4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Helas3 HeLaS3 3 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3 HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3 HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gliobla Gliobla 2 Gliobla AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 224 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gliobla None glioblastoma, these cells (aka H54 and D54) come from a surgical resection from a patient with glioblastoma multiforme (WHO Grade IV). 
D54 is a commonly studied glioblastoma cell line (Bao et al., 2006) that has been thoroughly described by S Bigner (1981). (PMID: 7252524) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Gliobla cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gliobla Gliobla 1 Gliobla AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 224 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gliobla None glioblastoma, these cells (aka H54 and D54) come from a surgical resection from a patient with glioblastoma multiforme (WHO Grade IV). D54 is a commonly studied glioblastoma cell line (Bao et al., 2006) that has been thoroughly described by S Bigner (1981). (PMID: 7252524) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Gliobla cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19240 GM19240 2 GM19240 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 223 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19240 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19240 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19240 GM19240 1 GM19240 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 223 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19240 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19240 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19239 GM19239 2 GM19239 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 222 Crawford
Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19239 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19239 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19239 GM19239 1 GM19239 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 222 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19239 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19239 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19238 GM19238 2 GM19238 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 221 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19238 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19238 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19238 GM19238 1 GM19238 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 221 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19238 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19238 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Gm18507 GM18507 3 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Gm18507 None lymphoblastoid, International
HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm18507 GM18507 2 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm18507 None lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm18507 GM18507 1 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm18507 None lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12892 GM12892 2 GM12892 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 219 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12892 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12892 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12892 GM12892 1 GM12892 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 219 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12892 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford 
Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12892 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12891 GM12891 2 GM12891 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 218 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12891 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12891 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12891 GM12891 1 GM12891 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 218 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12891 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12891 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Gm12878 GM12878 3 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in GM12878 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12878 GM12878 2 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12878 cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12878 GM12878 1 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12878 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Fibrobl Fibrobl 2 Fibrobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 216 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Fibrobl None child fibroblast Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Fibrobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Fibrobl Fibrobl 1 Fibrobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 216 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Fibrobl None child fibroblast Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Fibrobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Chorion Chorion 1 Chorion AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 215 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Chorion None chorion cells (outermost of two fetal membranes), fetal membranes were collected from women who underwent planned cesarean delivery at term, before labor and without rupture of membranes. 
Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Chorion cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Astrocy Astrocy 2 Astrocy AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 214 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Astrocy None astrocytes, Astrocy is the same as cell line NH-A Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Astrocy cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Astrocy Astrocy 1 Astrocy AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 214 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Astrocy None astrocytes, Astrocy is the same as cell line NH-A Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Astrocy cells) Expression eioJcviNAS EIO/JCVI NAS Eur. Inst. Oncology/J. C. Venter Inst. Nuclease Accessible Sites Regulation Description Genes in metazoa are controlled by a complex array of cis-regulatory elements that include core and distal promoters, enhancers, insulators, silencers, etc. (Levine and Tjian, 2003). In living cells, functionally active cis-regulatory elements bear a unifying feature, which is a chromatin-based epigenetic signature known as nuclease hypersensitivity (Elgin, 1988; Gross and Garrard, 1988; Wolffe, 1998). This track presents the results of a collaboration between J. Craig Venter Institute (JCVI, Rockville MD) and the European Institute of Oncology (Milan, Italy) to isolate nuclease accessible sites (NAS) from primary human CD34+ hematopoietic stem and progenitor cells, and from CD34- cells, maturating myeloid cells generated by in vitro differentiation of CD34+ cells (Gargiulo et al., submitted). 
This effort made use of a method (originally developed at Sangamo BioSciences, Richmond, CA) to isolate such NAS from living cells using restriction enzymes (RE), leading to minimal, if any, contamination from bulk DNA. High-throughput 454 sequencing was then used to generate NAS libraries in CD34+ and CD34- cells: this technology has been named "NA-Seq" (Gargiulo et al., submitted). Display Conventions The track annotates the location of NAS in the genome of human CD34+ and CD34- cells in the form of tags, generated by NA-Seq and obtained by merging NAS within 600 bp. Note that the method identifies a specific position in chromatin that is sensitive to nucleases, but does not map the boundaries of a regulatory element per se. A conservative estimate of element size would be the space occupied by one nucleosome, i.e., 180-200 bp surrounding the tag, although there is precedent in the literature for nuclease hypersensitive sites that span more than the length of one nucleosome (Turner, 2001; Wolffe, 1998; Boyle, 2008). Methods CD34+ cells (enriched in hematopoietic stem and progenitor cells) were prepared from healthy donors following guidelines established by the Ethics Committee of the European Institute of Oncology (IEO), Milan. Mobilization of CD34+ cells to the peripheral blood was stimulated by G-CSF treatment according to standard procedures. After mobilization, donors were subjected to leukapheresis, and <10% of the sample was used in the experiment. CD34+ cells were purified using a magnetic positive selection procedure ("EASYSEP"; Stemcell, Vancouver, Canada). Purity of separation was evaluated by FACS after staining with an anti-human CD34 FITC-conjugated antibody (Stemcell). Upon purification, the cell cycle status of the CD34+ cells was monitored by propidium iodide staining and FACS analysis. G0/G1 cells varied from approximately 90% to >95% of the total cells. 
Cells were immediately used for the isolation of NAS using the nuclease hypersensitive site isolation protocol (Gargiulo et al., submitted). Verification The method was initially validated on human tissue culture cells by examining the colocalization of DNA fragments isolated from cells with experimentally determined nuclease hypersensitive sites in chromatin as mapped by indirect end-labeling and Southern blotting (Nedospasov and Georgiev, 1980; Wu, 1980). Nineteen out of nineteen randomly chosen clones from those libraries represented bona fide DNase I hypersensitive sites in chromatin (Fyodor Urnov, unpublished results). These data confirmed that the method yields very high-content libraries of active cis-regulatory DNA elements, supporting its application to human CD34+ cells. In collaboration with scientists at the J. Craig Venter Institute and the European Institute of Oncology, libraries of NAS were prepared from CD34+ and CD34- cells using this method and high-throughput 454 sequencing; 41 out of 51 randomly chosen clones (>80%) coincided with DNase I hypersensitive sites (Gargiulo et al., submitted). Credits The library of Nuclease Accessible Sites (NAS) from human CD34+/CD34- cells was prepared and validated by Saverio Minucci and colleagues at the European Institute of Oncology. Sequencing was performed by Sam Levy and colleagues (J. Craig Venter Institute). This method was initially developed and validated by Fyodor Urnov, Alan Wolffe, and colleagues at Sangamo BioSciences, Inc. References Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22. PMID: 18243105; PMC: PMC2669738 Elgin SC. The formation and function of DNase I hypersensitive sites in the process of gene activation. J Biol Chem. 1988 Dec 25;263(36):19259-62. PMID: 3198625 Gargiulo G, Levy S, et al. 
A Global Analysis of chromatin Accessibility and Dynamics during Hematopoietic Differentiation. Submitted. Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annu Rev Biochem. 1988;57:159-97. PMID: 3052270 Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003 Jul 10;424(6945):147-51. PMID: 12853946 Nedospasov SA, Georgiev GP. Non-random cleavage of SV40 DNA in the compact minichromosome and free in solution by micrococcal nuclease. Biochem Biophys Res Commun. 1980 Jan 29;92(2):532-9. PMID: 6243943 Turner BM. Chromatin and Gene Regulation: Mechanisms in Epigenetics. Blackwell Science Ltd., Oxford. 2001. Wolffe AP. Chromatin: Structure and Function. Academic Press, San Diego, CA. 1998. Wu C. The 5' ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature. 1980 Aug 28;286(5776):854-60. PMID: 6774262 eioJcviNASNeg EIO/JCVI CD34- NAS CD34- cells Nuclease Accessible Sites Regulation eioJcviNASPos EIO/JCVI CD34+ NAS CD34+ cells Nuclease Accessible Sites Regulation encodeRegions ENCODE Regions Encyclopedia of DNA Elements (ENCODE) Regions Pilot ENCODE Regions and Genes Description This track depicts target regions for the NHGRI ENCODE project. The long-term goal of this project is to identify all functional elements in the human genome sequence to facilitate a better understanding of human biology and disease. During the pilot phase, 44 regions comprising 30 Mb — approximately 1% of the human genome — have been selected for intensive study to identify, locate and analyze functional elements within the regions. These targets are being studied by a diverse public research consortium to test and evaluate the efficacy of various methods, technologies, and strategies for locating genomic features. The outcome of this initial phase will form the basis for a larger-scale effort to analyze the entire human genome. 
See the NHGRI target selection process web page for a description of how the target regions were selected. To open a UCSC Genome Browser with a menu for selecting ENCODE regions on the human genome, use ENCODE Regions in the UCSC Browser. The UCSC resources provided for the ENCODE project are described on the UCSC ENCODE Portal. Credits Thanks to the NHGRI ENCODE project for providing this initial set of data. ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for hg18 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 eponine Eponine TSS Eponine Predicted Transcription Start Sites Regulation Description The Eponine program provides a probabilistic method for detecting transcription start sites (TSS) in mammalian genomic sequence, with good specificity and excellent positional accuracy. 
Methods Eponine models consist of a set of DNA weight matrices recognizing specific sequence motifs. Each of these is associated with a position distribution relative to the TSS. Eponine has been tested by comparing the output with annotated mRNAs from human chromosome 22. From this work, we estimate that using the default threshold (0.999) it detects >50% of transcription start sites with approximately 70% specificity. However, it does not always predict the direction of transcription correctly—an effect that seems to be common among computational TSS finders. Credits Thanks to Thomas Down at the Sanger Institute for providing the Eponine program (version 2, March 6, 2002) which was run at UCSC to produce this track. References Down TA, Hubbard TJP. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 2002 Mar;12(3):458-61. evofold EvoFold EvoFold Predictions of RNA Secondary Structure Genes and Gene Predictions Description This track shows RNA secondary structure predictions made with the EvoFold program, a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures. Display Conventions and Configuration Track elements are labeled using the convention ID_strand_score. When zoomed out beyond the base level, secondary structure prediction regions are indicated by blocks, with the stem-pairing regions shown in a darker shade than unpaired regions. Arrows indicate the predicted strand. When zoomed in to the base level, the specific secondary structure predictions are shown in parenthesis format. The confidence score for each position is indicated in grayscale, with darker shades corresponding to higher scores. The details page for each track element shows the predicted secondary structure (labeled SS anno), together with details of the multiple species alignments at that location. 
Substitutions relative to the human sequence are color-coded according to their compatibility with the predicted secondary structure (see the color legend on the details page). Each prediction is assigned an overall score and a sequence of position-specific scores. The overall score measures evidence for any functional RNA structures in the given region, while the position-specific scores (0 - 9) measure the confidence of the base-specific annotations. Base-pairing positions are annotated with the same pair symbol. The offsets are provided to ease visual navigation of the alignment in terms of the human sequence. The offset is calculated (in units of ten) from the start position of the element on the positive strand or from the end position when on the negative strand. The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page. Methods EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one which only models unpaired regions. The predictions for this track were based on the conserved elements of an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and Fugu assemblies. NOTE: These predictions were originally computed on the hg17 (May 2004) human assembly, from which the hg16 (July 2003), hg18 (May 2006), and hg19 (Feb 2009) predictions were lifted. As a result, the multiple alignments shown on the track details pages may differ from the 8-way alignments used for their prediction. 
Additionally, some weak predictions have been eliminated from the set displayed on hg18 and hg19. The hg17 prediction set corresponds exactly to the set analyzed in the EvoFold paper referenced below. Credits The EvoFold program and browser track were developed by Jakob Skou Pedersen of the UCSC Genome Bioinformatics Group, now at Aarhus University, Denmark. The RNA secondary structure is rendered using the VARNA Java applet. References EvoFold Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. PMID: 16628248; PMC: PMC1440920 Phylo-SCFGs Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999 Jun;15(6):446-54. PMID: 10383470 Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004;32(16):4925-36. PMID: 15448187; PMC: PMC519121 PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 evofoldV2 EvoFold v.2 EvoFold v.2 Predictions of RNA Secondary Structure Genes and Gene Predictions Description This track shows RNA secondary structure predictions made with the EvoFold (v.2) program, a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures. Display Conventions and Configuration Track elements are labeled using the convention ID_strand_score. 
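As a rough illustration of the ID_strand_score naming convention, the Python sketch below splits a label into its three parts. The function name parse_evofold_name and the example labels are hypothetical. The sketch assumes that the ID may itself contain underscores (so it splits from the right) and that the score is an integer; neither assumption is stated in this description.

```python
from typing import NamedTuple


class EvoFoldName(NamedTuple):
    element_id: str
    strand: str
    score: int


def parse_evofold_name(name: str) -> EvoFoldName:
    """Split a label of the form ID_strand_score into its parts.

    Assumption: only the last two underscore-separated fields are the
    strand and the score; everything before them is the ID.
    """
    parts = name.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected ID_strand_score, got {name!r}")
    element_id, strand, score = parts
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r} in {name!r}")
    return EvoFoldName(element_id, strand, int(score))


if __name__ == "__main__":
    # Hypothetical label, for illustration only.
    print(parse_evofold_name("1234_+_56"))
```

Splitting from the right keeps the parser tolerant of IDs that contain underscores, at the cost of assuming the strand and score fields never do.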
When zoomed out beyond the base level, secondary structure prediction regions are indicated by blocks, with the stem-pairing regions shown in a darker shade than unpaired regions. Arrows indicate the predicted strand. When zoomed in to the base level, the specific secondary structure predictions are shown in parenthesis format. The confidence score for each position is indicated in grayscale, with darker shades corresponding to higher scores. The details page for each track element shows the predicted secondary structure (labeled SS anno), together with details of the multiple species alignments at that location. Substitutions relative to the human sequence are color-coded according to their compatibility with the predicted secondary structure (see the color legend on the details page). Each prediction is assigned an overall score and a sequence of position-specific scores. The overall score measures evidence for any functional RNA structures in the given region, while the position-specific scores (0 - 9) measure the confidence of the base-specific annotations. Base-pairing positions are annotated with the same pair symbol. The offsets are provided to ease visual navigation of the alignment in terms of the human sequence. The offset is calculated (in units of ten) from the start position of the element on the positive strand or from the end position when on the negative strand. The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page. Methods EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. 
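The parenthesis format mentioned under Display Conventions can be decoded into base pairs with a simple stack. The following is a minimal Python sketch, assuming the common convention that "(" and ")" mark the two partners of a pair and any other character (such as ".") marks an unpaired position; the exact symbol set used by the track is not specified here, and the function name dot_bracket_pairs is hypothetical.

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based (open, close) index pairs for a parenthesis string.

    Assumption: '(' opens a pair, ')' closes the most recent open one,
    and every other character is treated as unpaired.
    """
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A toy hairpin: two stacked pairs around a two-base loop.
    print(dot_bracket_pairs("((..))"))
```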
The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one which only models unpaired regions. The predictions for this track were based on the conserved segments of a human-referenced (hg18) 31-way vertebrate alignment comprising 28 mammalian assemblies and three other vertebrate assemblies (see Parker et al. for details). The 31-way alignment is a subset of the 44-way alignment displayed on hg18. Additional resources Auxiliary data sets and a family classification of the predictions can be browsed on a mirror site. Credits The EvoFold program and browser track were developed by Jakob Skou Pedersen, initially at the UCSC Genome Bioinformatics Group and later at the University of Copenhagen and at Aarhus University, Denmark (current position). Parker et al. describe the current set of predictions and their family classification. The multiple alignments used for the analysis were generated at UCSC as part of the 29 Mammals Sequencing and Analysis Consortium (Lindblad-Toh et al.). The RNA secondary structure is rendered using the VARNA Java applet. References EvoFold Parker BJ, Moltke I, Roth A, Washietl S, Wen J, Kellis M, Breaker R, and Pedersen JS. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Res. in press. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. Phylo-SCFGs Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999 Jun;15(6):446-54. Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004 Sep 24;32(16):4925-36. 
Alignments and conserved elements Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, et al. A high-resolution map of evolutionary constraint in the human genome based on 29 eutherian mammals. In review. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. exaptedRepeats Exapted Repeats Repeats Exapted as Conserved Non-Exonic Elements Variation and Repeats Description This track displays conserved non-exonic elements that have been deposited by mobile elements (repeats), a process termed "exaptation" (Gould and Vrba, 1982). These regions were identified during a genome-wide survey (Lowe et al., 2007) with the expectation that regions of this type may act as distal transcriptional regulators for nearby genes. A previous case study experimentally verified an exapted mobile element acting as a distal enhancer (Bejerano et al., 2006). Methods All regions were identified as having originated as mobile element insertions by RepeatMasker (Smit et al.). A subset of elements that have clear repeat homology can be identified by very significant BLASTZ (Schwartz et al., 2003) alignments to consensus sequences in RepBase (Jurka, 2000). This dataset is from a genome-wide survey of mobile elements being exapted as conserved non-exonic sequence; a full explanation of methods can be found in Lowe et al., 2007. References Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006 May 4;441(7089):87-90. Gould SJ, Vrba ES. Exaptation; a missing term in the science of form. Paleobiology. 1982 Jan 1;8(1):4-15. Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 
2000 Sep;16(9):418-420. Lowe CB, Bejerano G, Haussler D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc Natl Acad Sci U S A. 2007 May 8;104(19):8005-10. Epub 2007 Apr 26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Smit AFA et al. www.repeatmasker.org exoniphy Exoniphy Exoniphy Human/Mouse/Rat/Dog Genes and Gene Predictions Description The exoniphy program identifies evolutionarily conserved protein-coding exons in a multiple alignment using a phylogenetic hidden Markov model (phylo-HMM), a statistical model that simultaneously describes exon structure and exon evolution. This track shows exoniphy predictions for the human Mar. 2006 (hg18), mouse Feb. 2006 (mm8), rat Nov. 2004 (rn4), and dog May 2005 (canFam2) genomes, as aligned by the multiz program. For this track, only alignments on the "syntenic net" between human and each other species were considered. Methods For a description of exoniphy, see Siepel et al. (2004). Multiz is described in Blanchette et al. (2004). The alignment chaining methods behind the "syntenic net" are described in Kent et al. (2003). Acknowledgments Thanks to Brona Brejova of Cornell University for producing these predictions. References Blanchette M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708-715. Kent WJ. et al. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. P. Natl. Acad. Sci. USA. 2003;100(20):11484-11489. Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. RECOMB '04. 2004. firstEF FirstEF FirstEF: First-Exon and Promoter Prediction Regulation Description This track shows predictions from the FirstEF (First Exon Finder) program. Three types of predictions are displayed: exon, promoter and CpG window. 
If two consecutive predictions are separated by less than 1000 bp, FirstEF treats them as one cluster of alternative first exons that may belong to the same gene. The cluster number is displayed in the parentheses of each item. For example, "exon(405-)" represents the exon prediction in cluster number 405 on the minus strand. The exon, promoter and CpG window are interconnected by this cluster number. Alternative predictions within the same cluster are denoted by "#N", where "N" is the serial number of an alternative prediction in the cluster. Each predicted exon is either CpG-related or non-CpG-related, based on a score of the frequency of CpG dinucleotides. An exon is classified as CpG-related if the CpG score is greater than a threshold value, and non-CpG-related if less than the threshold. If an exon is CpG-related, its associated CpG window is displayed. The browser displays features with higher scores in darker shades of gray/black. Method FirstEF is a 5' terminal exon and promoter prediction program. It consists of different discriminant functions structured as a decision tree. The probabilistic models are optimized to find potential first donor sites and CpG-related and non-CpG-related promoter regions based on discriminant analysis. For every potential first donor site (GT) and an upstream promoter region, FirstEF decides whether or not the intermediate region can be a potential first exon, based on a set of quadratic discriminant functions. FirstEF calculates the a posteriori probabilities of exon, donor, and promoter for a given GT and an upstream window of length 570 bp. For a description of the FirstEF program and the underlying classification models, refer to Davuluri et al., 2001. Credits The predictions for this track are produced by Ramana V. Davuluri of Ohio State University and Ivo Grosse and Michael Q. Zhang of Cold Spring Harbor Lab. References Davuluri RV, Grosse I, Zhang MQ. 
Computational identification of promoters and first exons in the human genome. Nat Genet. 2001 Dec;29(4):412-7. fishClones FISH Clones Clones Placed on Cytogenetic Map Using FISH Mapping and Sequencing Description This track shows the location of fluorescent in situ hybridization (FISH)-mapped clones along the assembly sequence. The locations of these clones were obtained from the NCBI Human BAC Resource. Earlier versions of this track obtained this information directly from Cheung et al. (2001). More information about the BAC clones, including how they may be obtained, can be found at the Human BAC Resource and the Clone Registry web sites hosted by NCBI. To view Clone Registry information for a clone, click on the clone name at the top of the details page for that item. Using the Filter This track has a filter that can be used to change the color or include/exclude the display of a dataset from an individual lab. This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. To use the filter: In the pulldown menu, select the lab whose data you would like to highlight or exclude in the display. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display clones from the lab selected in the pulldown list. If "include" is selected, the browser will display clones only from the selected lab. When you have finished configuring the filter, click the Submit button. 
Credits We would like to thank all of the labs that have contributed to this resource: Fred Hutchinson Cancer Research Center (FHCRC) National Cancer Institute (NCI) Roswell Park Cancer Institute (RPCI) The Wellcome Trust Sanger Institute (SC) Cedars-Sinai Medical Center (CSMC) Los Alamos National Laboratory (LANL) UC San Francisco Cancer Center (UCSF) References Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, Olivier M et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001 Feb 15;409(6822):953-8. PMID: 11237021 fosEndPairs Fosmid End Pairs Fosmid End Pairs Mapping and Sequencing Description A valid pair of fosmid end sequences must be at least 30 kb but no more than 50 kb away from each other. The orientation of the first fosmid end sequence must be "+" and the orientation of the second fosmid end sequence must be "-". Note: For hg19 and hg18 assemblies, the Fosmid End Pairs track is a main track under the "Mapping and Sequencing" track category. On the hg38 assembly, the FOSMID End Pairs track is a subtrack within the Clone Ends track under the "Mapping and Sequencing" track category. Under the list of subtracks on the Clone Ends Track Settings page, the FOSMID End Pairs track is now named "WIBR-2 Fosmid library." With the WIBR-2 Fosmid library track setting on full, individual clone end mapping items are listed in the browser; click into any item to see details from NCBI. Methods End sequences were trimmed at the NCBI using ssahaCLIP written by Jim Mullikin. Trimmed fosmid end sequences were placed on the assembled sequence using Jim Kent's blat program. Credits Sequencing of the fosmid ends was done at the Eli & Edythe L. Broad Institute of MIT and Harvard University. Clones are available through the BACPAC Resources Center at Children's Hospital Oakland Research Institute (CHORI). 
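The pair-validity rule stated in the Description above (a "+" end followed by a "-" end, 30 kb to 50 kb apart) can be sketched as a small check. This is a minimal illustration, not the code used to build the track: the EndAlignment record, the requirement that both ends lie on the same chromosome, and the choice to measure distance from the start of the first end to the end of the second are all assumptions made here.

```python
from dataclasses import dataclass

MIN_SPAN = 30_000  # "at least 30 kb" (from the track description)
MAX_SPAN = 50_000  # "no more than 50 kb" (from the track description)


@dataclass
class EndAlignment:
    """Hypothetical record for one placed fosmid end sequence."""
    chrom: str
    start: int   # 0-based start on the assembly
    end: int     # exclusive end on the assembly
    strand: str  # "+" or "-"


def is_valid_pair(first: EndAlignment, second: EndAlignment) -> bool:
    """Return True if the two ends satisfy the orientation and distance rule."""
    # Assumption: both ends must be placed on the same sequence.
    if first.chrom != second.chrom:
        return False
    # From the description: first end is "+", second end is "-".
    if first.strand != "+" or second.strand != "-":
        return False
    # Assumption: distance is the outer span covered by the two ends.
    span = second.end - first.start
    return MIN_SPAN <= span <= MAX_SPAN
```

For example, ends placed at chr1:1000-1500 (+) and chr1:40500-41000 (-) span 40 kb and pass the check, while the same two ends with their strands swapped do not.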
fox2ClipSeqComp FOX2 CLIP-seq FOX2 Adaptor-trimmed CLIP-seq reads Regulation Description The FOX2 CLIP-seq track shows adaptor-trimmed CLIP-seq reads that mapped uniquely to the repeat-masked human genome (hg17). The reads were converted to hg18 coordinates using the UCSC LiftOver tool. Reads on the forward strand are displayed in blue; those on the reverse strand are shown in red. Methods Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) of cell type-specific splicing regulator FOX2 (also known as RBM9) was performed in human embryonic stem cells. MosaikAligner was utilized to align the reads to the repeat-masked genome. Briefly, HUES6 human embryonic stem cells were treated with UV irradiation to stabilize in vivo protein-RNA interactions, followed by antibody-mediated precipitation of specific RNA-protein complexes. SDS-PAGE was then utilized to isolate protein-RNA adducts after RNA trimming with nuclease, 3' RNA linkers were ligated, and nucleotides were 5' end labeled with γ-32P-ATP. Recovered RNA was ligated to a 5' linker before amplification by RT-PCR. Both linkers were designed to be compatible with Illumina 1G genome analyzer sequencing. Approximately 4 million reads were uniquely mapped to the repeat-masked human genome by MosaikAligner. To identify CLIP clusters, we performed the following steps: (i) CLIP reads were associated with protein-coding genes as defined by the region from the annotated transcriptional start to the end of each gene locus. (ii) CLIP reads were separated into the categories of sense or antisense to the transcriptional direction of the gene. (iii) Sense CLIP reads were extended by 100 nt in the 5'-to-3' direction. The height of each nucleotide position is the number of reads that overlap that position. (iv) The count distribution of heights $h = 1, 2, \ldots, H$ is $\{n_1, n_2, \ldots, n_h, \ldots, n_{H-1}, n_H\}$, with $N = \sum_{i=1}^{H} n_i$.
For a particular height $h$, the associated probability of observing a height of at least $h$ is $P_h = \frac{1}{N}\sum_{i=h}^{H} n_i$. (v) We computed the background frequency after randomly placing the same number of extended reads within the gene for 100 iterations. This controls for the length of the gene and the number of reads. For each iteration, the count distribution and probabilities for the randomly placed reads ($P_{h,\mathrm{random}}$) were generated as in step (iv). (vi) Our modified FDR for a peak height was computed as $\mathrm{FDR}(h) = (\mu_h + \sigma_h)/P_h$, where $\mu_h$ and $\sigma_h$ are the average and s.d., respectively, of $P_{h,\mathrm{random}}$ across the 100 iterations. For each gene locus, we chose a threshold peak height $h^*$ as the smallest height satisfying $\mathrm{FDR}(h^*) < 0.001$. We identified FOX2 binding clusters by grouping nucleotide positions that satisfied $h > h^*$ and occurred within 50 nt of each other. For further details of the method used to generate this annotation please refer to Yeo et al. (2009). Credits Thanks to Gene Yeo at the University of California, San Diego for providing this annotation. For additional information on FOX2 CLIP-seq reads, please contact geneyeo@ucsd.edu directly. References Yeo GW, Coufal NG, Liang YL, Peng GE, Fu XD, Gage FH. An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol. 2009 Jan 11;16:130-137.
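Steps (iv) through (vi) of the Methods above can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, reads are assumed to be already extended and given as half-open (start, end) intervals in gene coordinates, random placement is assumed to be uniform within the gene, and the population standard deviation is used because the text does not say which estimator was applied.

```python
import random
import statistics
from collections import Counter


def coverage(reads, gene_length):
    """Height at each position: number of half-open (start, end) reads covering it."""
    diff = [0] * (gene_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    heights, running = [], 0
    for delta in diff[:gene_length]:
        running += delta
        heights.append(running)
    return heights


def tail_probabilities(heights):
    """Step (iv): return {h: P_h}, where P_h = (sum of n_i for i >= h) / N."""
    counts = Counter(h for h in heights if h > 0)  # n_h for h = 1..H
    if not counts:
        return {}
    total = sum(counts.values())  # N
    probs, running = {}, 0
    for h in range(max(counts), 0, -1):
        running += counts.get(h, 0)
        probs[h] = running / total
    return probs


def modified_fdr(reads, gene_length, iterations=100, seed=0):
    """Steps (v)-(vi): FDR(h) = (mu_h + sigma_h) / P_h over random placements."""
    rng = random.Random(seed)
    observed = tail_probabilities(coverage(reads, gene_length))
    background = {h: [] for h in observed}
    for _ in range(iterations):
        shuffled = []
        for start, end in reads:
            length = end - start
            new_start = rng.randint(0, gene_length - length)  # assumed uniform placement
            shuffled.append((new_start, new_start + length))
        probs = tail_probabilities(coverage(shuffled, gene_length))
        for h in observed:
            background[h].append(probs.get(h, 0.0))
    return {
        h: (statistics.mean(values) + statistics.pstdev(values)) / observed[h]
        for h, values in background.items()
    }


def threshold_height(fdr, cutoff=0.001):
    """Smallest height h* with FDR(h*) below the cutoff, or None if no height passes."""
    passing = [h for h, value in fdr.items() if value < cutoff]
    return min(passing) if passing else None
```

Positions with a height above the returned threshold would then be grouped into clusters when they fall within 50 nt of each other, as described in the text; that grouping step is left out of the sketch.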
fox2ClipSeqCompViewreads Reads FOX2 Adaptor-trimmed CLIP-seq reads Regulation fox2ClipSeq FOX2 CLIP-seq FOX2 Adaptor-trimmed CLIP-seq Reads Regulation fox2ClipSeqCompViewdensity Density FOX2 Adaptor-trimmed CLIP-seq reads Regulation fox2ClipSeqDensityReverseStrand Density Reverse FOX2 Adaptor-trimmed CLIP-seq Density Reverse Strand Regulation fox2ClipSeqDensityForwardStrand Density Forward FOX2 Adaptor-trimmed CLIP-seq Density Forward Strand Regulation fox2ClipSeqCompViewclusters Clusters FOX2 Adaptor-trimmed CLIP-seq reads Regulation fox2ClipClusters FOX2 clusters FOX2 Binding Site Clusters Regulation gad GAD View Genetic Association Studies of Complex Diseases and Disorders Phenotype and Disease Associations Disclaimer The Genetic Association Database (GAD) is intended for use primarily by medical scientists and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the GAD database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. These data are provided by the GAD and do not represent any additional curation by UCSC. Description After serving the scientific community for more than 10 years, the Genetic Association Database (GAD) has been retired and all data is "frozen" as of 09/01/2014. The Genetic Association Database is an archive of human genetic association studies of complex diseases and disorders. The goal of the database is to allow the user to rapidly identify medically relevant polymorphism from the large volume of polymorphism and mutational data, in the context of standardized nomenclature. If the track is displayed in "pack" or "full" mode, mousing over an entry of this track will show a pop-up message listing all associated diseases. 
In "full" mode, each feature is labeled with the associated disease class code (as defined below).
Disease Class | Disease Class Code
AGING | AGE
CANCER | CAN
CARDIOVASCULAR | CARD
CHEMICAL DEPENDENCY | CHEM
DEVELOPMENTAL | DEV
HEMATOLOGICAL | HEM
IMMUNE | IMM
INFECTION | INF
METABOLIC | MET
MITOCHONDRIAL | MITO
NEUROLOGICAL | NEUR
NORMAL VARIATION | NV
OTHER | OTH
PHARMACOGENOMICS | PHARM
PSYCHIATRIC | PSY
RENAL | REN
REPRODUCTION | REP
UNKNOWN | UNK
VISION | VIS
Methods Study data are recorded in the context of official human gene nomenclature with additional molecular reference numbers and links. The data are gene-centered; that is, each record is based on a gene or marker. For example, if a study investigated six genes for a particular disorder, there will be six records. Gene information is standardized and annotated with molecular information, enabling integration with other molecular and genomic data resources. Data Data are added to GAD on a periodic basis by the curator or investigators. A majority of the records in GAD are extracted from the online HuGE Navigator database, which is sponsored by the Centers for Disease Control and Prevention. HuGE Navigator provides access to a continuously updated, curated knowledge base of gene-disease associations, meta-analyses, and related information on genes and diseases extracted from NCBI PubMed. A gene-centered view is available via Genopedia. Contacts For more information on this dataset, contact Kevin G. Becker, PhD, Yongqing Zhang, PhD, and John Garner, MS, from the DNA Array Unit, NIA, NIH. References Becker KG, Barnes KC, Bright TJ, Wang AS. The Genetic Association Database. Nature Genetics 2004 May; 36(5):431-432. gap Gap Gap Locations Mapping and Sequencing Description This track depicts gaps in the assembly. These gaps - with the exception of intractable heterochromatic gaps - will be closed during the finishing process. Gaps are represented as black boxes in this track.
If the relative order and orientation of the contigs on either side of the gap are known, it is a bridged gap and a white line is drawn through the black box representing the gap. This assembly contains the following principal types of gaps: Fragment - gaps between the contigs of a draft clone. (In this context, a contig is a set of overlapping sequence reads.) These may be bridged or not. Clone - gaps between clones in the same map contig. These may be bridged or not. Contig - non-bridged gaps between map contigs. Centromere - non-bridged gaps from centromeres. Telomere - non-bridged gaps from telomeres. Heterochromatin - non-bridged gaps from large blocks of heterochromatin. Short Arm - non-bridged long gaps on the short arm of the chromosome. gc5Base GC Percent GC Percent in 5-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows. High GC content is typically associated with gene-rich areas. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options. Credits The data and presentation of this graph were prepared by Hiram Clawson. wgEncodeSangerGencode Gencode Genes ENCODE Gencode Gene Annotations Genes and Gene Predictions Release Notes This release of the Gencode Genes track (Version 3c, October 2009) shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. Version 3 of the Gencode gene set presents a full merge between HAVANA and ENSEMBL, giving priority to the manually curated Havana objects and using ENSEMBL objects where they are different or fall into un-annotated regions. The annotation was carried out on genome assembly GRCh37 (hg19); features are projected back to NCBI36 (hg18) where possible.
Gencode 3c is a small update of version 3b (July 09 freeze), mainly for chromosomes 3 & 4, for which the latest annotation was held back and QC'ed again to be used in the RNASeq Genome Annotation Assessment Project. Statistics about this release can be found here. Display Conventions and Configuration The annotations are divided into separate tracks based on source/confidence. The Gencode project recommends that the annotations from level 1 & 2 be used as the reference gene annotation; level 3 was added to fill gaps for methods that analyze the entire genome and require a full set. Level 1: validated. At this time, only pseudogene loci that were predicted by the analysis pipelines from YALE and UCSC as well as by HAVANA manual annotation from WTSI. Level 2: manual annotation. HAVANA manual annotation from WTSI. The following regions are considered "fully annotated" and contain level 2 annotation from HAVANA only, although they will still be updated: chromosomes 1, 2, 6, 9, 10, 13, 20, 21, 22, X, Y, ENCODE pilot regions, chr11:2353995-3878750. Level 3: automated annotation. ENSEMBL loci in regions where no HAVANA annotation can be found. NOTE: The release cycles for Gencode, Havana and Ensembl differ. Users are cautioned to compare release dates to determine which annotation is most current. The gene annotations are colored based on the HAVANA annotation type and the confidence level. See the table below for the color key, as well as more detail about the transcript and feature types.
Class | Color | Description | Transcript Types (see Vega Transcript Types)
Validated_coding | Dark Orange | Level 1 Validated:coding regions | protein_coding
Validated_processed | Light Orange | Level 1 Validated:processed | processed_transcript
Validated_processed_pseudogene | Dark Pink | Level 1 Validated:processed pseudogenes | processed_pseudogene, processed_transcript, transcribed_processed_pseudogene
Validated_unprocessed_pseudogene | Medium Pink | Level 1 Validated:unprocessed pseudogenes | transcribed_unprocessed_pseudogene, unprocessed_pseudogene
Validated_pseudogene | Light Pink | Level 1 Validated:pseudogenes | IG_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Havana_coding | Dark Orange | Level 2 Manual annotation:coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding
Havana_nonsense | Medium Orange | Level 2 Manual annotation:nonsense | nonsense_mediated_decay
Havana_non_coding | Light Orange | Level 2 Manual annotation:non-coding | ambiguous_orf, antisense, non_coding, processed_transcript, retained_intron
Havana_polyA | Black | Level 2 Manual annotation:polyA | polyA_signal, polyA_site, pseudo_polyA
Havana_processed_pseudogene | Dark Pink | Level 2 Manual annotation:processed pseudogene | processed_pseudogene, transcribed_processed_pseudogene
Havana_unprocessed_pseudogene | Medium Pink | Level 2 Manual annotation:unprocessed pseudogene | transcribed_unprocessed_pseudogene, unprocessed_pseudogene
Havana_pseudogene | Light Pink | Level 2 Manual annotation:pseudogene | IG_pseudogene, TR_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Havana_TEC | Grey | Level 2 Manual annotation:TEC | TEC, artifact
Ensembl_coding | Dark Red | Level 3 Automated annotation:coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding
Ensembl_non_coding | Light Orange | Level 3 Automated annotation:non-coding | antisense, non_coding, processed_transcript, retained_intron
Ensembl_pseudogene | Dark Pink | Level 3 Automated annotation:pseudogene | IG_pseudogene, miRNA_pseudogene, misc_RNA_pseudogene, pseudogene, retrotransposed, unitary_pseudogene
Ensembl_processed_pseudogene | Medium Pink | Level 3 Automated annotation:processed pseudogene | processed_pseudogene
Ensembl_unprocessed_pseudogene | Light Pink | Level 3 Automated annotation:unprocessed pseudogene | unprocessed_pseudogene
Ensembl_RNA | Light Red | Level 3 Automated annotation:RNA transcripts | Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene, miRNA, misc_RNA, rRNA, rRNA_pseudogene, scRNA_pseudogene, snRNA, snRNA_pseudogene, snoRNA, snoRNA_pseudogene, tRNA_pseudogene, tRNAscan
2way_consensus_pseudogene | Dark Purple | Level 3 Automated annotation:pseudogenes | pseudogenes
This track uses filtering by category to select subsets of transcripts and has additional advanced features. Help with these features can be found here. Methods We aim to annotate all evidence-based gene features at high accuracy on the human reference sequence. This includes identifying all protein-coding loci with associated alternative variants, non-coding loci which have transcript evidence, and pseudogenes. We integrate computational approaches (including comparative methods), manual annotation and targeted experimental verification. For a detailed description of the methods and references used, see Harrow et al. (2006). Verification See Harrow et al. (2006) for information on verification techniques. Credits This GENCODE release is the result of a collaborative effort among the following laboratories: (contact: Felix Kokocinski) Lab/Institution Contributors HAVANA annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK Adam Frankish, James Gilbert, Jennifer Harrow, Felix Kokocinski, Stephen Trevanion, Tim Hubbard (GENCODE Principal Investigator) Genome Bioinformatics Lab (CRG), Barcelona, Spain Thomas Derrien, Tyler Alioto, Roderic Guigó Genome Bioinformatics, University of California Santa Cruz (UCSC), USA Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler Comp. Genomics Lab, Washington University St.
Louis (WUSTL), USA Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA Mike Lin, Manolis Kellis Bioinformatics, Yale University (Yale), USA Philip Cayting, Mark Gerstein Center for Integrative Genomics, University of Lausanne, Switzerland Cedric Howald, Alexandre Reymond ENSEMBL genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK Bronwen Aken, Julio Fernandez Banet, Stephen Searle Structural Computational Biology Group, Centro Nacional de Investigaciones Oncologicas (CNIO), Madrid, Spain Manuel Rodríguez José, Jan-Jaap Wesselink, Michael Tress, Alfonso Valencia References Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, et al. The GENCODE exome: sequencing the complete human exome. European Journal of Human Genetics. March 2011;19 827-831. [Epub ahead of print] Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. Data Release Policy GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.
wgEncodeGencodePolyaV3 Gencode PolyA Gencode ENCODE Sep 2009 Freeze 2009-10-27 1878 Hubbard Sanger V3c wgEncodeGencodePolyaV3 GENCODE Hubbard Hubbard - GENCODE at Sanger Institute ENCODE Gencode PolyA Transcript Annotations (level 2) (Oct 2009) Genes and Gene Predictions wgEncodeGencodeAutoV3 Gencode Auto Gencode ENCODE Sep 2009 Freeze 2009-10-27 1878 Hubbard Sanger V3c wgEncodeGencodeAutoV3 GENCODE Hubbard Hubbard - GENCODE at Sanger Institute ENCODE Gencode Automated Gene Annotations (level 3) (Oct 2009) Genes and Gene Predictions wgEncodeGencodeManualV3 Gencode Manual Gencode ENCODE Sep 2009 Freeze 2009-10-27 1878 Hubbard Sanger V3c wgEncodeGencodeManualV3 GENCODE Hubbard Hubbard - GENCODE at Sanger Institute ENCODE Gencode Manual Gene Annotations (level 1+2) (Oct 2009) Genes and Gene Predictions rnaCluster Gene Bounds Gene Boundaries as Defined by RNA and Spliced EST Clusters mRNA and EST Description This track shows the boundaries of genes and the direction of transcription as deduced from clustering spliced ESTs and mRNAs against the genome. When many spliced variants of the same gene exist, this track shows the variant that spans the greatest distance in the genome. Method ESTs and mRNAs from GenBank were aligned against the genome using BLAT. Alignments with less than 97.5% base identity within the aligning blocks were filtered out. When multiple alignments occurred, only those alignments with a percentage identity within 0.2% of the best alignment were kept. The following alignments were also discarded: ESTs that aligned without any introns, blocks smaller than 10 bases, and blocks smaller than 130 bases that were not located next to an intron. The orientations of the ESTs and mRNAs were deduced from the GT/AG splice sites at the introns; ESTs and mRNAs with overlapping blocks on the same strand were merged into clusters. Only the extent and orientation of the clusters are shown in this track. 
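The identity-based filters described in the Method above can be sketched as follows. This is an illustration only, not the UCSC pipeline: the Alignment record and its field names are hypothetical, the order in which the filters are applied is an assumption, and the block-size rules (blocks under 10 bases, and blocks under 130 bases not adjacent to an intron) are omitted because the text does not say whether they remove single blocks or whole alignments.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one BLAT alignment of an EST or mRNA."""
    query: str        # GenBank accession of the aligned sequence
    identity: float   # percent base identity within the aligning blocks
    is_est: bool      # True for ESTs, False for mRNAs
    intron_count: int


def filter_alignments(alignments, min_identity=97.5, near_best=0.2):
    """Apply the identity, intron and near-best rules from the Method text."""
    by_query = {}
    for aln in alignments:
        # Alignments below 97.5% base identity are filtered out.
        if aln.identity < min_identity:
            continue
        # ESTs that aligned without any introns are discarded.
        if aln.is_est and aln.intron_count == 0:
            continue
        by_query.setdefault(aln.query, []).append(aln)

    kept = []
    for group in by_query.values():
        # Keep only alignments within 0.2% identity of the best one for this query.
        best = max(aln.identity for aln in group)
        kept.extend(aln for aln in group if best - aln.identity <= near_best)
    return kept
```

With three alignments of one EST at 99.5%, 99.4% and 98.0% identity, the first two are kept and the third is dropped for falling more than 0.2% below the best.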
Scores for individual gene boundaries were assigned based on the number of cDNA alignments used: 300: based on a single cDNA alignment; 600: based on two alignments; 900: based on three alignments; 1000: based on four or more alignments. Credits This track, which was originally developed by Jim Kent, was generated at UCSC and uses data submitted to GenBank by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32:D23-6. Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. geneid Geneid Genes Geneid Gene Predictions Genes and Gene Predictions Description This track shows gene predictions from the geneid program developed by Roderic Guigó's Computational Biology of RNA Processing group, which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. Methods Geneid is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure. In the first step, splice sites, start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov Model for coding DNA. Finally, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. Credits Thanks to Computational Biology of RNA Processing for providing these data. References Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr Protoc Bioinformatics. 2007 Jun;Chapter 4:Unit 4.3. PMID: 18428791 Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000 Apr;10(4):511-5.
PMID: 10779490; PMC: PMC310871 geneReviews GeneReviews GeneReviews Phenotype and Disease Associations Description GeneReviews is an online collection of expert-authored, peer-reviewed articles that describe specific gene-related diseases. GeneReviews articles are searchable by disease name, gene symbol, protein name, author, or title. GeneReviews is supported by the National Institutes of Health, hosted at NCBI as part of the Genetic Testing Registry (GTR). The GeneReviews data underlying this track will be updated frequently. The GeneReviews track allows the user to locate the NCBI GeneReviews resource quickly from the Genome Browser. Hovering the mouse on track items shows the gene symbol and associated diseases. A condensed version of the GeneReviews article name and its related diseases are displayed on the item details page as links. Similar information, when available, is provided in the details page of items from the UCSC Genes, RefSeq Genes, and OMIM Genes tracks. Data Access The raw data for the GeneReviews track can be explored interactively with the Table Browser. Cross-referencing can be done with Data Integrator. The complete source file, in bigBed format, can be downloaded from our downloads directory. For automated analysis, the data may be queried from our REST API. Previous versions of this track can be found on our archive download server. References Pagon RA, Adam MP, Bird TD, et al., editors. GeneReviews® [Internet]. Seattle (WA): University of Washington, Seattle; 1993-2014. Available from: https://www.ncbi.nlm.nih.gov/books/NBK1116. 
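For the automated access mentioned under Data Access, a request URL might be assembled as in the sketch below. Several things here are assumptions rather than statements from this page: the api.genome.ucsc.edu host, the /getData/track endpoint and the genome, track, chrom, start and end parameter names are recalled from the public UCSC REST API documentation and should be checked there, and the helper name and the example region are purely illustrative. The track name geneReviews is the identifier used on this page.

```python
from urllib.parse import urlencode

# Assumed base URL of the UCSC REST API; verify against the current API documentation.
API_ROOT = "https://api.genome.ucsc.edu"


def gene_reviews_url(genome, chrom, start, end, track="geneReviews"):
    """Build a getData/track request URL for one region (hypothetical helper)."""
    query = urlencode(
        {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_ROOT}/getData/track?{query}"


# Illustrative region only; the assembly name must match one that carries this track.
print(gene_reviews_url("hg19", "chr7", 117000000, 117400000))
```

The returned string can be passed to any HTTP client; the response is expected to be JSON, but the exact layout should be confirmed against the API documentation before it is parsed.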
pgSnp Genome Variants Personal Genome Variants Variation and Repeats Description This track displays variant base calls from the publicly released genome sequences of several individuals: 5 Sub-Saharan African genomes sequenced by Penn State University: !Gubi (KB1), G/aq'o (NB1), !Ai (MD8), D#kgao (TK1), and Archbishop Desmond Tutu (ABTutu); 6 individuals from the 1000 Genomes Project high-coverage pilot: a CEU daughter and parents (NA12878, NA12891, NA12892) and a YRI daughter and parents (NA19240, NA19238, NA19239); and independently published genomes: Craig Venter, James Watson, Anonymous Yoruba individual NA18507, Anonymous Han Chinese individual (YH, YanHuang Project), Seong-Jin Kim (SJK), Anonymous Korean individual (AK1), Anonymous Irish male, Gregory Lucier, Stephen Quake, and an extinct Palaeo-Eskimo Saqqaq individual. Note: The Khoisan languages are characterized by clicks, denoting additional consonants. The ! is a palatal click, / is a dental click, and # is an alveolar click (Le Roux and White, 2004). Display Conventions and Configuration Substitutions and indels are displayed as boxes. When read frequency data are available, they are displayed in the mouseover text (e.g. "T:8 G:3" means that 8 reads contained a T and 3 reads contained a G at that base position), and box colors are used to show the proportion of alleles. In the genome browser, when viewing the forward strand of the reference genome (the normal case), the displayed alleles are relative to the forward strand. When viewing the reverse strand of the reference genome ("reverse" button), the displayed alleles are reverse-complemented to match the reverse strand. On the details page for each variant, the alleles are given for the forward strand of the reference genome. Frequency and phenotype data are shown when available. Sources KB1, NB1, MD8, TK1, ABTutu (Penn State) (Schuster et al.) SNPs are from the allSNPs.txt file which can be downloaded from Galaxy.
The indels are also available for download from Galaxy. CEU trio NA12878, NA12891, NA12892; YRI trio NA19240, NA19238, NA19239 (1000 Genomes Project) (1000 Genomes) The variants shown are from the 1000 Genomes Project's March 2010 release. The CEU variant calls were based on sequence data from the Wellcome Trust Sanger Institute and the Broad Institute, using the Illumina/Solexa platform. The YRI variant calls were based on sequence data from the Baylor College of Medicine Human Genome Sequencing Center and Applied Biosystems, using the SOLiD platform. For more information on the mapping, variant calling, filtering and validation, see the pilot 2 README file. The variant calls are available from the March 2010 release subdirectory at EBI and at NCBI. Craig Venter (JCVI) (Levy et al.) An overview is given here. This subtrack contains Venter's single-base and multi-base variants and small (< 100 bp) insertions/deletions from the file HuRef.InternalHuRef-NCBI.gff, filtered to include only Method 1 variants (where each variant was kept in its original form and not post-processed), and to exclude any variants that had N as an allele. JCVI hosts a genome browser. James Watson (CSHL) (Wheeler et al.) These single-base variants came from the file watson_snp.gff.gz. CSHL hosts a genome browser. Yoruba NA18507 (Illumina Cambridge/Solexa) (Bentley et al.) Illumina released the read sequences to the NCBI Short Read Archive. Aakrosh Ratan in the Miller Lab at Penn State University (PSU) mapped the sequence reads to the reference genome and called single-base variants and small insertions/deletions (< 20 bp) using MAQ. YH (YanHuang Project) (Wang et al.) The YanHuang Project released these single-base variants from the genome of a Han Chinese individual. The data are available from the YH database in the file yhsnp_add.gff. The YanHuang Project hosts a genome browser. SJK (GUMS/KOBIC) (Ahn et al.)
Researchers at Gachon University of Medicine and Science (GUMS) and the Korean Bioinformation Center (KOBIC) released these single-base variants from the genome of Seong-Jin Kim. The data are available from KOBIC in the file KOREF-solexa-snp-X30_Q40d4D100.gff. AK1 (Genomic Medicine Institute) (Kim et al.) The variants shown are from the AK1_SNP.tar.gz download. Stephen Quake (Stanford) (Pushkarev et al.) The variants shown are from the Trait-o-matic download. Anonymous Irish male (Tong et al.) The SNPs shown are from the Galaxy library, Irish whole genome. Gregory Lucier (Life Technologies) The SNPs shown are from Nimbus Informatics. Sequencing was done using the Life SOLiD platform. Palaeo-Eskimo Saqqaq individual (Saqqaq Genome Project) (Rasmussen et al.) The variants shown are all the SNPs found by the SNPest program, and in a second track the high-confidence SNPs from the first set. Allele counts are not available for these tracks, but read depth is; it is shown in place of the allele counts to give a measure of the reliability of the call. References KB1, NB1, MD8, TK1, ABTutu (Penn State) Schuster SC, Miller W, Ratan A, Tomsho LP, Giardine B, Kasson LR, Harris RS, Petersen DC, Zhao F, Qi J, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010 Feb 18;463(7283):943-7. Le Roux, W., and White, A. The voices of the San living in Southern Africa today. Cape Town: Kwela Books (2004) CEU trio NA12878, NA12891, NA12892; YRI trio NA19240, NA19238, NA19239 (1000 Genomes) 1000 Genomes Project Consortium, Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. Craig Venter (JCVI) Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007 Sep 4;5(10):e254.
James Watson (CSHL) Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008 Apr 17;452(7189):872-6. Yoruba NA18507 (Illumina Cambridge/Solexa) Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008 Nov 6;456(7218):53-9. YH (YanHuang Project) Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, et al. The diploid genome sequence of an Asian individual. Nature. 2008 Nov 6;456(7218):60-5. SJK (GUMS/KOBIC) Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim DS, Kim BC, Kim SY, Kim WY, Kim C, et al. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res. 2009 Sep;19(9):1622-9. AK1 (Genomic Medicine Institute) Jong-Il Kim, Young Seok Ju, Hansoo Park, Sheehyun Kim, Seonwook Lee, Jae-Hyuk Yi, Joann Mudge, Neil A. Miller, Dongwan Hong, Callum J. Bell, et al. A highly annotated whole-genome sequence of a Korean individual. Nature 460, 1011-1015 (20 August 2009). Stephen Quake Pushkarev D, Neff NF, Quake SR "Single-molecule Sequencing of an Individual Human Genome" Nature Biotech 27, 847-850 (10 August 2009) doi:10.1038 PDF Anonymous Irish Male Tong P, Prendergast JG, Lohan AJ, Farrington SM, Cronin S, Friel N, Bradley DG, Hardiman O, Evans A, Wilson JF, Loftus B. Sequencing and analysis of an Irish human genome. Genome Biol. 2010;11(9):R91. Gregory Lucier Not published, data provided by Life Technologies and Nimbus Informatics. Palaeo-Eskimo Saqqaq individual Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R, et al. Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo. Nature 463, 757-762 (11 February 2010). 
pgSnp1off Single Genomes Personal Genome Variants Variation and Repeats pgSaqqaqHc Saqqaq HC Individual from the Extinct Palaeo-Eskimo Saqqaq, high confidence SNPs Variation and Repeats pgSaqqaq Saqqaq Individual from the Extinct Palaeo-Eskimo Saqqaq (Saqqaq Genome Project) Variation and Repeats pgQuake S. Quake Stephen Quake (Stanford) Variation and Repeats pgLucier Greg Lucier Gregory Lucier (Life Technologies) Variation and Repeats pgIrish Irish Male Anonymous Irish Male Variation and Repeats pgAk1 AK1 Anonymous Korean individual, AK1 (Genomic Medicine Institute) Variation and Repeats pgSjk SJK Seong-Jin Kim (SJK, GUMS/KOBIC) Variation and Repeats pgYh1 YanHuang Han Chinese Individual (YanHuang Project) Variation and Repeats pgYoruban3 YRI NA18507 YRI NA18507 (Illumina Cambridge/Solexa, SNPs called by PSU) Variation and Repeats pgWatson Watson James Watson (CSHL) Variation and Repeats pgVenterIndel Venter indels J. Craig Venter - Published Method 1, Indels in Original Form (JCVI) Variation and Repeats pgVenterSnp Venter J. Craig Venter - Published Method 1, Variant in Original Form (JCVI) Variation and Repeats pgVenter Venter J. 
Craig Venter - Published Method 1, Variant in Original Form (JCVI) Variation and Repeats pgSnpPSU PSU Bushmen Personal Genome Variants Variation and Repeats pgYoruban2 YRI '8507, SOLiD YRI NA18507 Sequenced on the SOLiD Platform (PSU) Variation and Repeats pgAbt454indels ABTutu exome indels ABTutu Genome Variants, 454 exome indels Variation and Repeats pgAbtIllum ABTutu Illum ABTutu Genome Variants, Illumina 7.2X Variation and Repeats pgAbt454 ABTutu exome ABTutu Genome Variants, 454 exome Variation and Repeats pgAbtSolid ABTutu ABTutu Genome Variants, SOLiD Variation and Repeats pgTk1Indel TK1 indels TK1 Genome Variants indels Variation and Repeats pgTk1 TK1 TK1 Genome Variants (all SNPs, 16x exome) Variation and Repeats pgMd8Indel MD8 indels MD8 Genome Variants indels Variation and Repeats pgMd8 MD8 MD8 Genome Variants (all SNPs, 16x exome) Variation and Repeats pgNb1Indel NB1 indels NB1 Genome Variants indels Variation and Repeats pgNb1 NB1 NB1 Genome Variants (all SNPs, 2X genome plus 16x exome) Variation and Repeats pgKb1Indel KB1 indels KB1 indels from 454 and Illumina Variation and Repeats pgKb1Illum KB1 Illumina KB1 Genome Variants, Illumina 23.2X Variation and Repeats pgKb1454 KB1 454 KB1 Genome Variants, 454 Variation and Repeats pgKb1Comb KB1 KB1 Genome Variants, combination of 454, Illumina, and genotyping Variation and Repeats pgSnp1kG 1000 Genomes March 2010 Personal Genome Variants Variation and Repeats pgNA19239 YRI father '9239 YRI Trio Father NA19239 (1000 Genomes Project) Variation and Repeats pgNA19238 YRI mother '9238 YRI Trio Mother NA19238 (1000 Genomes Project) Variation and Repeats pgNA19240 YRI daught '9240 YRI Trio Daughter NA19240 (1000 Genomes Project) Variation and Repeats pgNA12892 CEU mother '2892 CEU Trio Mother NA12892 (1000 Genomes Project) Variation and Repeats pgNA12891 CEU father '2891 CEU Trio Father NA12891 (1000 Genomes Project) Variation and Repeats pgNA12878 CEU daught '2878 CEU Trio Daughter NA12878 (1000 Genomes Project) 
Variation and Repeats genscan Genscan Genes Genscan Gene Predictions Genes and Gene Predictions Description This track shows predictions from the Genscan program written by Chris Burge. The predictions are based on transcriptional, translational and donor/acceptor splicing signals as well as the length and compositional distributions of exons, introns and intergenic regions. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The track description page offers the following filter and configuration options: Color track by codons: Select the genomic codons option to color and label each codon in a zoomed-in display to facilitate validation and comparison of gene predictions. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods For a description of the Genscan program and the model that underlies it, refer to Burge and Karlin (1997) in the References section below. The splice site models used are described in more detail in Burge (1998) below. Credits Thanks to Chris Burge for providing the Genscan program. References Burge C. Modeling Dependencies in Pre-mRNA Splicing Signals. In: Salzberg S, Searls D, Kasif S, editors. Computational Methods in Molecular Biology. Amsterdam: Elsevier Science; 1998. p. 127-163. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997 Apr 25;268(1):78-94. PMID: 9149143 wgEncodeGisChipPetAll GIS ChIP-PET GIS ChIP-PET Regulation Description This track shows binding sites for p53, STAT1, c-Myc, histone modifications H3K4me3 and H3K27me3, as determined by chromatin immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing. The data for STAT1 are restricted to the ENCODE regions, but the p53, c-Myc, H3K4me3 and H3K27me3 site data are genome-wide. 
The p53 protein is a transcription factor involved in the control of cell growth that is often expressed at high levels in cancer cells. STAT1 is a signal transducer and transcription factor that binds to gamma interferon activating sequence. The c-Myc (cellular myelocytomatosis) protein is a transcription factor associated with cell proliferation, differentiation, and neoplastic disease. H3K4me3 and H3K27me3 are two key histone modifications tightly associated with chromatin structures. The PET sequences in this track are derived from individual ChIP fragments as follows:

Factor     Fragments   Cell line                                                    Treatment
p53        65,572      HCT116                                                       6 hrs 5-fluorouracil (5FU)
STAT1      263,901     HeLa                                                         none
STAT1      327,838     HeLa                                                         gamma interferon (gIFN)
c-Myc      273,566     P493 B cell with tetracycline-repressible c-Myc transgene    none
H3K4me3    679,752     Embryonic stem cell hES3                                     none
H3K27me3   992,509     Embryonic stem cell hES3                                     none

Human embryonic stem cell line hES3 (46XX, Chinese) was obtained from ES Cell International. These cells were serially cultured according to protocols established previously (Choo, 2006). In brief, feeder-free cultures of hES3 were maintained at 37°C/5% CO2 on Matrigel-coated organ culture dishes supplemented with conditioned media from mouse feeders, DE-MEF. For the STAT1 experiments, a total of 4,007 of the PETs from the stimulated cells and 3,180 PETs from unstimulated cells were mapped to the ENCODE regions. The data from the unstimulated cells were used as the negative control. Only STAT1 PETs mapped to the ENCODE regions are shown in this track. Display Conventions and Configuration In the graphical display, PET sequences are shown as two blocks, representing the ends of the pair, connected by a thin arrowed line. Overlapping PET clusters (PET fragments that overlap one another) originating from the ChIP enrichment process define the genomic loci that are potential transcription factor binding sites (TFBSs). 
PET singletons, from non-specific ChIP fragments that did not cluster, are not shown. In full and packed display modes, the arrowheads on the horizontal line represent the orientation of the PET sequence, and an ID of the format XXXXX-M is shown to the left of each PET, where X is the unique ID for each PET and M is the number of PET sequences at this location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray is used for three sequences (score = 800) and black indicates four or more PET sequences (score = 1000) at the location. Methods The cross-linked chromatin was sheared and precipitated with a high affinity antibody. The DNA fragments were end-polished and cloned into the plasmid vector, pGIS3. pGIS3 contains two MmeI recognition sites that flank the cloning site, which were used to produce a 36 bp PET from the original ChIP DNA fragments (18 bp from each of the 5' and 3' ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for sequencing, where each sequence read can generate 10-15 PETs. The PET sequences were extracted from raw sequence reads and mapped to the genome, defining the boundaries of each ChIP DNA fragment. The following specific mapping criteria were used:
- both 5' and 3' signatures must be present on the same chromosome
- their 5' to 3' orientation must be correct
- a minimal 17 bp match must exist for each 18 bp 5' and 3' signature
- the tags must have genomic alignments within 7 Kb of each other
Due to the known possibility of MmeI slippage (+/- 1 bp) that leads to ambiguities at the PET signature boundaries, a minimal 17 bp match was set for each 18 bp signature. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. Only PETs with specific mapping (one location) to the genome were considered. 
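The ChIP-PET mapping criteria and the score-based shading described above can be illustrated with a short Python sketch. It is not the GIS pipeline: the record layout and field names are hypothetical, and the reading of "correct orientation" (both tags on the same strand, with the 5' tag upstream of the 3' tag on that strand) is an assumption made for illustration only.

```python
from dataclasses import dataclass

MAX_SPAN_BP = 7_000    # tags must align within 7 Kb of each other
MIN_TAG_MATCH_BP = 17  # minimal match for each 18 bp signature


@dataclass
class PetAlignment:
    """Hypothetical record for one mapped PET; field names are illustrative."""
    chrom5: str
    chrom3: str
    strand5: str
    strand3: str
    pos5: int
    pos3: int
    match5: int  # matched bases in the 5' signature
    match3: int  # matched bases in the 3' signature


def passes_mapping_criteria(pet: PetAlignment) -> bool:
    """Apply the four mapping criteria listed in the text."""
    same_chrom = pet.chrom5 == pet.chrom3
    # Assumption: "correct orientation" means same strand, 5' tag upstream.
    if pet.strand5 != pet.strand3:
        oriented = False
    elif pet.strand5 == "+":
        oriented = pet.pos5 <= pet.pos3
    else:
        oriented = pet.pos5 >= pet.pos3
    matched = pet.match5 >= MIN_TAG_MATCH_BP and pet.match3 >= MIN_TAG_MATCH_BP
    close = abs(pet.pos3 - pet.pos5) <= MAX_SPAN_BP
    return same_chrom and oriented and matched and close


def pet_score(m: int) -> int:
    """Map M (PET sequences at a location) to the display score from the text."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333   # light gray
    if m == 3:
        return 800   # dark gray
    return 1000      # black
```

For example, a PET whose two tags lie 2 kb apart on the plus strand of the same chromosome with 18 and 17 matched bases passes, while the same PET with tags on different chromosomes does not.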
PETs that mapped to multiple locations may represent low complexity or repetitive sequences, and therefore were not included for further analysis. Verification Statistical and experimental verification exercises have shown that the overlapping PET clusters result from ChIP enrichment events. P53 HCT116 Monte Carlo simulation using the p53 ChIP-PET data estimated that about 27% of PET-2 clusters (PET clusters with two overlapping members), 3% of the PET clusters with 3 overlapping members (PET-3 clusters), and less than 0.0001% of PET clusters with more than 3 overlapping members were due to random chance. This suggests that the PET clusters most likely represent the real enrichment events by ChIP and that a higher number of overlapping fragments correlates to a higher probability of a real ChIP enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 36% of the PET-2 clusters and over 99% of the PET-3+ clusters (clusters with three or more overlapping members) are true enrichment ChIP sites. Thus, the verification rate is nearly 100% for PET-3+ ChIP clusters, and the PET-2 clusters contain significant noise. In addition to these statistical analyses, 40 genomic locations identified by PET-3+ clusters were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined and all 40 regions (100%) were confirmed to have significant enrichment of p53 ChIP clusters. STAT1 HeLa Monte Carlo simulation using the STAT1 ChIP-PET data from interferon gamma-stimulated dataset estimated that random chance accounted for about 58% of PET-3 clusters (maximal numbers of PETs within the overlap region of any cluster), 21% of the PET clusters with 4 overlapping members (PET-4 clusters), and less than 0.5% of PET clusters with more than 5 overlapping members. 
This suggests that the PET-5+ clusters represent the real enrichment events by ChIP and that a higher number of overlapping fragments correlates to a higher probability of a real ChIP enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 30% of the PET-4 clusters and over 90% of the PET-5+ clusters (clusters with five or more overlapping members) are true enrichment ChIP sites. In addition to these statistical analyses, 9 out of 14 genomic locations (64%) identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip data from Yale using the same ChIP DNA as hybridization material. c-Myc P493 Monte Carlo simulation using the c-Myc ChIP-PET data estimated that about 32% of PET-3 clusters (maximal numbers of PETs within the overlap region of any cluster) and 4% of the PET clusters with 4 or more overlapping members (PET-4+ clusters) were due to random chance. This suggests that ~70% of PET-3+ clusters represent the real enrichment events by ChIP and that a higher number of overlapping fragments correlates to a higher probability of a real ChIP enrichment event. In addition to these statistical analyses, 29 genomic locations identified by PET-3+ clusters and 19 genomic locations defined by PET-2 clusters were randomly selected and subjected to quantitative real-time PCR analysis. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined, and all 29 PET-3+ regions (100%) and 47% of the 19 PET-2 regions were confirmed to have significant enrichment of c-Myc ChIP, indicating that all of the PET-3+ and 47% of the PET-2 cluster-defined regions are true c-Myc-bound targets. 
H3K4me3 and H3K27me3 hES3 Monte Carlo simulation on these two datasets estimated that about 24% of PET-5 clusters (PET clusters with five overlapping members), 6% of the PET clusters with 6 overlapping members (PET-6 clusters), and less than 2% of PET clusters with more than 5 overlapping members were due to random chance. Thus, in conclusion the majority (98.7%) of overlapping PET-5+ clusters indeed represent the true enrichments from ChIP processes rather than random events. Therefore, PET clusters size 5 and above are reliable readouts for H3K4me3 and H3K27me3 modification regions based on Monte Carlo simulation. In addition to these statistical analyses, 30 genomic locations identified by PET-5+ clusters from each dataset were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined and all 30 regions (100%) were confirmed to have significant enrichment (10 fold and more). 9 out of 10 clusters from PET-4 and PET-3 clusters are enriched with 10 fold and above compared with control Ena1 ChIP DNA. Credits The ChIP-PET library and sequence data were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore, the Bioinformatics Institute, Singapore, and Boston University. The STAT1 ChIP fragment prep was provided by Ghia Euskirchen from the Snyder lab at Yale. The c-Myc ChIP fragment prep was provided by Karen Zeller from the Dang lab at Johns Hopkins University. References Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11. Wei CL, Wu Q, Vega VB, Chiu KP, Ng P, Zhang T, Shahab A, Yong HC, Fu Y, Weng Z et al. A Global Map of p53 Transcription-Factor Binding Sites in the Human Genome. Cell. 2006 Jan 13;124(1):207-19. 
Chiu KP, Wong CH, Chen Q, Ariyaratne P, Ooi HS, Wei CL, Sung WK, Ruan Y. PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinformatics. 2006 Aug 25;7:390. Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW. Immortalized feeders for the scale-up of human embryonic stem cells in feeder and feeder-free conditions. J Biotechnol. 2006 Mar 9;122(1):130-41. Zeller KI, Zhao X, Lee CW, Chiu KP, Yao F, Yustein JT, Ooi HS, Orlov YL, Shahab A, Yong HC et al. Global mapping of c-Myc binding sites and target gene networks in human B cells. Proc Natl Acad Sci U S A. 2006 Nov 21;103(47):17834-9. wgEncodeGisChipPetHes3H3K27me3 H3K27me3 hES3 GIS ChIP-PET: H3K27me3 Ab on ES hes-3 cells Regulation wgEncodeGisChipPetHes3H3K4me3 H3K4me3 hES3 GIS ChIP-PET: H3K4me3 Ab on ES hes-3 cells Regulation wgEncodeGisChipPetMycP493 cMyc P493 GIS ChIP-PET: c-Myc Ab on P493 B cells Regulation wgEncodeGisChipPetStat1NoGif STAT1 HeLa -gIF GIS ChIP-PET: STAT1 Ab on untreated HeLa cells Regulation wgEncodeGisChipPetStat1Gif STAT1 HeLa +gIF GIS ChIP-PET: STAT1 Ab on gIF treated HeLa cells Regulation wgEncodeGisChipPet p53 HCT116 +5FU GIS ChIP-PET: p53 Ab on 5FU treated HCT116 cells Regulation wgEncodeGisDnaPet GIS DNA PET ENCODE Genome Institute of Singapore DNA Paired-End Ditags Variation and Repeats Description This track is produced as part of the ENCODE Transcriptome Project. It shows the starts and ends of DNA fragments from different cell lines determined by paired-end ditag (PET) sequencing using different DNA fragment sizes for analysis of genome structural variation. Display Conventions and Configuration In the graphical display, the ends are represented by blocks connected by a horizontal line. 
In full and packed display modes, the arrowheads on the horizontal line represent the strand, and an ID of the format XXXXX-N-M is shown to the left of each PET, where X is the unique ID for each PET, N indicates the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. PETs that mapped to multiple locations may represent low complexity or repetitive sequences. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Alignments The Alignments view shows alignment of individual PET sequences. Methods Sample genomic DNA was isolated, hydrosheared to a given size range, then ligated with a specific DNA linker sequence at both ends, followed by gel selection of the desired size (e.g., 1 kb or 10 kb). The DNA fragments modified with linker at both ends (e.g., 10 kb) were then circularized by ligation, followed by restriction digest with enzyme EcoP15I to generate DNA-PETs (25-bp tag from each end). The PETs were ligated with SOLiD sequencing adaptors at both ends, then amplified by PCR and purified as complex templates for high throughput DNA sequencing. Most of the DNA-PET data sets submitted so far were generated on the SOLiD platform. Cells were grown according to the approved ENCODE cell culture protocols. Data: Reads of DNA-PETs were mapped onto the reference genome (NCBI Build 36, hg18). A majority of the PETs mapped to the same chromosome, in the correct orientation and within the expected distance span (e.g., a 10 kb DNA-PET was expected to map over a span of ~10 kb). A small portion of misaligned PETs, called discordant PETs, mapped too far from each other, in the wrong orientation, or to different chromosomes, indicating structural variations between the sample and the reference genome. 
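The distinction between concordant and discordant DNA-PETs can be expressed as a small classifier. The sketch below is only an illustration in Python: the function name, the argument layout, the `tolerance` parameter and the rule that both tags should lie on the same strand are all assumptions, since the text does not state how much deviation from the expected span was accepted or how orientation was encoded.

```python
def classify_dna_pet(chrom5, chrom3, strand5, strand3, pos5, pos3,
                     expected_span, tolerance=0.5):
    """Label a mapped DNA-PET as concordant or give the reason it is discordant.

    expected_span: fragment size selected for the library, in bp (e.g. 10_000).
    tolerance: hypothetical fraction by which the observed span may exceed
               the expected span before the PET is flagged.
    """
    if chrom5 != chrom3:
        return "discordant: different chromosomes"
    # Assumption: correctly oriented tags are reported on the same strand.
    if strand5 != strand3:
        return "discordant: wrong orientation"
    span = abs(pos3 - pos5)
    if span > expected_span * (1 + tolerance):
        return "discordant: too far apart"
    return "concordant"
```

With the default tolerance, a 10 kb library PET whose tags map 10.4 kb apart on the same strand is labeled concordant, while one whose tags map 250 kb apart is flagged as too far apart, which is the pattern a deletion in the sample would be expected to produce.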
The variations could be due to deletion, inversion, tandem repeats, trans-location, fusion etc. Mapping parameters: Mapping was done using Applied Biosystems' SOLiD alignment and pairing pipeline. Initial mapping was done allowing two mismatches in color space and recovery was performed during pairing that allowed up to 4 mismatches in a pair. Verification Representative structural variations identified by DNA-PET data have been verified by targeted PCR and sequencing analysis to confirm the predicted rearrangement sites. Some of them have also been validated by FISH. Credits The GIS-DNA PET libraries and sequence data for genome structural variation analysis were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists Xiaoan Ruan, Atif Shahab, Chialin Wei, and Yijun Ruan at the Genome Institute of Singapore. Contact: Yijun Ruan Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeGisDnaPetViewAlignments Alignments ENCODE Genome Institute of Singapore DNA Paired-End Ditags Variation and Repeats wgEncodeGisDnaPetAlignmentsK56220k K562 20k K562 DnaPet ENCODE June 2010 Freeze 2010-04-05 2011-01-05 247 Gingeras GIS wgEncodeGisDnaPetAlignmentsK56220k Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (20k frags in K562 cells) Variation and Repeats wgEncodeGisDnaPetAlignmentsK56210k K562 10k K562 DnaPet ENCODE Feb 2009 Freeze 2009-03-11 2009-12-11 243 Gingeras GIS wgEncodeGisDnaPetAlignmentsK56210k Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (10k frags in K562 cells) Variation and Repeats wgEncodeGisDnaPetAlignmentsK5621k K562 1k K562 DnaPet ENCODE Feb 2009 Freeze 2009-03-11 2009-12-11 244 Gingeras GIS wgEncodeGisDnaPetAlignmentsK5621k Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (1k frags in K562 cells) Variation and Repeats wgEncodeGisDnaPetAlignmentsGm1287810k GM12878 10k GM12878 DnaPet ENCODE Feb 2009 Freeze 2009-03-11 2009-12-11 242 Gingeras GIS wgEncodeGisDnaPetAlignmentsGm1287810k Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (10k frags in GM12878 cells) Variation and Repeats wgEncodeGisDnaPetAlignmentsGm128785k GM12878 5k GM12878 DnaPet ENCODE Feb 2009 Freeze 2009-04-03 2010-01-03 246 Gingeras GIS wgEncodeGisDnaPetAlignmentsGm128785k Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (5k frags in GM12878 cells) Variation and Repeats wgEncodeGisDnaPetAlignmentsGm128781k GM12878 1k GM12878 DnaPet ENCODE Feb 2009 Freeze 2009-04-03 2010-01-03 245 Gingeras GIS wgEncodeGisDnaPetAlignmentsGm128781k Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS DNA PET Alignments (1k frags in GM12878 cells) Variation and Repeats wgEncodeGisPet GIS PET Loc ENCODE GIS Subcellular RNA Localization by Paired End diTag Sequencing Expression Description This track is produced as part of the ENCODE Transcriptome Project. 
It shows the starts and ends of full length mRNA transcripts determined by GIS paired-end ditag (PET) sequencing using RNA extracts from different sub-cellular localizations in different cell lines. The RNA-PET information provided in this track is composed of two different PET length versions based on how the PETs were extracted. The cloning-based PET (18 bp and 16 bp) is an earlier version and detailed information can be found from reference (Ng et al. 2006). The cloning-free PET (25 bp and 25 bp) is a recently modified version which uses Type II enzyme EcoP15I to generate a longer length of PET (unpublished), which results in a significant enhancement in both library construction and mapping efficiency. Both versions of PET templates were sequenced by Solexa platform at 2 x 36 bp Paired End sequencing. See the Methods and References sections below for more details. Display Conventions and Configuration In the graphical display, the ends are represented by blocks connected by a horizontal line. In full and packed display modes, the arrowheads on the horizontal line represent the direction of transcription, and an ID of the format XXXXX-N-M is shown to the left of each PET, where X is the unique ID for each PET, N indicates the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. PETs that mapped to multiple locations may represent low complexity or repetitive sequences. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments. 
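As an illustration of the XXXXX-N-M label described above, the following Python sketch splits a label into its three parts. The function name and the example label are hypothetical. Splitting from the right is a defensive choice, made on the assumption that the unique ID itself could contain a hyphen.

```python
def parse_pet_id(pet_id):
    """Split an 'XXXXX-N-M' label into (unique_id, n_locations, n_pets)."""
    unique_id, n_locations, n_pets = pet_id.rsplit("-", 2)
    return unique_id, int(n_locations), int(n_pets)


def is_uniquely_mapped(pet_id):
    """True when N, the number of mapping locations, equals 1."""
    return parse_pet_id(pet_id)[1] == 1
```

For a hypothetical label such as "48213-1-7", this yields the unique ID "48213", one mapping location and seven PET sequences at that location.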
Plus Raw Signal The Plus Raw Signal view graphs the base-by-base density of tags on the + strand. Minus Raw Signal The Minus Raw Signal view graphs the base-by-base density of tags on the - strand. Alignments The Alignments view shows alignment of individual PET sequences. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription was not able to be determined are shown in black. Clusters The Clusters view shows clusters built from the alignments. Methods Cells were grown according to the approved ENCODE cell culture protocols. Two different GIS RNA-PET protocols were used to generate the full length transcriptome PETs: one is based on a cloning-free RNA-PET library construction and sequencing strategy (unpublished), and the other is a cloning-based library construction (Ng et al. 2005) and recent Solexa paired end sequencing. Cloning-free RNA-PET (50 bp reads, 25 bp and 25 bp tag for each of the 5' and 3' ends) Method: The cloning-free RNA-PET libraries were generated from polyA mRNA samples and constructed using a recently modified GIS protocol (unpublished). Total RNA in good quality was used as starting material and purified through MACs polyT column to obtain full length polyA mRNAs. Approximately 5 micrograms of enriched polyA mRNA were used for reverse transcription to convert polyA mRNA to full length cDNA. The obtained full length cDNA was modified and ligated with specific linker sequences, followed by circularization through ligation to generate circular cDNA molecules. The 25 bp tag from each end of the full length cDNA was extracted by type II enzyme EcoP15I digestion. The resulting PETs were ligated with sequencing adaptors at the both ends, amplified by PCR, and further purified as complex templates for paired end (PE) sequencing using Solexa or SOLiD platforms. 
Most data displayed in this track were sequenced using Solexa. Data: The sequenced RNA-PETs have a uniform length of 25 bp from each end of a cDNA. After redundant and noise tags are filtered out, the unique PETs proceed to the analysis pipeline. First, the orientation of each tag is determined from the barcode built into the sequencing template, and the tags are then paired into an oriented PET. The orientation-determined RNA-PET is mapped onto the reference genome, allowing up to two mismatches. The majority of PETs map to known transcripts or splice variants. A small portion of misaligned PETs, defined as discordant PETs, map too far from each other, in the wrong orientation, or to different chromosomes, indicating transcription variations that could be caused by genome structure variations (such as fusion, deletion, insertion, inversion, tandem repeat and translocation) or by RNA trans-splicing. Cloning-based RNA-PET (34 bp reads, 18 bp and 16 bp tag for each of the 5' and 3' ends) Method: The cloning-based RNA-PET (GIS-PET) libraries were generated from polyA RNA samples and constructed using the protocol described by Ng et al. (2005). Good-quality total RNA was used as starting material and further purified through a MACs polyT column to enrich polyA mRNA and remove contaminants (e.g., rRNA, tRNA, DNA, protein). Approximately 10 micrograms of polyA mRNA were then used for reverse transcription to convert polyA mRNA into full length cDNA. The resulting full length cDNA was modified with specific linker sequences, then ligated to a GIS-developed vector (pGIS4) to form a complex full length cDNA library, which was cloned into E. coli. The plasmid DNA was then isolated from the library, followed by MmeI (a type II enzyme) digestion to generate ditags of 18 bp and 16 bp from the two ends of the full length cDNA. 
The single ditag (also called a PET) was then ligated to form a diPET structure (a concatemer of two unrelated PETs joined by a linker sequence) to facilitate Solexa Paired End sequencing. Data: The cloning-based RNA-PETs have uniform lengths of 18 bp and 16 bp, extracted from the 5' and 3' ends of each cDNA, respectively. Redundant reads were filtered out first and only unique reads were included for analysis. PET sequences were then mapped to the reference genome (hg18) using the following specific criteria (Ruan et al. 2007):
- A minimal continuous 16 bp match must exist for the 5' signature; the 3' signature must have a minimal continuous 14 bp match
- Both 5' and 3' signatures must be present on the same chromosome
- Their 5' to 3' orientation must be correct (5' signature followed by 3' signature)
- The maximal genomic span of a PET genomic alignment must be less than one million bp
PETs mapping to 2-10 locations are also included and may represent duplicated genes or pseudogenes in the genome. A majority of PETs mapped to known transcripts or splice variants. A small portion of misaligned PETs, defined as discordant PETs, mapped too far from each other, in the wrong orientation, or to different chromosomes, indicating transcription variations that could be caused by genome structure variations (such as fusion, deletion, insertion, inversion, tandem repeat and translocation) or by RNA trans-splicing. Clusters To cluster the PETs the following procedure was applied: the mapping locations of the 5' and 3' tags of a given PET were extended by 100 bp in both directions, creating 5' and 3' search windows. If the 5' and 3' tags of a second PET mapped within the 5' and 3' search windows of the first PET, then the two PETs were clustered and the search windows were adjusted so that they contained the tag extensions of the second PET. 
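The window-based clustering in the Clusters section, including the repetition of the search until no further PET falls inside the windows, can be sketched in Python as follows. This is a simplified illustration rather than the GIS implementation: each tag is reduced to a single coordinate, strand is ignored, and the tuple layout and function name are assumptions.

```python
EXTENSION_BP = 100  # each tag position is extended by 100 bp in both directions


def cluster_pets(pets, extension=EXTENSION_BP):
    """Group PETs whose 5' and 3' tags fall in growing search windows.

    pets: list of (chrom, pos5, pos3) tuples (hypothetical layout).
    Returns a list of clusters, each a list of indices into `pets`.
    """
    unassigned = set(range(len(pets)))
    clusters = []
    while unassigned:
        # Seed a new cluster with the first PET that is still unassigned.
        seed = min(unassigned)
        unassigned.remove(seed)
        chrom, p5, p3 = pets[seed]
        win5 = (p5 - extension, p5 + extension)
        win3 = (p3 - extension, p3 + extension)
        members = [seed]
        grew = True
        while grew:  # rescan until no new PET falls inside the windows
            grew = False
            for i in sorted(unassigned):
                c, q5, q3 = pets[i]
                in5 = win5[0] <= q5 <= win5[1]
                in3 = win3[0] <= q3 <= win3[1]
                if c == chrom and in5 and in3:
                    members.append(i)
                    unassigned.remove(i)
                    # Widen both windows to cover the new PET's tag extensions.
                    win5 = (min(win5[0], q5 - extension), max(win5[1], q5 + extension))
                    win3 = (min(win3[0], q3 - extension), max(win3[1], q3 + extension))
                    grew = True
        clusters.append(members)
    return clusters
```

In this sketch a PET whose 5' tag lies 170 bp from the seed can still join the cluster if an intermediate PET has already widened the window, which is the effect of re-adjusting the search windows after each addition.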
PETs which subsequently mapped with their 5' and 3' tags within the adjusted 5' and 3' search windows, respectively, were also assigned to this cluster and the search windows were readjusted. This iterative process continued until no new PET fell within the search windows, at which stage all the PETs found were classified as belonging to a single cluster. The process was repeated until all PETs were assigned to a cluster. Verification To assess overall PET quality and mapping specificity, the top ten most abundant PET clusters that mapped to well-characterized known genes were examined. Over 99% of the PETs represented full-length transcripts, and the majority fell within 10 bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags. PCR primers were designed based on the PET sequences and used to amplify the corresponding cDNA inserts from either the full-length cDNA library (cloning-based PET) or the total RNA isolate (cloning-free PET) for sequencing confirmation. Credits The GIS RNA-PET libraries and sequence data for transcriptome analysis were generated and analyzed by scientists Xiaoan Ruan, Atif Shahab, Chialin Wei, and Yijun Ruan at the Genome Institute of Singapore. Contact: Yijun Ruan References Ng P, Tan JJ, Ooi HS, Lee YL, Chiu KP, Fullwood MJ, Srinivasan KG, Perbost C, Du L, Sung WK, et al., Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes. Nucleic Acids Res. 2006;34:e84. Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, et al., Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005;2:105-111.
Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan KG, Yao F, Choo CY, Liu J, Ariyaratne P, et al., Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. 2007;17:828-838. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeGisPetView0PlusRawSignal Plus Raw Signal ENCODE GIS Subcellular RNA Localization by Paired End diTag Sequencing Expression wgEncodeGisPetPlusRawSigRep1ProstateCellLongpolya pros cell A+ +S1 prostate RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 262 Gingeras GIS cell 1 longPolyA wgEncodeGisPetPlusRawSigRep1ProstateCellLongpolya PlusRawSignal prostate tissue purchased for CSHL project RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in Prostate cell) Expression wgEncodeGisPetPlusRawSigRep1NhekNucleusLongpolya NHEK nucl A+ +S1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 261 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetPlusRawSigRep1NhekNucleusLongpolya PlusRawSignal epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in NHEK nucleus) Expression wgEncodeGisPetPlusRawSigRep1NhekCytosolLongpolya NHEK cyto A+ +S1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 260 Gingeras GIS cytosol 1 longPolyA 
wgEncodeGisPetPlusRawSigRep1NhekCytosolLongpolya PlusRawSignal epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in NHEK cytosol) Expression wgEncodeGisPetPlusRawSigRep1HuvecNucleusLongpolya HUVE nucl A+ +S1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 251 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetPlusRawSigRep1HuvecNucleusLongpolya PlusRawSignal umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HUVEC nucleus) Expression wgEncodeGisPetPlusRawSignalRep1HuvecCytosolLongpolya HUVE cyto A+ +S1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 266 Gingeras GIS cytosol longPolyA SAT2G_version_2.0 wgEncodeGisPetPlusRawSignalRep1HuvecCytosolLongpolya PlusRawSignal umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HUVEC cytosol) Expression wgEncodeGisPetPlusRawSigRep1Hepg2NucleusLongpolya HepG nucl A+ +S1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 253 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetPlusRawSigRep1Hepg2NucleusLongpolya PlusRawSignal hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt 
Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HepG2 nucleus) Expression wgEncodeGisPetPlusRawSigRep1Hepg2CytosolLongpolya HepG cyto A+ +S1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 252 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetPlusRawSigRep1Hepg2CytosolLongpolya PlusRawSignal hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HepG2 cytosol) Expression wgEncodeGisPetPlusRawSignalRep1Helas3NucleusLongpolya HeS3 nucl A+ +S1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 267 Gingeras GIS nucleus longPolyA wgEncodeGisPetPlusRawSignalHelas3NucleusLongpolya PlusRawSignal cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HeLa-S3 nucleus) Expression wgEncodeGisPetPlusRawSignalRep1Helas3CytosolLongpolya HeS3 cyto A+ +S1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-05 2010-09-05 268 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetPlusRawSignalRep1Helas3CytosolLongpolya PlusRawSignal cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in HeLa-S3 cytosol) Expression wgEncodeGisPetPlusRawSigRep1K562NucleolusTotal K562 nlos to +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 256 Gingeras GIS nucleolus 1 total 
wgEncodeGisPetPlusRawSigRep1K562NucleolusTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (total RNA in K562 nucleolus) Expression wgEncodeGisPetPlusRawSigRep1K562ChromatinTotal K562 chrm to +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 254 Gingeras GIS chromatin 1 total wgEncodeGisPetPlusRawSigRep1K562ChromatinTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (total RNA in K562 chromatin) Expression wgEncodeGisPetPlusRawSigRep1K562NucleoplasmTotal K562 npls to +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 257 Gingeras GIS nucleoplasm 1 total wgEncodeGisPetPlusRawSigRep1K562NucleoplasmTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (total RNA in K562 nucleoplasm) Expression wgEncodeGisPetPlusRawSigRep1K562NucleusLongpolya K562 nucl A+ +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 258 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetPlusRawSigRep1K562NucleusLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 nucleus) Expression wgEncodeGisPetPlusRawSigRep2K562CytosolLongpolya K562 cyto A+ +S2 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetPlusRawSigRep2K562CytosolLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetPlusRawSigRep1K562CytosolLongpolya K562 cyto A+ +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetPlusRawSigRep1K562CytosolLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetPlusRawSigRep1K562PolysomeLongpolya K562 psom A+ +S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 259 Gingeras GIS polysome 1 longPolyA wgEncodeGisPetPlusRawSigRep1K562PolysomeLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Strand of mRNA with ribosomes attached Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 polysome) Expression wgEncodeGisPetPlusRawSigRep1H1hescCellLongpolya hESC cell A+ +S1 H1-hESC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 250 Gingeras GIS cell 1 longPolyA wgEncodeGisPetPlusRawSigRep1H1hescCellLongpolya PlusRawSignal embryonic stem cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisPetPlusRawSigRep1Gm12878NucleusLongpolya GM12 nucl A+ +S1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 249 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetPlusRawSigRep1Gm12878NucleusLongpolya PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 nucleus) Expression wgEncodeGisPetPlusRawSigRep2Gm12878CytosolLongpolya GM12 cyto A+ -S2 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetPlusRawSigRep2Gm12878CytosolLongpolya PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags 
on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 2 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetPlusRawSigRep1Gm12878CytosolLongpolya GM12 cyto A+ +S1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetPlusRawSigRep1Gm12878CytosolLongpolya PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS PET Plus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetView1MinusRawSignal Minus Raw Signal ENCODE GIS Subcellular RNA Localization by Paired End diTag Sequencing Expression wgEncodeGisPetMinusRawSigRep1ProstateCellLongpolya pros cell A+ -S1 prostate RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 262 Gingeras GIS cell 1 longPolyA wgEncodeGisPetMinusRawSigRep1ProstateCellLongpolya MinusRawSignal prostate tissue purchased for CSHL project RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in Prostate cell) Expression wgEncodeGisPetMinusRawSigRep1NhekNucleusLongpolya NHEK nucl A+ -S1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 261 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetMinusRawSigRep1NhekNucleusLongpolya MinusRawSignal epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in NHEK nucleus) Expression 
wgEncodeGisPetMinusRawSigRep1NhekCytosolLongpolya NHEK cyto A+ -S1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 260 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetMinusRawSigRep1NhekCytosolLongpolya MinusRawSignal epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in NHEK cytosol) Expression wgEncodeGisPetMinusRawSigRep1HuvecNucleusLongpolya HUVE nucl A+ -S1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 251 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetMinusRawSigRep1HuvecNucleusLongpolya MinusRawSignal umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in HUVEC nucleus) Expression wgEncodeGisPetMinusRawSignalRep1HuvecCytosolLongpolya HUVE cyto A+ -S1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 266 Gingeras GIS cytosol longPolyA SAT2G_version_2.0 wgEncodeGisPetMinusRawSignalRep1HuvecCytosolLongpolya MinusRawSignal umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in HUVEC cytosol) Expression wgEncodeGisPetMinusRawSigRep1Hepg2NucleusLongpolya HepG nucl A+ -S1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 253 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetMinusRawSigRep1Hepg2NucleusLongpolya MinusRawSignal hepatocellular carcinoma RNA 
Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in HepG2 nucleus) Expression wgEncodeGisPetMinusRawSigRep1Hepg2CytosolLongpolya HepG cyto A+ -S1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 252 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetMinusRawSigRep1Hepg2CytosolLongpolya MinusRawSignal hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in HepG2 cytosol) Expression wgEncodeGisPetMinusRawSignalRep1Helas3NucleusLongpolya HeS3 nucl A+ -S1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 267 Gingeras GIS nucleus longPolyA wgEncodeGisPetMinusRawSignalRep1Helas3NucleusLongpolya MinusRawSignal cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in HeLa-S3 nucleus) Expression wgEncodeGisPetMinusRawSignalRep1Helas3CytosolLongpolya HeS3 cyto A+ -S1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-05 2010-09-05 268 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetMinusRawSignalRep1Helas3CytosolLongpolya MinusRawSignal cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 
(PolyA+ RNA in HeLa-S3 cytosol) Expression wgEncodeGisPetMinusRawSigRep1K562NucleolusTotal K562 nlos to -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 256 Gingeras GIS nucleolus 1 total wgEncodeGisPetMinusRawSigRep1K562NucleolusTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (total RNA in K562 nucleolus) Expression wgEncodeGisPetMinusRawSigRep1K562ChromatinTotal K562 chrm to -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 254 Gingeras GIS chromatin 1 total wgEncodeGisPetMinusRawSigRep1K562ChromatinTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (total RNA in K562 chromatin) Expression wgEncodeGisPetMinusRawSigRep1K562NucleoplasmTotal K562 npls to -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 257 Gingeras GIS nucleoplasm 1 total wgEncodeGisPetMinusRawSigRep1K562NucleoplasmTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (total RNA in K562 nucleoplasm) Expression wgEncodeGisPetMinusRawSigRep1K562NucleusLongpolya K562 nucl A+ -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 258 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetMinusRawSigRep1K562NucleusLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 nucleus) Expression wgEncodeGisPetMinusRawSigRep2K562CytosolLongpolya K562 cyto A+ -S2 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetMinusRawSigRep2K562CytosolLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetMinusRawSigRep1K562CytosolLongpolya K562 cyto A+ -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetMinusRawSigRep1K562CytosolLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetMinusRawSigRep1K562PolysomeLongpolya K562 psom A+ -S1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 259 Gingeras GIS polysome 1 longPolyA wgEncodeGisPetMinusRawSigRep1K562PolysomeLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Strand of mRNA with ribosomes attached Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 polysome) Expression wgEncodeGisPetMinusRawSigRep1H1hescCellLongpolya hESC cell A+ -S1 H1-hESC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 250 Gingeras GIS cell 1 longPolyA wgEncodeGisPetMinusRawSigRep1H1hescCellLongpolya MinusRawSignal embryonic stem cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisPetMinusRawSigRep1Gm12878NucleusLongpolya GM12 nucl A+ -S1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 249 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetMinusRawSigRep1Gm12878NucleusLongpolya MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 nucleus) Expression wgEncodeGisPetMinusRawSigRep2Gm12878CytosolLongpolya GM12 cyto A+ -S2 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetMinusRawSigRep2Gm12878CytosolLongpolya MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base 
density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 2 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetMinusRawSigRep1Gm12878CytosolLongpolya GM12 cyto A+ -S1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetMinusRawSigRep1Gm12878CytosolLongpolya MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS PET Minus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetView3Clusters Clusters ENCODE GIS Subcellular RNA Localization by Paired End diTag Sequencing Expression wgEncodeGisPetClustersRep1ProstateCellPap pros cell A+ CL1 prostate RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 262 Gingeras GIS cell 1 longPolyA wgEncodeGisPetClustersRep1ProstateCellPap Clusters prostate tissue purchased for CSHL project RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in Prostate cell) Expression wgEncodeGisPetClustersRep1NhekNucleusPap NHEK nucl A+ CL1 NHEK RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 261 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1NhekNucleusPap Clusters epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in NHEK nucleus) Expression wgEncodeGisPetClustersRep1NhekCytosolPap NHEK cyto A+ CL1 NHEK RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 260 
Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1NhekCytosolPap Clusters epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in NHEK cytosol) Expression wgEncodeGisPetClustersRep1HuvecNucleusPap HUVE nucl A+ CL1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 251 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1HuvecNucleusPap Clusters umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HUVEC nucleus) Expression wgEncodeGisPetClustersRep1HuvecCytosolPap HUVE cyto A+ CL1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 266 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1HuvecCytosolPap Clusters umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HUVEC cytosol) Expression wgEncodeGisPetClustersRep1Hepg2NucleusPap HepG nucl A+ CL1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 253 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1Hepg2NucleusPap Clusters hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HepG2 nucleus) Expression wgEncodeGisPetClustersRep1Hepg2CytosolPap HepG cyto A+ CL1 HepG2 RnaPet 
ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 252 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1Hepg2CytosolPap Clusters hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HepG2 cytosol) Expression wgEncodeGisPetClustersRep1Helas3NucleusPap HeS3 nucl A+ CL1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 267 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1Helas3NucleusPap Clusters cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HeLa-S3 nucleus) Expression wgEncodeGisPetClustersRep1Helas3CytosolPap HeS3 cyto A+ CL1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 268 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1Helas3CytosolPap Clusters cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in HeLa-S3 cytosol) Expression wgEncodeGisPetClustersRep1K562NucleolusTotal K562 nlos to CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 256 Gingeras GIS nucleolus 1 total wgEncodeGisPetClustersRep1K562NucleolusTotal Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (total RNA in K562 nucleolus) Expression wgEncodeGisPetClustersRep1K562ChromatinTotal K562 chrm to CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 254 Gingeras GIS chromatin 1 total wgEncodeGisPetClustersRep1K562ChromatinTotal Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (total RNA in K562 chromatin) Expression wgEncodeGisPetClustersRep1K562NucleoplasmTotal K562 npls to CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 257 Gingeras GIS nucleoplasm 1 total wgEncodeGisPetClustersRep1K562NucleoplasmTotal Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (total RNA in K562 nucleoplasm) Expression wgEncodeGisPetClustersRep1K562NucleusPap K562 nucl A+ CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 258 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1K562NucleusPap Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in K562 nucleus) Expression wgEncodeGisPetClustersRep2K562CytosolPap K562 cyto A+ CL2 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 255 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetClustersRep2K562CytosolPap Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetClustersRep1K562CytosolPap K562 cyto A+ CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 255 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1K562CytosolPap Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetClustersRep1K562PolysomePap K562 psom A+ CL1 K562 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 259 Gingeras GIS polysome 1 longPolyA wgEncodeGisPetClustersRep1K562PolysomePap Clusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Strand of mRNA with ribosomes attached Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in K562 polysome) Expression wgEncodeGisPetClustersRep1H1hescCellPap hESC cell A+ CL1 H1-hESC RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 250 Gingeras GIS cell 1 longPolyA wgEncodeGisPetClustersRep1H1hescCellPap Clusters embryonic stem cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisPetClustersRep1Gm12878NucleusPap GM12 nucl A+ CL1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 249 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetClustersRep1Gm12878NucleusPap Clusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in GM12878 nucleus) Expression wgEncodeGisPetClustersRep2Gm12878CytosolPap GM12 cyto A+ CL2 GM12878 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 2010-12-08 248 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetClustersRep2Gm12878CytosolPap Clusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 2 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetClustersRep1Gm12878CytosolPap GM12 cyto A+ CL1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2010-03-08 
2010-12-08 248 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetClustersRep1Gm12878CytosolPap Clusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Clusters built from the alignments ENCODE GIS PET Clusters Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetView2Alignments Alignments ENCODE GIS Subcellular RNA Localization by Paired End diTag Sequencing Expression wgEncodeGisPetAlignmentsRep1ProstateCellLongpolya pros cell A+ AL1 prostate RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 262 Gingeras GIS cell 1 longPolyA wgEncodeGisPetAlignmentsRep1ProstateCellLongpolya Alignments prostate tissue purchased for CSHL project RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in Prostate cell) Expression wgEncodeGisPetAlignmentsRep1NhekNucleusLongpolya NHEK nucl A+ AL1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 261 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetAlignmentsRep1NhekNucleusLongpolya Alignments epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in NHEK nucleus) Expression wgEncodeGisPetAlignmentsRep1NhekCytosolLongpolya NHEK cyto A+ AL1 NHEK RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 260 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetAlignmentsRep1NhekCytosolLongpolya Alignments epidermal keratinocytes RNA Paired-End Tags Gingeras Ruan - 
Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in NHEK cytosol) Expression wgEncodeGisPetAlignmentsRep1HuvecNucleusLongpolya HUVE nucl A+ AL1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 251 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetAlignmentsRep1HuvecNucleusLongpolya Alignments umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in HUVEC nucleus) Expression wgEncodeGisPetAlignmentsRep1HuvecCytosolLongpolya HUVE cyto A+ AL1 HUVEC RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 266 Gingeras GIS cytosol longPolyA SAT2G_version_2.0 wgEncodeGisPetAlignmentsRep1HuvecCytosolLongpolya Alignments umbilical vein endothelial cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in HUVEC cytosol) Expression wgEncodeGisPetAlignmentsRep1Hepg2NucleusLongpolya HepG nucl A+ AL1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 253 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetAlignmentsRep1Hepg2NucleusLongpolya Alignments hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET 
Alignments Rep 1 (PolyA+ RNA in HepG2 nucleus) Expression wgEncodeGisPetAlignmentsRep1Hepg2CytosolLongpolya HepG cyto A+ AL1 HepG2 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 252 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetAlignmentsRep1Hepg2CytosolLongpolya Alignments hepatocellular carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in HepG2 cytosol) Expression wgEncodeGisPetAlignmentsRep1Helas3NucleusLongpolya HeS3 nucl A+ AL1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-04 2010-09-04 267 Gingeras GIS nucleus longPolyA wgEncodeGisPetAlignmentsRep1Helas3NucleusLongpolya Alignments cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in HeLa-S3 nucleus) Expression wgEncodeGisPetAlignmentsRep1Helas3CytosolLongpolya HeS3 cyto A+ AL1 HeLa-S3 RnaPet ENCODE Jan 2010 Freeze 2009-12-05 2010-09-05 268 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetAlignmentsRep1Helas3CytosolLongpolya Alignments cervical carcinoma RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in HeLa-S3 cytosol) Expression wgEncodeGisPetAlignmentsRep1K562NucleolusTotal K562 nlos to AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 256 Gingeras GIS nucleolus 1 total wgEncodeGisPetAlignmentsRep1K562NucleolusTotal Alignments leukemia, "The 
continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (total RNA in K562 nucleolus) Expression wgEncodeGisPetAlignmentsRep1K562ChromatinTotal K562 chrm to AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 254 Gingeras GIS chromatin 1 total wgEncodeGisPetAlignmentsRep1K562ChromatinTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (total RNA in K562 chromatin) Expression wgEncodeGisPetAlignmentsRep1K562NucleoplasmTotal K562 npls to AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 257 Gingeras GIS nucleoplasm 1 total wgEncodeGisPetAlignmentsRep1K562NucleoplasmTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (total RNA in K562 nucleoplasm) Expression wgEncodeGisPetAlignmentsRep1K562NucleusLongpolya K562 nucl A+ AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 258 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetAlignmentsRep1K562NucleusLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in K562 nucleus) Expression wgEncodeGisPetAlignmentsRep2K562CytosolLongpolya K562 cyto A+ AL2 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetAlignmentsRep2K562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetAlignmentsRep1K562CytosolLongpolya K562 cyto A+ AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 255 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetAlignmentsRep1K562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisPetAlignmentsRep1K562PolysomeLongpolya K562 psom A+ AL1 K562 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 259 Gingeras GIS polysome 1 longPolyA wgEncodeGisPetAlignmentsRep1K562PolysomeLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Strand of mRNA with ribosomes attached Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in K562 polysome) Expression wgEncodeGisPetAlignmentsRep1H1hescCellLongpolya hESC cell A+ AL1 H1-hESC RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 250 Gingeras GIS cell 1 longPolyA wgEncodeGisPetAlignmentsRep1H1hescCellLongpolya Alignments embryonic stem cells RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisPetAlignmentsRep1Gm12878NucleusLongpolya GM12 nucl A+ AL1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 249 Gingeras GIS nucleus 1 longPolyA wgEncodeGisPetAlignmentsRep1Gm12878NucleusLongpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in GM12878 nucleus) Expression wgEncodeGisPetAlignmentsRep2Gm12878CytosolLongpolya GM12 cyto A+ AL2 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 2 longPolyA wgEncodeGisPetAlignmentsRep2Gm12878CytosolLongpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows 
individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 2 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisPetAlignmentsRep1Gm12878CytosolLongpolya GM12 cyto A+ AL1 GM12878 RnaPet ENCODE Jan 2010 Freeze 2009-11-13 2010-08-13 248 Gingeras GIS cytosol 1 longPolyA wgEncodeGisPetAlignmentsRep1Gm12878CytosolLongpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA Paired-End Tags Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS PET Alignments Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisRnaPet GIS PET RNA Gene Identification Signature Paired-End Tags of PolyA+ RNA Expression Description This track shows the starts and ends of mRNA transcripts determined by paired-end ditag (PET) sequencing. PETs are composed of 18 bases from each end of a cDNA; 36 bp PETs from many clones were concatenated together and cloned into pZero-1 for efficient sequencing. See the Methods and References sections below for more details on PET sequencing. The PET sequences in this track are full-length transcripts derived from the following cell lines and treatments: log phase MCF7 cells; MCF7 cells treated with estrogen (10 nM beta-estradiol) for 12 hours; HCT116 cells treated with 5FU (5-fluorouracil) for 6 hours; and log phase hES3 embryonic stem cells in feeder-free culture conditions. In total, 584,624 PETs were generated for the log phase MCF7 cells, 153,179 PETs were generated for the estrogen-treated MCF7 cells, 280,340 PETs were generated for the HCT116 cells, and 1,799,970 PETs were generated from the hES3 cells. More than 80% of the PETs in the HCT116 and log phase MCF7 cells were mapped to the genome. 
The 474,278 log phase MCF7 PETs and 223,261 HCT116 PETs that mapped with single and multiple (up to ten) matches in the genome are shown in the two subtracks. For the estrogen-treated MCF7 cells, only those PETs mapped to the ENCODE regions with the above match criteria (4881 total) are displayed. Human embryonic stem cell line hES3 (46XX, Chinese) was obtained from ES Cell International. These cells were serially cultured according to protocols established previously (Choo, 2006). In brief, feeder-free cultures of hES3 were maintained at 37°C/5% CO2 on Matrigel-coated organ culture dishes supplemented with conditioned media from mouse feeders, DE-MEF. In the graphical display, the ends are represented by blocks connected by a horizontal line. In full and packed display modes, the arrowheads on the horizontal line represent the direction of transcription, and an ID of the format XXXXX-N-M is shown to the left of each PET, where XXXXX is the unique ID for each PET, N indicates the number of mapping locations in the genome (1 for a single mapping location, 2 for two mapping locations, and so forth), and M is the number of PET sequences at this location. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. PETs that mapped to multiple locations may represent low complexity or repetitive sequences. The graphical display also uses color coding to reflect the uniqueness and expression level of each PET:
Color          Mapping    PETs observed at location
dark blue      unique     2 or more
light blue     unique     1
medium brown   multiple   2 or more
light brown    multiple   1
Methods PolyA+ RNA was isolated from the cells. A full-length cDNA library was constructed and converted into a PET library for Gene Identification Signature analysis (Ng et al., 2005). Generation of PET sequences involved cloning of cDNA sequences into the plasmid vector, pGIS3. 
pGIS3 contains two MmeI recognition sites that flank the cloning site, which were used to produce a 36 bp PET. Each 36 bp PET sequence contains 18 bp from each of the 5' and 3' ends of the original full-length cDNA clone. The 18 bp 3' signature contains 16 bp 3'-specific nucleotides and an AA residual of the polyA tail to indicate the sequence orientation. PET sequences were mapped to the genome using the following specific criteria: a minimal continuous 16 bp match must exist for the 5' signature; the 3' signature must have a minimal continuous 14 bp match; both 5' and 3' signatures must be present on the same chromosome; their 5' to 3' orientation must be correct; and the maximal genomic span of a PET genomic alignment must be less than one million bp. Most of the PET sequences (more than 90%) were mapped to specific locations (single mapping loci). PETs mapping to 2-10 locations are also included and may represent duplicated genes or pseudogenes in the genome. Verification To assess overall PET quality and mapping specificity, the top ten most abundant PET clusters that mapped to well-characterized known genes were examined. Over 99% of the PETs represented full-length transcripts, and the majority fell within ten bp of the known 5' and 3' boundaries of these transcripts. The PET mapping was further verified by confirming the existence of physical cDNA clones represented by the ditags. PCR primers were designed based on the PET sequences and used to amplify the corresponding cDNA inserts from the parental GIS flcDNA library for sequencing analysis. In a set of 86 arbitrarily-selected PETs representing a wide range of annotation categories, including known genes (38 PETs), predicted genes (2 PETs), and novel transcripts (46 PETs), 84 (97.7%) confirmed the existence of bona fide transcripts. Credits The GIS-PET libraries and sequence data for transcriptome analysis were produced at the Genome Institute of Singapore. 
The data were mapped and analyzed by scientists from the Genome Institute of Singapore and the Bioinformatics Institute of Singapore. References Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW. Immortalized feeders for the scale-up of human embryonic stem cells in feeder and feeder-free conditions. J Biotechnol. 2006 Mar 9;122(1):130-41. Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11. wgEncodeGisRnaPetHes3 GIS RNA hES3 Gene Identification Signature Paired-End Tags of PolyA+ RNA (embryonic stem cell hES3) Expression wgEncodeGisRnaPetHCT116 GIS RNA HCT116 Gene Identification Signature Paired-End Tags of PolyA+ RNA (5FU-stim HCT116) Expression wgEncodeGisRnaPetMCF7Estr GIS RNA MCF7 Est Gene Identification Signature Paired-End Tags of PolyA+ RNA (estrogen-stim MCF7) Expression wgEncodeGisRnaPetMCF7 GIS RNA MCF7 Gene Identification Signature Paired-End Tags of PolyA+ RNA (log phase MCF7) Expression wgEncodeGisRnaSeq GIS RNA-seq ENCODE Genome Institute of Singapore RNA-seq Expression Description This track is produced as part of the ENCODE Transcriptome Project. It shows high throughput sequencing of RNA samples from tissues or sub cellular compartments from cell lines included in the ENCODE Transcriptome subproject. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary. 
They provide a visual cue for distinguishing between the different cell types. Plus Raw Signal The Plus Raw Signal view graphs the base-by-base density of alignments on the + strand. Minus Raw Signal The Minus Raw Signal view graphs the base-by-base density of alignments on the - strand. All Raw Signal The All Raw Signal view graphs the base-by-base density of alignments on both strands. Alignments The Alignments view shows reads mapped to the genome. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription could not be determined are shown in black. Split Alignments The Split Alignments view shows alignments of individual RNA sequences that cross exon splice sites. They are colored by strand as described above. Methods The RNA-Seq data were generated from high quality polyA RNA, and the RNA-Seq libraries were constructed using the SOLiD Whole Transcriptome (WT) protocol and reagent kit. High quality total RNA was used as starting material and purified twice through a MACs polyT column to enrich for polyA RNA and remove contaminants (e.g., rRNA, tRNA, DNA, and protein). One microgram of enriched polyA RNA was then fragmented, and a gel-based selection method was used to collect randomly fragmented polyA RNA in a size range of 50-150 nt. The collected RNA fragments were then hybridized and ligated to a mix of adaptors provided by ABI, followed by reverse transcription to generate the corresponding cDNAs. The resulting cDNA library was further amplified by PCR and sequenced on the SOLiD platform as single reads of 35 bp (50 bp in the newer version). Cells were grown according to the approved ENCODE cell culture protocols. Data: The SOLiD-generated RNA-Seq reads were 35 bp in length. 
An initial filtering process was performed to remove undesirable contaminating sequences, such as rRNA, tRNA, and repeats. A read-split mapping approach was developed to map the 35 bp reads onto the reference genome (NCBI Build 36/hg18). Specifically, the 35 bp reads were divided into two parts (1st-25 bp and 2nd-25 bp with 10 bp overlapping) and mapped separately. An extension mapping analysis was then performed to compute a score for each read, and the score was used as a filtering criterion (e.g., a score >26 was required). The reads with mapping locations N≤1 or N>10 were excluded from further analysis. Because SOLiD RNA-Seq is strand-specific, the resulting data sets were strand-specific and were mapped to exons in a strand-specific manner. Mapping parameters: Mapping was done using Applied Biosystems' SOLiD alignment pipeline for whole transcriptome analysis. Two mismatches were allowed in the 25 bp color space seed sequence, with progressive alignment performed to find the full mapping location. A score was computed for each mapping location, and any location that scored ≤26 was filtered. Credits The GIS RNA-seq libraries and sequence data for transcriptome analysis were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore. Contact: RUAN Xiaoan Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
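As a rough illustration of the nine-month restriction described in the Data Release Policy, the Python sketch below computes a restricted-until date from a release date. The function names add_months and restricted_until are hypothetical and are not part of any ENCODE or UCSC tool. The calendar-month rule used here (same day of the month, clamped to the length of the target month) is an assumption; it agrees with date pairs that appear in the subtrack records of this file, such as 2009-10-29 and 2010-07-29.

```python
import calendar
from datetime import date

# Length of the restriction period stated in the Data Release Policy.
RESTRICTION_MONTHS = 9


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by a number of calendar months.

    Assumption: the day of the month is kept, and clamped to the last
    day of the target month when that month is shorter.
    """
    index = d.month - 1 + months          # zero-based month count from January of d.year
    year = d.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def restricted_until(release: date) -> date:
    """Hypothetical helper: date on which the restriction would end."""
    return add_months(release, RESTRICTION_MONTHS)


if __name__ == "__main__":
    for released in (date(2009, 10, 29), date(2010, 3, 8), date(2009, 12, 4)):
        print(released.isoformat(), "->", restricted_until(released).isoformat())
```

The authoritative date for any dataset remains the one listed in the Restricted Until column; the sketch only shows the arithmetic.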
wgEncodeGisRnaSeqView5SplitAlign Split Alignments ENCODE Genome Institute of Singapore RNA-seq Expression wgEncodeGisRnaSeqSplitAlignRep1K562CytosolLongpolya K562 cyto A+ SA1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqSplitAlignRep1K562CytosolLongpolya SplitAlign leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt ENCODE GIS RNA-seq Split Alignments Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqSplitAlignRep1H1hescCellLongpolya hESC cell A+ SA1 H1-hESC RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 270 Gingeras GIS cell 1 longPolyA wgEncodeGisRnaSeqSplitAlignRep1H1hescCellLongpolya SplitAlign embryonic stem cells Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt ENCODE GIS RNA-seq Split Alignments Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisRnaSeqView1PlusRawSignal Plus Raw Signal ENCODE Genome Institute of Singapore RNA-seq Expression wgEncodeGisRnaSeqPlusRawSignalRep2K562CytosolLongpolya K562 cyto A+ +S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqPlusRawSignalRep2K562CytosolLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS RNA-seq Plus Strand Raw Signal Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqPlusRawSignalRep1K562CytosolLongpolya K562 cyto A+ +S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqPlusRawSignalRep1K562CytosolLongpolya PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS RNA-seq Plus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqPlusRawSignalRep1H1hescCellLongpolya hESC cell A+ +S1 H1-hESC RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 270 Gingeras GIS cell 1 longPolyA wgEncodeGisRnaSeqPlusRawSignalRep1H1hescCellLongpolya PlusRawSignal embryonic stem cells Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS RNA-seq Plus Strand Raw Signal Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisRnaSeqPlusRawSignalRep1Gm12878CytosolLongpolya GM12 cyto A+ +S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 269 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqPlusRawSignalRep1Gm12878CytosolLongpolya PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA 
expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE GIS RNA-seq Plus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisRnaSeqView2MinusRawSignal Minus Raw Signal ENCODE Genome Institute of Singapore RNA-seq Expression wgEncodeGisRnaSeqMinusRawSignalRep2K562CytosolLongpolya K562 cyto A+ -S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqMinusRawSignalRep2K562CytosolLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS RNA-seq Minus Strand Raw Signal Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqMinusRawSignalRep1K562CytosolLongpolya K562 cyto A+ -S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqMinusRawSignalRep1K562CytosolLongpolya MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS RNA-seq Minus Strand Raw Signal Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqMinusRawSignalRep1H1hescCellLongpolya hESC cell A+ -S1 H1-hESC RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 270 Gingeras GIS cell 1 longPolyA wgEncodeGisRnaSeqMinusRawSignalRep1H1hescCellLongpolya MinusRawSignal embryonic stem cells Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS RNA-seq Minus Strand Raw Signal Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisRnaSeqMinusRawSignalRep1Gm12878CytosolLongpolya GM12 cyto A+ -S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 269 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqMinusRawSignalRep1Gm12878CytosolLongpolya MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE GIS RNA-seq Minus Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisRnaSeqView3AllRawSignal All Raw Signal ENCODE Genome Institute of Singapore RNA-seq Expression wgEncodeGisRnaSeqAllRawSignalRep2K562CytosolLongpolya K562 cyto A+ AS2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqAllRawSignalRep2K562CytosolLongpolya RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 
53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE GIS RNA-seq All Strand Raw Signal Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqAllRawSignalRep1K562CytosolLongpolya K562 cyto A+ AS1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqAllRawSignalRep1K562CytosolLongpolya RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE GIS RNA-seq All Strand Raw Signal Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqAllRawSignalRep1H1hescCellLongpolya hESC cell A+ AS1 H1-hESC RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 270 Gingeras GIS cell 1 longPolyA wgEncodeGisRnaSeqAllRawSignalRep1H1hescCellLongpolya RawSignal embryonic stem cells Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE GIS RNA-seq All Strand Raw Signal Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisRnaSeqAllRawSignalRep1Gm12878CytosolLongpolya GM12 cyto A+ AS1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 269 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqAllRawSignalRep1Gm12878CytosolLongpolya RawSignal B-lymphocyte, 
lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE GIS RNA-seq All Strand Raw Signal Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression wgEncodeGisRnaSeqView4Alignments Alignments ENCODE Genome Institute of Singapore RNA-seq Expression wgEncodeGisRnaSeqAlignmentsRep2K562CytosolLongpolya K562 cyto A+ AL2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 2 longPolyA wgEncodeGisRnaSeqAlignmentsRep2K562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS RNA-seq Alignments Rep 2 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqAlignmentsRep1K562CytosolLongpolya K562 cyto A+ AL1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 271 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqAlignmentsRep1K562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS RNA-seq Alignments Rep 1 (PolyA+ RNA in K562 cytosol) Expression wgEncodeGisRnaSeqAlignmentsRep1H1hescCellLongpolya hESC cell A+ AL1 H1-hESC RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 270 Gingeras GIS cell 1 longPolyA wgEncodeGisRnaSeqAlignmentsRep1H1hescCellLongpolya Alignments embryonic stem cells Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore Whole cell Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS RNA-seq Alignments Rep 1 (PolyA+ RNA in H1-hESC cell) Expression wgEncodeGisRnaSeqAlignmentsRep1Gm12878CytosolLongpolya GM12 cyto A+ AL1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-29 2010-07-29 269 Gingeras GIS cytosol 1 longPolyA wgEncodeGisRnaSeqAlignmentsRep1Gm12878CytosolLongpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Ruan - Genome Institute of Singapore The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE GIS RNA-seq Alignments Rep 1 (PolyA+ RNA in GM12878 cytosol) Expression gnfAtlas2 GNF Atlas 2 GNF Expression Atlas 2 Expression Description This track shows expression data from the GNF Gene Expression Atlas 2. This contains two replicates each of 79 human tissues run over Affymetrix microarrays. By default, averages of related tissues are shown. Display all tissues by selecting "All Arrays" from the "Combine arrays" menu on the track settings page. 
As is standard with microarray data, red indicates overexpression in the tissue, and green indicates underexpression. You may want to view gene expression with the Gene Sorter as well as the Genome Browser. Credits Thanks to the Genomics Institute of the Novartis Research Foundation (GNF) for the data underlying this track. References Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. PMID: 15075390; PMC: PMC395923 affyRatio GNF Ratio GNF Gene Expression Atlas Ratios Using Affymetrix GeneChips Expression Description This track shows expression data from GNF (The Genomics Institute of the Novartis Research Foundation) using Affymetrix GeneChips. The chip types, chip IDs or tissue averages associated with experiments can be displayed by selecting the appropriate option from the Experiment Display menu on the track description page. For more information, see the Track Configuration section. Methods For detailed information about the experiments, see Su et al. 2002 in the References section below. Alignments displayed on the track correspond to the target sequences used by Affymetrix to choose probes. In dense display mode, the track color denotes the average signal over all experiments on a log base 2 scale. Lighter colors correspond to lower signals and darker colors correspond to higher signals. In full display mode, the color of each item represents the log base 2 ratio of the signal of that particular experiment to the median signal of all experiments for that probe. More information about individual probes and probe sets is available on the Affymetrix website. Track Configuration This track may be configured to change the display mode and colors or vary the type of experiment information shown.
The configuration controls are located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or the link on the track's control menu. Display mode: To change the display mode for the track, select the desired display setting from the Display Mode pulldown list. Combine Arrays: All arrays may be displayed with either the chip ID or the tissue type as the label. Replicate arrays may also be combined by expression medians. When you have finished making changes, click the Submit button to commit your changes and return to the Genome Browser tracks display. Credits Thanks to GNF for providing these data. References Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A. et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA 99(7), 4465-70 (2002). gwasCatalog GWAS Catalog NHGRI-EBI Catalog of Published Genome-Wide Association Studies Phenotype and Disease Associations Description This track displays single nucleotide polymorphisms (SNPs) identified by published Genome-Wide Association Studies (GWAS), collected in the NHGRI-EBI GWAS Catalog published jointly by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EMBL-EBI). Some abbreviations are used above. From http://www.ebi.ac.uk/gwas/docs/about: The Catalog is a quality controlled, manually curated, literature-derived collection of all published genome-wide association studies assaying at least 100,000 SNPs and all SNP-trait associations with p-values < 1.0 x 10^-5 (Hindorff et al., 2009). For more details about the Catalog curation process and data extraction procedures, please refer to the Methods page. Methods From http://www.ebi.ac.uk/gwas/docs/methods: The GWAS Catalog data is extracted from the literature.
Extracted information includes publication information, study cohort information such as cohort size, country of recruitment and subject ethnicity, and SNP-disease association information including SNP identifier (i.e. RSID), p-value, gene and risk allele. Each study is also assigned a trait that best represents the phenotype under investigation. When multiple traits are analysed in the same study either multiple entries are created, or individual SNPs are annotated with their specific traits. Traits are used both to query and visualise the data in the Catalog's web form and diagram-based query interfaces. Data extraction and curation for the GWAS Catalog is an expert activity; each step is performed by scientists supported by a web-based tracking and data entry system which allows multiple curators to search, annotate, verify and publish the Catalog data. Papers that qualify for inclusion in the Catalog are identified through weekly PubMed searches. They then undergo two levels of curation. First all data, including association information for SNPs, traits and general information about the study, are extracted by one curator. A second curator then performs an additional round of curation to double-check the accuracy and consistency of all the information. Finally, an automated pipeline performs validation of the extracted data, see the Quality control and SNP mapping section below for more details. This information is then used for queries and in the production of the diagram. Data Access The raw data can be explored interactively with the Table Browser, or Data Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server (gwasCatalog*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. Previous versions of this track can be found on our archive download server. 
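For automated analysis, the inclusion criteria quoted in the Description (studies assaying at least 100,000 SNPs, and SNP-trait associations with a p-value below 1.0 x 10^-5) can be written as a simple filter. The following is only a sketch under stated assumptions: the Association record, its field names, and the sample values are hypothetical and do not reflect the actual column layout of the gwasCatalog table or download files.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds taken from the Catalog description quoted in this track.
MIN_SNPS_ASSAYED = 100_000
MAX_P_VALUE = 1.0e-5


@dataclass(frozen=True)
class Association:
    """Hypothetical record; field names are illustrative, not the real schema."""
    rsid: str
    trait: str
    p_value: float
    snps_assayed: int


def meets_catalog_criteria(assoc: Association) -> bool:
    """True if the study size and p-value satisfy the quoted criteria."""
    return assoc.snps_assayed >= MIN_SNPS_ASSAYED and assoc.p_value < MAX_P_VALUE


def filter_associations(records: Iterable[Association]) -> List[Association]:
    """Keep only the records that satisfy both criteria."""
    return [r for r in records if meets_catalog_criteria(r)]


# Made-up example records, for illustration only.
examples = [
    Association("rsEXAMPLE1", "trait A", 3.0e-8, 500_000),   # passes both tests
    Association("rsEXAMPLE2", "trait B", 2.0e-4, 500_000),   # p-value too large
    Association("rsEXAMPLE3", "trait C", 1.0e-9, 50_000),    # too few SNPs assayed
]
kept = filter_associations(examples)
```

Records already present in the Catalog are expected to satisfy these criteria, so a filter like this is mainly useful as a sanity check or with a stricter threshold substituted for MAX_P_VALUE.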
References Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009 Jun 9;106(23):9362-7. PMID: 19474294; PMC: PMC2687147 ntHumChimpCodingDiff H-C Coding Diffs Neandertal Alleles in Human/Chimp Coding Non-synonymous Differences in Human Lineage Neandertal Assembly and Analysis Description This track displays Neandertal alleles for human-chimp protein-coding differences on the human lineage using orangutan as the outgroup to determine which allele is more likely to be ancestral. Display Conventions and Configuration Neandertal ancestral alleles are colored blue; derived (human) alleles are colored green. The item names show the number of Neandertal reads for the ancestral and derived alleles, followed by the ancestral and derived codons enclosed in parentheses. For example, if no Neandertal reads matched the ancestral base G and three Neandertal reads matched the derived base A, and the ancestral and derived codons were GTA and ATA respectively, then the item name would be "0G>3A(GTA>ATA)". If N Neandertal reads match neither ancestral nor derived base, then a "+N?" is added before the codons (i.e. "0G>3A+N?(GTA>ATA)"). Methods Neandertal DNA was extracted from a ~49,000-year-old bone (Sidrón 1253), which was excavated in El Sidrón cave, Asturias, Spain. Non-synonymous changes that occurred on the human lineage since the ancestral split with chimpanzee were identified by aligning human, chimpanzee and orangutan protein sequences for all orthologous proteins in HomoloGene (Build 58). Comparison of these three species allowed the assignment of human/chimpanzee differences to their respective evolutionary lineages. An Agilent custom oligonucleotide array covering the 13,841 non-synonymous changes inferred to have occurred in the human lineage was designed and used to capture Neandertal sequences.
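The item-name convention described under Display Conventions and Configuration can be parsed mechanically. The sketch below is illustrative only: the AlleleCounts type and both function names are hypothetical, and it assumes that the N in "+N?" is written as a decimal read count (for example "+2?"), which the description implies but does not show.

```python
import re
from typing import NamedTuple


class AlleleCounts(NamedTuple):
    """Hypothetical container for the fields encoded in an item name."""
    ancestral_reads: int
    ancestral_base: str
    derived_reads: int
    derived_base: str
    other_reads: int        # reads matching neither base (0 if "+N?" is absent)
    ancestral_codon: str
    derived_codon: str


# e.g. "0G>3A(GTA>ATA)" or, assuming N is a number, "0G>3A+2?(GTA>ATA)"
_ITEM_NAME = re.compile(
    r"^(\d+)([ACGT])>(\d+)([ACGT])(?:\+(\d+)\?)?\(([ACGT]{3})>([ACGT]{3})\)$"
)


def parse_item_name(name: str) -> AlleleCounts:
    """Split an item name into read counts, bases, and codons."""
    match = _ITEM_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized item name: {name!r}")
    anc_n, anc_b, der_n, der_b, other, anc_codon, der_codon = match.groups()
    return AlleleCounts(
        int(anc_n), anc_b, int(der_n), der_b,
        int(other) if other else 0,
        anc_codon, der_codon,
    )


def format_item_name(c: AlleleCounts) -> str:
    """Rebuild the item name; the "+N?" part is emitted only when N > 0."""
    other = f"+{c.other_reads}?" if c.other_reads else ""
    return (
        f"{c.ancestral_reads}{c.ancestral_base}>"
        f"{c.derived_reads}{c.derived_base}{other}"
        f"({c.ancestral_codon}>{c.derived_codon})"
    )


example = parse_item_name("0G>3A(GTA>ATA)")
```

Here example.derived_reads is 3 and example.other_reads is 0, matching the worked example in the description.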
Reference Burbano HA, Hodges E, Green RE, Briggs AW, Krause J, Meyer M, Good JM, Maricic T, Johnson PL, Xuan Z et al. Targeted investigation of the Neandertal genome by array-based sequence capture. Science. 2010 May 7;328(5979):723-5. PMID: 20448179; PMC: PMC3140021 HInvGeneMrna H-Inv H-Invitational Genes mRNA Alignments mRNA and EST Description This track shows alignments of full-length cDNAs that were used as the basis of the H-Invitational Gene Database (HInv-DB). The HInv-DB is a human gene database, with integrative annotation of 56,419 full-length cDNA clones currently available from six high-throughput cDNA sequencing projects. This database represents 25,585 cDNA clusters. The project was initiated in 2002 and the database became publicly available in April 2004. HInv-DB entries describe the following entities: gene structures; functions; novel alternative splicing isoforms; non-coding functional RNAs; functional domains; sub-cellular localizations; mapping of SNPs and microsatellite repeat motifs in relation to orphan diseases; gene expression profiling; and comparative results with mouse full-length cDNAs in the context of molecular evolution. Methods A full description of the construction of the HInv-DB is contained in the report by the H-Inv Consortium (see References section). Credits The H-InvDB is hosted at the JBIRC. The human-curated annotations were produced during invitational annotation meetings held in Japan during the summer of 2002, with a follow-up meeting in November 2004. Participants included 158 scientists representing 67 institutions from 12 countries. The full-length cDNA clones and sequences were produced by the Chinese National Human Genome Center (CHGC), the Deutsches Krebsforschungszentrum (DKFZ/MIPS), Helix Research Institute, Inc. (HRI), the Institute of Medical Science in the University of Tokyo (IMSUT), the Kazusa DNA Research Institute (KDRI), the Mammalian Gene Collection (MGC/NIH) and the Full-Length Long Japan (FLJ) project.
References Imanishi, T. et al. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2(6), e162 (2004). wgEncodeHudsonalphaMethylSeq HAIB Methyl-seq ENCODE HudsonAlpha Methyl-seq Regulation Description This track shows average methylation status in CpG islands. In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. Release Notes This is release 2 of this track. Release 2 adds tables for several new cell types: GM12891, GM12892, H1-hESC, HeLa-S3, and HepG2. Track Conventions Methylation status is color-coded as: orange = methylated (bed score = 1000); blue = non-methylated (bed score = 0). Methods CpG regions were assayed via Methyl-seq, a method developed in the Myers laboratory to measure the methylation status at CpGs throughout the genome. It combines DNA digestion by the methyl-sensitive enzyme HpaII and its methyl-insensitive isoschizomer MspI with the Illumina DNA sequencing platform. The method was first applied in a collaboration with the laboratory of Dr. Julie Baker at Stanford University to study methylation and gene expression changes that occur in human embryonic stem cells before and after differentiation to definitive endoderm. A paper describing the results as well as the method has been published [1]. This study profiled genomic DNA and mRNA samples derived from two human embryonic stem cell lines: H9 and BG02. These cells were differentiated into definitive endoderm, embryoid bodies, embryoid body-derived cells, and AFP+ (alpha-fetoprotein positive) hepatocytes. These in vitro samples were profiled with Methyl-seq and compared with normal tissue samples from 11-week and 24-week fetal liver and adult liver. Methyl-seq assays more than 250,000 methyl-sensitive restriction enzyme cleavage sites, representing more than 90,000 genomic regions.
These regions include 35,528 annotated CpG islands, while the remaining 55,084 non-CpG island regions are distributed across the genome in promoters, genes, and intergenic regions. Sequence tags present in MspI libraries but not in HpaII libraries are derived from methylated regions. Conversely, sequence tags that occur in HpaII libraries come from at least partially unmethylated regions. In vitro differentiation Definitive endoderm precursor cells were generated from H9 hES cells by treating them with activin A. Embryoid bodies (EBs) were generated by growing undifferentiated H9 and BG02 hESCs in suspension. EB-derived cells were obtained by plating clumps of the cells from the EBs. AFP+ fetal hepatocytes were derived from EBs by plating EB cells with FGF, followed by fluorescence-activated cell sorting (FACS) to isolate cells expressing the green fluorescent protein (GFP) reporter gene driven by the AFP promoter. Isolation of genomic DNA Genomic DNA is isolated from biological replicates of each cell line by using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation are determined by UV absorbance. HpaII and MspI digestions Cleavage of DNA by restriction endonuclease HpaII is prevented by the presence of a 5-methyl group at the internal C residue of its recognition sequence CCGG. MspI, an isoschizomer of HpaII, cleaves DNA irrespective of the presence of a methyl group at this position. For the MspI library, 5 μg genomic DNA was digested in a 100 μl reaction with 1X NEB Buffer2 and 20 units MspI restriction enzyme and incubated for 18 hr at 37°C. For the HpaII library, 5 μg genomic DNA was digested in a 100 μl reaction with 1X NEB Buffer1 and 20 units HpaII restriction enzyme and incubated for 18 hr at 37°C.
Note that in subsequent versions of the Methyl-seq protocol, which will be described later, much lower amounts of genomic DNA were used (1 μg and potentially lower). DNA library construction and sequencing High-throughput sequencing libraries were generated from DNA fragments of the HpaII- or MspI-digested genomic DNA according to the protocol posted at the Myers lab protocols page. This approach was recently modified by removing the first PCR amplification step, just prior to the gel electrophoresis size-selection step; this change was found to reduce a fragment-size bias in the sequencing libraries. These libraries were sequenced with an Illumina Genome Analyzer (GA2) according to the manufacturer's recommendations. Data analysis For this analysis, reads that align to human genome sequence version hg18 and contain the 5'-CGG-3' HpaII-cut signature on their 5' end were used. These aligned sequence reads were mapped to CCGG sites predicted in silico on hg18. Sites with four or more MspI tags occurring in either the forward or reverse direction were retained for analysis. These "assayable" sites were then grouped with neighboring sites that are within 35-75 bp of each other. Thus, a "region" can comprise between 2 and 18 digestion sites that are each within 35-75 bp of another site. Methylated and non-methylated calls were made by using HpaII tag data from all assayable cut sites. For each site across each region, the larger of the forward and reverse read counts was used. Regions that have an average of 0 or 1 read per cut site are called methylated, and regions with more than one sequence read per site are called unmethylated. Credits Dr. Richard M. Myers Mr. Yuya Kobayashi: yuyak@stanford.edu Dr. Devin M. Absher: dabsher@hudsonalpha.org Dr. Rebekka O. Sprouse: rsprouse@hudsonalpha.org Contact: Flo Pauli. References 1. Brunner AL, Johnson DS, Kim SW, Valouev A, Reddy TE, Neff NF, Anton E, Medina C, Nguyen L, Chiao E et al.
Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Research. 2009 Jun;19(6):1044-56. Epub 2009 Mar 9. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeHudsonalphaMethylSeqRegionsRep2K562 K562 2 K562 MethylSeq ENCODE July 2009 Freeze 2009-07-15 2010-04-15 297 Myers HudsonAlpha 2 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep2K562 Regions leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 2 (in K562 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1K562 K562 1 K562 MethylSeq ENCODE July 2009 Freeze 2009-07-15 2010-04-15 297 Myers HudsonAlpha 1 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep1K562 Regions leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in K562 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Hepg2Pcr2x HepG2 2 HepG2 MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 312 Myers HudsonAlpha SL579 PCR1x 2 MACS wgEncodeHudsonalphaMethylSeqRegionsRep2Hepg2Pcr2x Regions hepatocellular carcinoma DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 2 (in HepG2 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Hepg2Pcr2x HepG2 1 HepG2 MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 312 Myers HudsonAlpha SL578 PCR1x 1 MACS wgEncodeHudsonalphaMethylSeqRegionsRep1Hepg2Pcr2x Regions hepatocellular carcinoma DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 1 (in HepG2 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Helas3Pcr2x HeLa-S3 2 HeLa-S3 MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 309 Myers HudsonAlpha SL603 PCR1x 2 MACS wgEncodeHudsonalphaMethylSeqRegionsRep2Helas3Pcr2x Regions cervical carcinoma DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 2 (in HeLa-S3 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Helas3Pcr2x HeLa-S3 1 HeLa-S3 MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 309 Myers HudsonAlpha SL577 PCR1x 1 MACS wgEncodeHudsonalphaMethylSeqRegionsRep1Helas3Pcr2x Regions cervical carcinoma DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 1 (in HeLa-S3 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Hfl24w HFL24W 1 HFL24W MethylSeq ENCODE July 2009 Freeze 
2009-02-10 2009-11-10 291 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Hfl24w Regions fetal liver 24 weeks, consented fetal liver samples were isolated from legally aborted fetuses at 24 weeks gestation DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in HFL24W cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Hfl11w HFL11W 1 HFL11W MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 290 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Hfl11w Regions fetal liver 11 weeks, consented fetal liver samples were isolated from legally aborted fetuses at 11 weeks gestation DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in HFL11W cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Hct116 HCT-116 2 HCT-116 MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 289 Myers HudsonAlpha 2 wgEncodeHudsonalphaMethylSeqRegionsRep2Hct116 Regions colorectal carcinoma (PMID: 7214343) DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 2 (in HCT-116 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Hct116 HCT-116 1 HCT-116 MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 289 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Hct116 Regions colorectal carcinoma (PMID: 7214343) DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in HCT-116 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Hal HAL 1 HAL MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 288 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Hal Regions adult liver, genomic DNA purified from surgically excised adult human liver DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq 
Regions Replicate 1 (in HAL cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9esebd H9ES-EBD 1 H9ES-EBD MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 287 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9esebd Regions embryonic stem cell (hESC) H9, embryoid body-derived DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-EBD cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9eseb H9ES-EB 1 H9ES-EB MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 286 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9eseb Regions embryonic stem cell (hESC) H9, embryoid bodies DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-EB cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9ese H9ES-E 1 H9ES-E MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 285 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9ese Regions embryonic stem cell (hESC) H9, endoderm DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-E cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9escm H9ES-CM 1 H9ES-CM MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 284 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9escm Regions embryonic stem cell (hESC) H9, treatment: H9 conditioned medium DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-CM cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9esafpPos H9ES-AFP+ 1 H9ES-AFP+ MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 282 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9esafpPos Regions embryonic stem cell (hESC), H9-derived, treatment: H9 AFP+ DNA Methyl Seq Myers Myers - Hudson Alpha Institute for 
Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-AFP+ cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9esafpNeg H9ES-AFP- 1 H9ES-AFP- MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 283 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9esafpNeg Regions embryonic stem cell (hESC), H9-derived, treatment: H9 AFP- DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES-AFP- cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H9es H9ES 1 H9ES MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 281 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1H9es Regions embryonic stem cell (hESC) H9 DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in H9ES cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2H1hesc H1-hESC 2 H1-hESC MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 306 Myers HudsonAlpha PCR1x 2 SL659 wgEncodeHudsonalphaMethylSeqRegionsRep2H1hesc Regions embryonic stem cells DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 2 (in H1-hESC cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1H1hesc H1-hESC 1 H1-hESC MethylSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 306 Myers HudsonAlpha PCR1x 1 SL605 wgEncodeHudsonalphaMethylSeqRegionsRep1H1hesc Regions embryonic stem cells DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology one 15-cycle round of PCR (Myers) Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 1 (in H1-hESC cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12892 GM12892 2 GM12892 MethylSeq ENCODE Jan 2010 Freeze 2009-11-30 2010-08-29 303 Myers HudsonAlpha 2 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12892 Regions B-lymphocyte, lymphoblastoid, 
International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 2 (in GM12892 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12892 GM12892 1 GM12892 MethylSeq ENCODE Jan 2010 Freeze 2009-11-30 2010-08-29 303 Myers HudsonAlpha 1 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12892 Regions B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 1 (in GM12892 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12891 GM12891 2 GM12891 MethylSeq ENCODE Jan 2010 Freeze 2009-11-25 2010-08-25 300 Myers HudsonAlpha 2 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12891 Regions B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 2 (in GM12891 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12891 GM12891 1 GM12891 MethylSeq ENCODE Jan 2010 Freeze 2009-11-25 2010-08-25 300 Myers HudsonAlpha 1 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12891 Regions B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-Seq Regions Replicate 1 (in GM12891 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12878 GM12878 2 GM12878 MethylSeq ENCODE July 2009 Freeze 2009-07-15 2010-04-15 294 Myers HudsonAlpha 2 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep2Gm12878 Regions B-lymphocyte, lymphoblastoid, International HapMap 
Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 2 (in GM12878 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12878 GM12878 1 GM12878 MethylSeq ENCODE July 2009 Freeze 2009-07-15 2010-04-15 294 Myers HudsonAlpha 1 myerslab wgEncodeHudsonalphaMethylSeqRegionsRep1Gm12878 Regions B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in GM12878 cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep2Bg02esebd BG02ES-EBD 2 BG02ES-EBD MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 280 Myers HudsonAlpha 2 wgEncodeHudsonalphaMethylSeqRegionsRep2Bg02esebd Regions embryonic stem cell (hESC), BG02, embryoid body-derived DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 2 (in BG02ES-EBD cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Bg02esebd BG02ES-EBD 1 BG02ES-EBD MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 280 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Bg02esebd Regions embryonic stem cell (hESC), BG02, embryoid body-derived DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in BG02ES-EBD cells) Regulation wgEncodeHudsonalphaMethylSeqRegionsRep1Bg02es BG02ES 1 BG02ES MethylSeq ENCODE July 2009 Freeze 2009-02-10 2009-11-10 279 Myers HudsonAlpha 1 wgEncodeHudsonalphaMethylSeqRegionsRep1Bg02es Regions embryonic stem cell (hESC), BG02, treatment: H9 conditioned medium DNA Methyl Seq Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE HudsonAlpha Methyl-seq Regions Replicate 1 (in BG02ES cells) Regulation wgEncodeHudsonalphaMethyl27 
HAIB Methyl27 ENCODE HudsonAlpha CpG Methylation by Illumina Methyl27 Regulation
Description
These tracks display the methylation status of specific CpG dinucleotides in the given cell types as identified by the Illumina Infinium HumanMethylation27 BeadArray platform. In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. This method was used to validate Methyl-seq data generated in collaboration with the laboratory of Dr. Julie Baker at Stanford University to study methylation and gene expression changes that occur in human embryonic stem cells before and after differentiation to definitive endoderm [1]. Based on the results of these experiments, cut-off beta values for calling sites "methylated" and "unmethylated" were selected. Analysis is performed for two replicates for each cell line. Detailed information for the CpG targets is in an XLS formatted spreadsheet on the Myers lab protocols website.
Display Conventions
The score associated with each site is the beta value multiplied by 1000. Methylation status is color-coded as:
orange = methylated (score >= 600)
purple = partially methylated (200 < score < 600)
bright blue = unmethylated (0 < score <= 200)
black = NA (score = 0)
Methods
Cells were grown according to the approved ENCODE cell culture protocols. Genomic DNA was isolated from biological replicates of each cell line by using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation were determined by UV absorbance. The Methyl27K platform uses bisulfite-treated genomic DNA to assay the methylation status of 27,578 CpG sites within more than 14,000 genes. Treating genomic DNA with sodium bisulfite converts the unmethylated cytosines of CpG dinucleotides into uracil; methylated cytosines are not converted.
After bisulfite treatment, the methylation status of a site is assayed by single base-pair extension with a Cy3 or Cy5 labeled nucleotide on oligo-beads specific for the methylated or unmethylated state. A beta value is calculated by Illumina's BeadStudio software for each CpG target. This value represents the intensity value from the methylated bead type divided by the sum of the intensity values from the methylated and unmethylated bead types for any given CpG target. The bisulfite conversion reaction was performed using the Zymo Research EZ-96 DNA Methylation™ Kit. One step of the protocol was modified: during the incubation, a 30 sec 95°C denaturing step was included every hour to increase reaction efficiency, as recommended by the Illumina Infinium Human Methylation27 protocol. The bead arrays were run according to the protocol provided by Illumina. The intensity data from the BeadArray were processed using Illumina's BeadStudio software with the Methylation Module v3.2. The data were then quality-filtered using p-values. Any beta value equal to or greater than 0.6 is considered fully methylated. Any beta value equal to or less than 0.2 is considered fully unmethylated. Beta values between 0.2 and 0.6 are considered partially methylated. Beta values are quality filtered, and spots that fall below the minimum intensity threshold are displayed as "NA".
Credits
Dr. Richard M. Myers
Mr. Yuya Kobayashi: yuyak@stanford.edu
Dr. Devin M. Absher: dabsher@hudsonalpha.org
Dr. Rebekka O. Sprouse: rsprouse@hudsonalpha.org
Contact: Flo Pauli.
References
Brunner AL, Johnson DS, Kim SW, Valouev A, Reddy TE, Neff NF, Anton E, Medina C, Nguyen L, Chiao E et al. Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Research. 2009 Jun;19(6):1044-56.
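The beta value and score conventions described for the Methyl27 tracks can be illustrated with a short sketch. It follows only the description given here (methylated intensity divided by the sum of methylated and unmethylated intensities, score = beta x 1000, thresholds at 600 and 200). The function names are hypothetical, and the vendor software may apply additional terms or quality filtering that are not described in this text.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Methylated intensity divided by the sum of both bead-type intensities."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("no signal for this CpG target")
    return methylated / total


def track_score(beta: float) -> int:
    """Track score as described: beta value multiplied by 1000."""
    return round(beta * 1000)


def methylation_status(score: int) -> str:
    """Map a track score to the status categories listed in the display conventions."""
    if score == 0:
        return "NA"                    # drawn in black
    if score >= 600:
        return "methylated"            # drawn in orange
    if score > 200:
        return "partially methylated"  # drawn in purple
    return "unmethylated"              # drawn in bright blue


# Hypothetical intensities for one CpG target.
score = track_score(beta_value(methylated=300.0, unmethylated=100.0))
print(score, methylation_status(score))
```

Note that a score of exactly 0 is reserved for "NA" in the display conventions, so this sketch cannot distinguish a true beta of 0 from a filtered spot.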
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeHudsonalphaMethyl27K562r2 K562 2 K562 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 278 Myers HudsonAlpha Methyl27 2 wgEncodeHudsonalphaMethyl27K562r2 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 K562 replicate 2 Regulation wgEncodeHudsonalphaMethyl27K562r1 K562 1 K562 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 278 Myers HudsonAlpha Methyl27 1 wgEncodeHudsonalphaMethyl27K562r1 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 K562 replicate 1 Regulation wgEncodeHudsonalphaMethyl27HepG2r2 HepG2 2 HepG2 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 277 Myers HudsonAlpha Methyl27 2 wgEncodeHudsonalphaMethyl27HepG2r2 hepatocellular carcinoma DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 HepG2 replicate 2 Regulation wgEncodeHudsonalphaMethyl27HepG2r1 HepG2 1 HepG2 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 277 Myers HudsonAlpha Methyl27 1 wgEncodeHudsonalphaMethyl27HepG2r1 hepatocellular carcinoma DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 HepG2 replicate 1 Regulation wgEncodeHudsonalphaMethyl27GM12878r2 GM12878 2 GM12878 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 276 Myers HudsonAlpha Methyl27 2 wgEncodeHudsonalphaMethyl27GM12878r2 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 GM12878 replicate 2 Regulation wgEncodeHudsonalphaMethyl27GM12878r1 GM12878 1 GM12878 MethylArray ENCODE Feb 2009 Freeze 2008-12-23 2009-09-23 276 Myers HudsonAlpha Methyl27 1 wgEncodeHudsonalphaMethyl27GM12878r1 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus DNA Methylation Array Myers Myers - Hudson Alpha Institute for Biotechnology ENCODE HudsonAlpha Methyl27 GM12878 replicate 1 Regulation hapmapLdPh HapMap LD Phased HapMap Linkage Disequilibrium - Phase II - from Phased Genotypes Variation and Repeats Description Linkage disequilibrium (LD) is the association of alleles on chromosomes. 
It measures the difference between the observed frequency of a two-locus allele combination and its expected frequency, which is the product of the two single-allele frequencies. When LD is low, the two loci tend to be inherited in a nearly random manner. This track shows three different measures of linkage disequilibrium, D', r2, and LOD (log odds), between pairs of SNPs as genotyped by the HapMap consortium. LD is useful for understanding the associations between genetic variants throughout the genome, and can be helpful in selecting SNPs for genotyping. By default, the display in full mode shows LOD values. Each diagonal represents a different SNP with each diamond representing a pairwise comparison between two SNPs. Shades are used to indicate linkage disequilibrium between the pair of SNPs, with darker shades indicating stronger LD. For the LOD values, additional colors are used in some cases:
White diamonds indicate pairwise D' values less than 1 with no statistically significant evidence of LD (LOD < 2).
Light blue diamonds indicate high D' values (>0.99) with low statistical significance (LOD < 2).
Light pink diamonds are drawn when the statistical significance is high (LOD >= 2) but the D' value is low (less than 0.5).
Methods
Phased genotypes from HapMap Phase II release 22 were used with Haploview to calculate LD values for all SNP pairs within 250 kb. The YRI and CEU tracks each use 30 parents+child trios (90 individuals) and the combined JPT+CHB track uses 90 unrelated individuals. Haploview uses a two-marker EM algorithm (ignoring missing data) to estimate the maximum-likelihood values of the four gamete frequencies, from which the D', LOD, and r2 calculations derive.
Display Conventions and Configuration
Display Mode
Full mode shows the pairwise LD values in a Haploview-style mountain plot.
Dense mode shows the pairwise LD values in a single line for each population, where the intensity at each position is the average of all of the LD values between the SNP at that position and all other SNPs within 250 kb. LD Values: measures of linkage disequilibrium r2 displays the raw r2 value, or the square of the correlation coefficient for a given marker pair. SNPs that have not been separated by recombination have r2 = 1; in this case, these two markers are said to be redundant for genotyping, but may have different functional effects. Lower r2 values show a lower degree of LD, indicating that some recombination has occurred in this population. See Hill and Robertson (1966) for details. D' displays the raw D' value, which is the normalized covariance for a given marker pair. A D' value of 1 (complete LD) indicates that two SNPs have not been separated by recombination, while lower values indicate evidence of recombination in the history of the sample. Only D' values near 1 are a reliable measure of LD; lower values are difficult to interpret as the magnitude of D' depends strongly on sample size. See Lewontin (1988) for more details. LOD displays the log odds score for linkage disequilibrium between a given marker pair, and is shown by default. Track Geometry Trim to triangle shows the standard mountain plot (default); turning this option off will show LD values with SNPs outside the window. Inverting makes it easier to visually compare two adjacent populations. Colors LD Values can be drawn in a variety of colors, with red as default. The intensity of the color is proportional to the strength of the LD measure chosen above. Outlines can be drawn in contrasting colors or turned off. Outlines are automatically suppressed when the window is larger than 100,000 bp. Population Selection The HapMap populations can be individually displayed or hidden. 
YRI: Yoruba people in Ibadan, Nigeria (30 parent-and-adult-child trios) CEU: European samples from the Centre d'Etude du Polymorphisme Humain (CEPH) (30 trios) JPT+CHB: Combination of Japanese in Tokyo (45 unrelated individuals) and Han Chinese in Beijing (45 unrelated individuals) Credits This track was created at UCSC using data from the International HapMap Project and LD scores were computed using the Haploview program. The genome browser track display was created by Daryl Thomas following the display style from Haploview. References HapMap Project The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320. The International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec 18;426(6968):789-96. HapMap Data Coordination Center Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592-3. Haploview Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005 Jan 15;21(2):263-5. Epub 2004 Aug 5. General references on Linkage Disequilibrium Lewontin, RC. On measures of gametic disequilibrium. Genetics. 1988 Nov;120(3):849-52. Hill WG, Robertson A. The effect of linkage on limits to artificial selection. Genet Res. 1966 Dec;8(3):269-94. 
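As an illustration of how D' and r2 follow from the four gamete frequencies mentioned in the Methods, the sketch below applies the standard textbook definitions. It is not Haploview's code: it assumes the four haplotype frequencies are already known (no EM estimation), it omits the LOD score, and the function and parameter names are hypothetical.

```python
def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (|D'|, r2) for two biallelic loci from four haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab          # frequency of allele A at the first locus
    p_B = f_AB + f_aB          # frequency of allele B at the second locus
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom <= 0:
        raise ValueError("both loci must be polymorphic")
    d = f_AB - p_A * p_B       # observed minus expected haplotype frequency
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max   # normalized to the range 0..1
    r2 = d * d / denom         # squared correlation coefficient
    return d_prime, r2


# Only three of the four haplotypes are present: D' is 1 but r2 is well below 1.
print(ld_measures(0.5, 0.25, 0.0, 0.25))
```

The example shows why the two measures are reported separately: the absence of one haplotype gives D' = 1 even though the markers are far from redundant for genotyping (r2 = 1/3).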
hapmapLdPhChbJpt Ph JPT+CHB LD for the Han Chinese + Japanese from Tokyo (JPT+CHB) from Phased Genotypes Variation and Repeats
hapmapLdPhCeu Phased CEU Linkage Disequilibrium for the CEPH (CEU) from Phased Genotypes Variation and Repeats
hapmapLdPhYri Phased YRI Linkage Disequilibrium for the Yoruba (YRI) from Phased Genotypes Variation and Repeats
hapmapSnps HapMap SNPs HapMap SNPs (rel27, merged Phase II + Phase III genotypes) Variation and Repeats
Description
The HapMap Project identified a set of approximately four million common SNPs, and genotyped these SNPs in four populations in Phase II of the project. In Phase III, it genotyped approximately 1.4 to 1.5 million SNPs in eleven populations. This track shows the combined data from Phases II and III. The intent is that this data can be used as a reference for future studies of human disease. This track displays the genotype counts and allele frequencies of those SNPs, and (when available) shows orthologous alleles from the chimp and macaque reference genome assemblies. The four million HapMap Phase II SNPs were genotyped on individuals from these four human populations:
Yoruba in Ibadan, Nigeria (YRI)
Japanese in Tokyo, Japan (JPT)
Han Chinese in Beijing, China (CHB)
CEPH (Utah residents with ancestry from northern and western Europe) (CEU)
Phase III expanded to eleven populations: the four above, plus the following:
African Ancestry in SouthWestern United States (ASW)
Chinese Ancestry in Metropolitan Denver, CO, US (CHD)
Gujarati Indians in Houston, TX (GIH)
Luhya in Webuye, Kenya (LWK)
Mexican Ancestry in Los Angeles, CA, US (MEX)
Masai in Kinyawa, Kenya (MKK)
Toscani in Italia (TSI)
Each of the populations is displayed in a separate subtrack. The HapMap assays provide biallelic results. Over 99.8% of HapMap SNPs are described as biallelic in dbSNP build 129; approximately 6,800 are described as more complex types (in-del, mixed, etc). 70% of the HapMap SNPs are transitions: 35% are A/G, 35% are C/T.
The orthologous alleles in chimp (panTro2) and macaque (rheMac2) were derived using liftOver. No two HapMap SNPs occupy the same position. Aside from 430 SNPs from the pseudoautosomal region of chrX and chrY, no SNP is mapped to more than one location in the reference genome. No HapMap SNPs occur on "random" chromosomes (concatenations of unordered and unoriented contigs). Display Conventions and Configuration Note: calculation of heterozygosity has changed since the Phase II (rel22) version of this track. Observed heterozygosity is calculated as follows: each population's heterozygosity is computed as the proportion of heterozygous individuals in the population. The population heterozygosities are averaged to determine the overall observed heterozygosity. [For Phase II genotypes, expected heterozygosity was calculated as follows: the allele counts from all populations were summed (not normalized for population size) and used to determine overall major and minor allele frequencies. Assuming Hardy-Weinberg equilibrium, overall expected heterozygosity was calculated as two times the product of major and minor allele frequencies (see Modern Genetic Analysis, section 17-2).] The human SNPs are displayed in gray using a color gradient based on minor allele frequency. The higher the minor allele frequency, the darker the display. By definition, the maximum minor allele frequency is 50%. When zoomed to base level, the major allele is displayed for each population. The orthologous alleles from chimp and macaque are displayed in brown using a color gradient based on quality score. Quality scores range from 0 to 100 representing low to high quality. For orthologous alleles, the higher the quality, the darker the display. Quality scores are not available for chimp chromosomes chr21 and chrY; these were set to 98, consistent with the panTro2 browser quality track. Filters are provided for the data attributes described above. 
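The two heterozygosity calculations described above can be sketched as follows. This is only an illustration of the stated procedure, with hypothetical function names and made-up counts: observed heterozygosity is the unweighted mean of per-population heterozygote proportions, and the Phase II expected heterozygosity sums allele counts across populations and applies 2pq under Hardy-Weinberg equilibrium.

```python
def observed_heterozygosity(genotype_counts: dict) -> float:
    """Mean over populations of (heterozygous individuals / genotyped individuals).

    genotype_counts maps population -> (hom_allele1, het, hom_allele2).
    """
    per_population = []
    for hom1, het, hom2 in genotype_counts.values():
        n = hom1 + het + hom2
        if n > 0:
            per_population.append(het / n)
    if not per_population:
        raise ValueError("no genotyped individuals")
    return sum(per_population) / len(per_population)


def expected_heterozygosity(allele_counts: dict) -> float:
    """2pq from allele counts summed over populations (not normalized for size).

    allele_counts maps population -> (count_allele1, count_allele2).
    """
    a1 = sum(c[0] for c in allele_counts.values())
    a2 = sum(c[1] for c in allele_counts.values())
    if a1 + a2 == 0:
        raise ValueError("no allele counts")
    p = a1 / (a1 + a2)
    return 2 * p * (1 - p)


# Made-up counts for two populations, for illustration only.
genotypes = {"YRI": (10, 20, 10), "CEU": (30, 10, 0)}
alleles = {"YRI": (40, 40), "CEU": (70, 10)}
print(observed_heterozygosity(genotypes), expected_heterozygosity(alleles))
```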
Additionally, a filter is provided for observed heterozygosity (average of all populations' observed heterozygosities). Filters are applied to all subtracks, even if a subtrack is not displayed. Notes on orthologous allele filters: If a SNP's major allele is different between populations, no overall major allele for human is determined, thus the "matches major human allele" and "matches minor human allele" filters for orthologous alleles do not apply. If a SNP is monomorphic in all populations, the minor allele is not verified in the HapMap dataset. In these cases, the filter to match orthologous alleles to the minor human allele will yield no results.
Credits
This track is based on International HapMap Project release 27 data, provided by the HapMap Data Coordination Center.
References
HapMap Project
The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.
The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.
The International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec 18;426(6968):789-96.
HapMap Data Coordination Center
Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592-3.
A Sampling of HapMap Literature
Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006 Mar 1;15(5):789-95.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al. Global variation in copy number in the human genome. Nature. 2006 Nov 23;444(7118):444-454.
Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. Common genetic variants account for differences in gene expression among ethnic groups. Nature Genet. 2007 Feb;39(2):226-31.
Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM.
Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007 Apr;17(4):520-6. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 2006 Mar;4(3):e72. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005 Nov;15(11):1468-76. Data Source The genotypes_chr*_*_r27_nr.b36_fwd.txt.gz files from the HapMap FTP site were processed to make this track. hapmapAllelesMacaque Macaque Alleles Orthologous Alleles from Macaque (rheMac2) Variation and Repeats hapmapAllelesChimp Chimp Alleles Orthologous Alleles from Chimp (panTro2) Variation and Repeats hapmapSnpsYRI HapMap SNPs YRI HapMap SNPs from the YRI Population (Yoruba in Ibadan, Nigeria) Variation and Repeats hapmapSnpsJPT HapMap SNPs JPT HapMap SNPs from the JPT Population (Japanese in Tokyo, Japan) Variation and Repeats hapmapSnpsCHB HapMap SNPs CHB HapMap SNPs from the CHB Population (Han Chinese in Beijing, China) Variation and Repeats hapmapSnpsCEU HapMap SNPs CEU HapMap SNPs from the CEU Population (Northern and Western European Ancestry in Utah, US - CEPH) Variation and Repeats wgEncodeHelicosRnaSeq Helicos RNA-seq ENCODE Helicos RNA-seq Expression Description This track depicts high throughput sequencing of long RNAs (>200 nt) from whole cell RNA samples from tissues or sub cellular compartments from cell lines included in the ENCODE Transcriptome subproject. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. RNA-Seq was performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing of the cDNA, which was done here on Helicos™ Genetic Analysis System (Harris et al; http://www.helicosbio.com/). 
Display Conventions and Configuration This is a multi-view track that provides the following views of the data: Alignments RNA-seq tag alignments. Raw Signal Density graph (wiggle) of the number of reads overlapping a nucleotide in the genome. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments. Note that the strand of the RNA is not displayed in the track in the genome browser. The strand can be found in the download file. Methods Cells were grown according to the approved ENCODE cell culture protocols. RNA molecules longer than 200 nt and present in RNA population isolated from different subcellular compartments (such as cytosol, nucleus, polysomes and others) were fractionated into polyA+ and polyA- fractions as described in these protocols. RNA was converted into first strand cDNA using a high excess of random hexamers without prior fragmentation. Spurious second-strand cDNA synthesis could occur under these conditions. The first strand cDNA molecules were tailed at the 3′ ends with polyA residues using terminal transferase and used directly for sequencing. Filtered reads were aligned to the human genome using in-house and freely available Helicos Alignment software indexDPgenomic (http://open.helicosbio.com/mwiki/index.php/Docs/Software/Bioinformatics#Executables, requires registration (free)) with a minimum normalized alignment score of 4.5. 
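The normalized alignment score filter can be sketched in a few lines, using the scoring formula that this Methods section gives (five points per match, four points subtracted per error, divided by the read length). This is not the indexDPgenomic implementation; the function name and the gapped-string input format are assumptions made for illustration.

```python
MIN_NORMALIZED_SCORE = 4.5  # threshold stated in the Methods


def normalized_score(tag_aln: str, ref_aln: str) -> float:
    """Score a gapped pairwise alignment given as two equal-length strings.

    '-' marks a gap. Every aligned column that is not an exact base match
    (mismatch, insertion, or deletion) counts as one error.
    """
    if len(tag_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for t, r in zip(tag_aln, ref_aln) if t == r and t != "-")
    errors = len(tag_aln) - matches
    read_length = sum(1 for t in tag_aln if t != "-")  # tag bases only
    return (matches * 5 - errors * 4) / read_length


tag = "CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG"
ref = "C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG"
score = normalized_score(tag, ref)
print(score, score >= MIN_NORMALIZED_SCORE)
```

The alignment used here is the worked example from this section: 31 matches and 2 errors over a 32-base tag.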
The normalized score is defined as follows:
Score = (#matches*5 - #mismatches*4) / length_read
For example, in the following alignment:
Tag Sequence  CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG
Ref Sequence  C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG
Length of alignment block: 33
Length of tag sequence: 32
Number of matches: 31
Number of errors: 2
Score: (31*5) - (2*4) = 155 - 8 = 147
Normalized score = 147/32 = 4.59375
Raw data can be found at Helicos (requires registration (free)).
Verification
Known exon maps as displayed on the genome browser are confirmed by the alignment of sequence reads.
Credits
Helicos BioSciences: Philipp Kapranov, Eldar Giladi, Steve Roels, Chris Hart, Stan Letovsky, Patrice Milos.
Cold Spring Harbor Laboratory: Carrie Davis, Kim Bell, Huaien Wang, Tom Gingeras.
Contacts: Philipp Kapranov; Patrice Milos
References
Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z. Single-molecule DNA sequencing of a viral genome. Science. 2008 Apr 4;320(5872):106-9.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.
wgEncodeHelicosRnaSeqViewyRawSignal Raw Signal ENCODE Helicos RNA-seq Expression wgEncodeHelicosRnaSeqRawSignalK562CytosolLongpolya K562cyto pA+ Sig K562 RnaSeq ENCODE Feb 2009 Freeze 2009-04-03 2010-01-03 272 Gingeras Helicos cytosol longPolyA wgEncodeHelicosRnaSeqRawSignalK562CytosolLongpolya RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Kapranov - Helicos BioScience Corporation The fluid between the cell's outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Helicos RNA-seq Raw Signal (PolyA+ RNA in K562 cytosol) Expression wgEncodeHelicosRnaSeqViewAlignments Alignments ENCODE Helicos RNA-seq Expression wgEncodeHelicosRnaSeqAlignmentsK562CytosolLongpolya K562cyto pA+ Tag K562 RnaSeq ENCODE Feb 2009 Freeze 2009-04-03 2010-01-03 272 Gingeras Helicos cytosol longPolyA wgEncodeHelicosRnaSeqAlignmentsK562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Kapranov - Helicos BioScience Corporation The fluid between the cell's outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Helicos RNA-seq Tags (PolyA+ RNA in K562 cytosol) Expression
hg18ContigDiff Hg19 Diff Contigs dropped or changed from NCBI build 36(hg18) to GRCh37(hg19) Mapping and Sequencing
Description
This track indicates differences in genome assembly NCBI Build 36 (hg18) when compared to the next version GRCh37 (hg19). Contigs used in this assembly, NCBI Build 36 (hg18), that were not carried forward to GRCh37 (hg19) are shown. The following color/score key is used:
item score   type of change from hg18 to hg19
0            Contigs in hg18 dropped in the transition to hg19
500          Different portion of the same contig used in hg19 compared to hg18
1000         Updated version of the same contig used in hg19 to correct errors in the hg18 sequence
You can use the score filter to eliminate or include the different categories in the display.
Methods
The contig coordinates were extracted from the AGP files for the assemblies. Contigs that match the same name, same version, and the same specific portion of sequence are noted as identical in the two assemblies. Contigs not satisfying those identical characteristics are included in this track.
Credits
The data and presentation of this track were prepared by Hiram Clawson, UCSC Genome browser engineering.
hgdpGeo HGDP Allele Freq Human Genome Diversity Project SNP Population Allele Frequencies Variation and Repeats
Description
This track shows the 657,000 SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser.
Methods
Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. Ancestral states for all SNPs were estimated using whole genome human-chimpanzee alignments from the UCSC database. For each SNP in the human genome (NCBI Build 35, UCSC database hg17), the allele at the corresponding position in the chimp genome (Build 2 version 1, UCSC database panTro2) was used as ancestral. Allele frequencies were plotted on a world map using programs included in the Generic Mapping Tools.
Credits
Thanks to the HGDP-CEPH, the Pritchard lab at Stanford University, Joe Pickrell and John Novembre for sharing the data and plotting scripts for this track.
References
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2. PMID: 11954565
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4.
PMID: 18292342 Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37. PMID: 19307593; PMC: PMC2675971 Wessel P, Smith WHF. New, improved version of Generic Mapping Tools released. EOS, Trans. Amer. Geophys. U. 1998;79(47):579. hgdpHzy HGDP Hetrzygsty Human Genome Diversity Project Smoothed Expected Heterozygosity on 7 Continents Variation and Repeats Description This track shows a 3-SNP moving average of p(1-p) where p is the major allele frequency (i.e. half of the expected heterozygosity) on seven continents, from SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser. Methods Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa, Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Allele frequencies were used to calculate p(1-p) for each SNP, and then a 3-SNP average was computed for each SNP and its two neighboring SNPs. The associated analysis tracks HGDP FST, HGDP iHS, and HGDP XP-EHH (Pickrell et al.) did not make use of all African populations, but instead used only the Bantu populations because a more closely related group was desired for comparison with other continental groups. For this track, separate subtracks show the expected heterozygosity of all African populations and of only Bantu populations. Credits Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data.
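The smoothing described in the Methods for this track (p(1-p) per SNP, then a 3-SNP average over each SNP and its two neighbors) can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: the function names are hypothetical, and averaging over only the available neighbors at the first and last SNP is an assumption, since the description does not say how the ends are handled.

```python
from typing import List


def half_expected_heterozygosity(p: float) -> float:
    """p(1-p) for major allele frequency p (half of the expected heterozygosity 2p(1-p))."""
    return p * (1.0 - p)


def smoothed_heterozygosity(major_allele_freqs: List[float]) -> List[float]:
    """3-SNP moving average of p(1-p) over SNPs listed in genomic order."""
    h = [half_expected_heterozygosity(p) for p in major_allele_freqs]
    smoothed = []
    for i in range(len(h)):
        # Window = this SNP plus its two neighbors; at the ends only the
        # neighbors that exist are used (assumption).
        window = h[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed
```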
References Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2. hgdpHzyAmericas Hetzgty Americas Human Genome Diversity Proj Smoothd Expec Heterozygosity (Americas) Variation and Repeats hgdpHzyOceania Hetzgty Oceania Human Genome Diversity Proj Smoothd Expec Heterozygosity (Oceania) Variation and Repeats hgdpHzyEAsia Hetzgty E. Asia Human Genome Diversity Proj Smoothd Expec Heterozygosity (E. Asia) Variation and Repeats hgdpHzySAsia Hetzgty S. Asia Human Genome Diversity Proj Smoothd Expec Heterozygosity (S. Asia) Variation and Repeats hgdpHzyEurope Hetzgty Europe Human Genome Diversity Proj Smoothd Expec Heterozygosity (Europe) Variation and Repeats hgdpHzyMideast Hetzgty Mideast Human Genome Diversity Proj Smoothd Expec Heterozygosity (Mideast) Variation and Repeats hgdpHzyBantu Hetzgty Bantu Human Genome Diversity Proj Smoothd Expec Heterozygosity (Bantu pops. in Africa) Variation and Repeats hgdpHzyAfrica Hetzgty Africa Human Genome Diversity Proj Smoothd Expec Heterozygosity (Africa) Variation and Repeats hgdpIhs HGDP iHS Human Genome Diversity Project Integrated Haplotype Score on 7 Continents Variation and Repeats Description This track shows per-continent integrated haplotype score (iHS, Voight et al.), a measure of very recent positive selection. 
Scores were calculated using SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser. Methods Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa (Bantu populations only), Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Bantu populations in Africa were chosen instead of all African populations because a more closely related group was desired for comparison with other continental groups. iHS was then calculated for each population group using the program ihs (source code available) and then normalizing the resulting unstandardized iHS scores in derived allele frequency bins as described in (Voight et al.). Per-SNP iHS scores were smoothed in windows of 31 SNPs, centered on each SNP. The final score is -log10 of the proportion of smoothed scores higher than each SNP's smoothed score. Credits Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data. References Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006 Mar;4(3):e72. Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4. 
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2. hgdpIhsAmericas iHS Americas Human Genome Diversity Project iHS (Americas) Variation and Repeats hgdpIhsOceania iHS Oceania Human Genome Diversity Project iHS (Oceania) Variation and Repeats hgdpIhsEAsia iHS E. Asia Human Genome Diversity Project iHS (East Asia) Variation and Repeats hgdpIhsSAsia iHS S. Asia Human Genome Diversity Project iHS (South Asia) Variation and Repeats hgdpIhsEurope iHS Europe Human Genome Diversity Project iHS (Europe) Variation and Repeats hgdpIhsMideast iHS Mideast Human Genome Diversity Project iHS (Mideast) Variation and Repeats hgdpIhsBantu iHS Bantu Human Genome Diversity Project iHS (Bantu populations in Africa) Variation and Repeats hgdpFst HGDP Smoothd FST Human Genome Diversity Project Smoothed Relative FST (Fixation Index) Variation and Repeats Description In this track, the value shown for each SNP is -log10 of the fraction of SNPs with a more extreme FST value than that SNP. Relative FST (also known as the Fixation index) values were calculated from SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser. From Wikipedia: Fixation index (FST) is a measure of population differentiation based on genetic polymorphism data, such as Single nucleotide polymorphisms (SNPs) or microsatellites. It is a special case of F-statistics, the concept developed in the 1920s by Sewall Wright. This statistic compares the genetic variability within and between populations and is frequently used in the field of population genetics. 
From http://www.uwyo.edu/dbmcd/popecol/Maylects/PopGenGloss.html: FST is the proportion of the total genetic variance contained in a subpopulation (the S subscript) relative to the total genetic variance (the T subscript). Values can range from 0 to 1. High FST implies a considerable degree of differentiation among populations. Methods Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa, Middle East, Europe, South Asia, East Asia, Oceania and the Americas. FST was computed for all SNPs, and then each SNP's place in the empirical FST distribution was used to derive the scores shown in this track, -log10 of the fraction of SNPs with a more extreme FST value than that SNP. Credits Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data. References Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2. 
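The score described in the Methods above (-log10 of the fraction of SNPs with a more extreme FST value) is an empirical rank transform. A minimal Python sketch follows, assuming that "more extreme" means strictly higher and that the single highest SNP, for which the fraction would be zero, is floored at 1/n so the score stays finite. Both choices, and the function name, are assumptions for illustration and not a statement of how the track was computed.

```python
import bisect
import math
from typing import List


def empirical_neg_log10(values: List[float]) -> List[float]:
    """For each value, -log10 of the fraction of values that are strictly higher."""
    n = len(values)
    ordered = sorted(values)
    scores = []
    for v in values:
        # Number of values strictly greater than v.
        n_higher = n - bisect.bisect_right(ordered, v)
        # Floor at one SNP to avoid log10(0) for the maximum (assumption).
        n_higher = max(n_higher, 1)
        # -log10(n_higher / n) written as log10(n / n_higher).
        scores.append(math.log10(n / n_higher))
    return scores
```

With this transform, a SNP in the top 1% of the distribution scores at least 2, and one in the top 0.1% scores at least 3.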
hgdpXpehh HGDP XP-EHH Human Genome Diversity Proj Cross-Pop Ext Haplo Homzgty (XP-EHH) on 7 Continents Variation and Repeats Description This track shows per-continent Cross Population Extended Haplotype Homozygosity (XP-EHH) score (Sabeti et al.), an estimate of positive selection that highlights SNPs that have approached or achieved fixation in a population but remain polymorphic in the human population as a whole. Scores were calculated using SNPs genotyped in 53 populations worldwide by the Human Genome Diversity Project in collaboration with the Centre d'Etude du Polymorphisme Humain (HGDP-CEPH). This track and several others are available from the HGDP Selection Browser. Methods Samples collected by the HGDP-CEPH from 1,043 individuals from around the world were genotyped for 657,000 SNPs at Stanford. The 53 populations were divided into seven continental groups: Africa (Bantu populations only), Middle East, Europe, South Asia, East Asia, Oceania and the Americas. Bantu populations in Africa were chosen instead of all African populations because a more closely related group was desired for comparison with other continental groups. XP-EHH was then calculated for each population group using the program xpehh (source code available) as described in Sabeti et al. Credits Thanks to the HGDP-CEPH and Joe Pickrell in the Pritchard lab at the University of Chicago for providing these data. References Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007 Oct 18;449(7164):913-8. Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li J, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009 May;19(5):826-37. 
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008 Feb 22;319(5866):1100-4. Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A et al. A human genome diversity cell line panel. Science. 2002 Apr 12;296(5566):261-2. hgdpXpehhAmericas XP-EHH Americas Human Genome Diversity Project XP-EHH (Americas) Variation and Repeats hgdpXpehhOceania XP-EHH Oceania Human Genome Diversity Project XP-EHH (Oceania) Variation and Repeats hgdpXpehhEAsia XP-EHH E. Asia Human Genome Diversity Project XP-EHH (East Asia) Variation and Repeats hgdpXpehhSAsia XP-EHH S. Asia Human Genome Diversity Project XP-EHH (South Asia) Variation and Repeats hgdpXpehhEurope XP-EHH Europe Human Genome Diversity Project XP-EHH (Europe) Variation and Repeats hgdpXpehhMideast XP-EHH Mideast Human Genome Diversity Project XP-EHH (Mideast) Variation and Repeats hgdpXpehhBantu XP-EHH Bantu Human Genome Diversity Project XP-EHH (Bantu populations in Africa) Variation and Repeats kiddEichlerDisc HGSV Discordant HGSV Discordant Clone End Alignments Variation and Repeats Description This track shows data from the Human Genome Structural Variation Project. Clone ends from nine individuals from Kidd, et al. were mapped to the reference human genome. This track shows clones whose end mappings were discordant with the reference genome in one of the following ways: deletion: clone mapping too large relative to reference; insertion: clone mapping too small relative to reference; inversion: inappropriate orientation, clone mapping spans potential inversion breakpoint; OEA: One End Anchored clones (only one end could be mapped to reference); transchrm: clone ends map to different chromosomes (name indicates identity of other chromosome after the underscore).
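The discordance categories listed above can be illustrated with a small Python classifier. This is only a sketch under stated assumptions: the `EndMapping` type, the function name, and the `min_span`/`max_span` thresholds are hypothetical placeholders (the track description does not give the size cutoffs), and any pair of ends mapping to the same strand is treated here as a potential inversion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndMapping:
    chrom: str
    pos: int
    strand: str  # '+' or '-'


def classify_clone(end1: Optional[EndMapping], end2: Optional[EndMapping],
                   min_span: int = 32000, max_span: int = 48000) -> str:
    """Classify one end-sequence pair; the span thresholds are illustrative only."""
    if end1 is None and end2 is None:
        return "unmapped"
    if end1 is None or end2 is None:
        return "OEA"          # only one end could be mapped
    if end1.chrom != end2.chrom:
        return "transchrm"    # ends map to different chromosomes
    if end1.strand == end2.strand:
        return "inversion"    # inappropriate relative orientation (simplified)
    span = abs(end2.pos - end1.pos)
    if span > max_span:
        return "deletion"     # mapping too large relative to reference
    if span < min_span:
        return "insertion"    # mapping too small relative to reference
    return "concordant"
```

A pair spanning far more reference sequence than a clone insert can hold suggests that the individual carries a deletion relative to the reference, which is why the "too large" case is labeled deletion.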
Each individual's discordant clone end mappings are in a different subtrack. The nine individuals' labels used in Kidd, et al., populations of origin, and Coriell Cell Repository catalog IDs are shown here (individual, population, Coriell ID): ABC14, CEPH, NA12156; ABC13, Yoruba, NA19129; ABC12, CEPH, NA12878; ABC11, China, NA18555; ABC10, Yoruba, NA19240; ABC9, Japan, NA18956; ABC8, Yoruba, NA18507; ABC7, Yoruba, NA18517; G248, Unknown, NA15510. Methods Excerpted from Kidd, et al.: We selected eight individuals as part of the first phase of the Human Genome Structural Variation Project. This included four individuals of Yoruba Nigerian ethnicity and four individuals of non-African ethnicity. For each individual we constructed a whole genomic library of about 1 million clones, using a fosmid subcloning strategy. Each library was arrayed and both ends of each clone insert were sequenced to generate a pair of high-quality end sequences (termed an end-sequence pair (ESP)). The overall approach generated a physical clone map for each individual human genome, flagging regions discrepant by size or orientation on the basis of the placement of end sequences against the reference assembly. Across all eight libraries, we mapped 6.1 million clones to distinct locations against the reference sequence (http://hgsv.washington.edu). Of these, 76,767 were discordant by length and/or orientation, indicating potential sites of structural variation. About 0.4% (23,742) of the ESPs mapped with only one end to the reference assembly despite the presence of high-quality sequence at the other end (termed one-end anchored (OEA) clones). Note: This track contains many more than the 76,767 + 23,742 items mentioned above because it also includes clones whose ends map to different chromosomes (transchrm). References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64.
kiddEichlerDiscG248 Discordant G248 HGSV Individual G248 Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc7 Discordant ABC7 HGSV Individual ABC7 (Yoruba) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc8 Discordant ABC8 HGSV Individual ABC8 (Yoruba) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc9 Discordant ABC9 HGSV Individual ABC9 (Japan) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc10 Discordant ABC10 HGSV Individual ABC10 (Yoruba) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc11 Discordant ABC11 HGSV Individual ABC11 (China) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc12 Discordant ABC12 HGSV Individual ABC12 (CEPH) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc13 Discordant ABC13 HGSV Individual ABC13 (Yoruba) Discordant Clone End Alignments Variation and Repeats kiddEichlerDiscAbc14 Discordant ABC14 HGSV Individual ABC14 (CEPH) Discordant Clone End Alignments Variation and Repeats hiSeqDepth Hi Seq Depth Regions of Exceptionally High Depth of Aligned Short Reads Mapping and Sequencing Description This track displays regions of the reference genome that have exceptionally high sequence depth, inferred from alignments of short-read sequences from the 1000 Genomes Project. These regions may be caused by collapsed repetitive sequences in the reference genome assembly; they also have high read depth in assays such as ChIP-seq, and may trigger false positive calls from peak-calling algorithms. Excluding these regions from analysis of short-read alignments should reduce such false positive calls. Methods Pickrell et al. downloaded sequencing reads for 57 Yoruba individuals from the 1000 Genomes Project's low-coverage pilot data, mapped them to the Mar. 2006 human genome assembly (NCBI36/hg18), computed the read depth for every base in the genome, and compiled a distribution of read depths. 
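To make the per-base depth distribution concrete, here is a small Python sketch that computes read depth for every base from read intervals and looks up the depth above which a chosen upper fraction of bases lies. It is an illustration only, not the published pipeline: the function names, the 0-based half-open read intervals, and the `top_fraction` parameter are assumptions for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def per_base_depth(genome_length: int, reads: Iterable[Tuple[int, int]]) -> List[int]:
    """Depth at every base, given reads as 0-based half-open (start, end) intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1   # depth rises where the read starts
            diff[end] -= 1     # and falls just past its last base
    depth, running = [], 0
    for d in diff[:genome_length]:
        running += d
        depth.append(running)
    return depth


def depth_threshold(depth: List[int], top_fraction: float) -> int:
    """Smallest depth t such that at most top_fraction of bases have depth > t."""
    if not depth:
        raise ValueError("depth must not be empty")
    histogram = Counter(depth)
    n_bases = len(depth)
    bases_above = n_bases
    for value in sorted(histogram):
        bases_above -= histogram[value]
        if bases_above / n_bases <= top_fraction:
            return value
    return max(histogram)
```

Bases whose depth exceeds the returned threshold are the candidates for a high-depth region at that fraction.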
They then identified contiguous regions where read depth exceeded thresholds corresponding to the top 0.001, 0.005, 0.01, 0.05 and 0.1 of the per-base read depths, merging regions which fall within 50 bases of each other. The regions are available for download from http://eqtl.uchicago.edu/Masking/ (see the readme file). Credits Thanks to Joseph Pickrell at the University of Chicago for these data. References Pickrell JK, Gaffney DJ, Gilad Y, Pritchard JK. False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics. 2011 Aug 1;27(15):2144-6. Epub 2011 Jun 19. hiSeqDepthTop10Pct Top 0.10 Depth Top 0.10 of Read Depth Distribution Mapping and Sequencing hiSeqDepthTop5Pct Top 0.05 Depth Top 0.05 of Read Depth Distribution Mapping and Sequencing hiSeqDepthTop1Pct Top 0.01 Depth Top 0.01 of Read Depth Distribution Mapping and Sequencing hiSeqDepthTopPt5Pct Top 0.005 Depth Top 0.005 of Read Depth Distribution Mapping and Sequencing hiSeqDepthTopPt1Pct Top 0.001 Depth Top 0.001 of Read Depth Distribution Mapping and Sequencing wgEncodeHudsonalphaRnaSeq HudsonAlpha RNA-seq ENCODE HudsonAlpha RNA-seq Expression Description This track is produced as part of the ENCODE Project. This track shows short tag sequencing of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs such as ELAND (Illumina) or Bowtie (Langmead et al., 2009). RNA-seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing, which was done here on an Illumina Genome Analyzer (GA2) (Mortazavi et al., 2008).
The transcriptome measurements shown on these tracks were performed on polyA selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Those that don't map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations that lack an existing transcript model are also identified informatically and they are quantified. RNA-seq is especially suited for giving information about RNA splicing patterns and for determining unequivocally the presence or absence of lower abundance class RNAs. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls. This RNA-seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The "randomly primed" reverse transcription is, apparently, not fully random. This is inferred from a sequence bias in the first residues of the read population, and this likely contributes to observed unevenness in sequence coverage across transcripts. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: RPKM RefSeq gene models are displayed shaded by their RPKM (Reads Per Kilobase of exon per Million reads) value.
RPKM is reported in the score of each element, and each element is shaded using a gray scale that becomes darker as RPKM increases. The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. Alignments The Alignments view shows reads mapped to the genome. Alignments are colored by cell type. Methods Gene expression is measured in Reads Per Kilobase exon per Million reads (RPKM; Mortazavi et al., 2008). RNA-seq reads are aligned to RefSeq gene models. RPKM is then calculated by dividing the total number of reads that align to the gene model (RefSeq) by the size of the spliced transcript in kilobases. This number is then divided by the total number of reads in millions for the experiment. For example, if x reads align to a RefSeq gene whose spliced transcript is y kb in size and there are z million reads in the experiment, then RPKM = x/(y*z). Cells were grown according to the approved ENCODE cell culture protocols. A total of 2 x 10^7 cells were lysed in 4 ml of RLT buffer (Qiagen RNEasy kit), and processed on 2 RNEasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNAse digestion step to remove residual genomic DNA. 75 µg of total RNA was selected twice with oligodT beads (Dynal) according to the manufacturer's protocol to isolate mRNA from each of the preparations. 100 ng of mRNA was then processed according to the protocol in Mortazavi et al. (2008), and prepared for sequencing on the Genome Analyzer flow cell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina). Following alignment of the sequence reads to the genome assembly as described above, the sequence reads were further analyzed using the ERANGE 3.0 software package, which quantifies the number of reads falling within the mapped boundaries of known transcripts from the Gencode annotations.
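The RPKM formula given earlier in these Methods, RPKM = x/(y*z), can be written as a short Python function. This is a minimal sketch with a hypothetical function name; it takes the transcript length in bases and the total read count as a raw number, and converts them to kilobases and millions internally.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_reads: int) -> float:
    """Reads Per Kilobase of exon per Million reads: x / (y * z)."""
    if transcript_length_bp <= 0 or total_reads <= 0:
        raise ValueError("transcript length and total reads must be positive")
    y_kb = transcript_length_bp / 1000.0    # spliced transcript size in kilobases
    z_millions = total_reads / 1_000_000.0  # reads in the experiment, in millions
    return read_count / (y_kb * z_millions)
```

For instance, 500 reads on a 2 kb spliced transcript in an experiment with 10 million reads gives 500 / (2 * 10) = 25 RPKM.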
ERANGE assigns both genomically unique reads and reads that occur in 2-10 genomic locations for quantification. Verification Known exon maps as displayed on the genome browser are confirmed by the alignment of sequence reads. Known spliced exons are detected at the expected frequency for transcripts of given abundance. RT-qPCR confirms expression measurements with r > 0.8. Credits Myers Group: Florencia Pauli, Tim Reddy Wold Group: Ali Mortazavi, Brian Williams, Diane Trout, Brandon King, Ken McCue, Lorian Schaeffer. Illumina gene expression group: Gary Schroth, Shujun Luo, Eric Vermaas. Contacts: Tim Reddy and Flo Pauli (experimental). References Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008 Jul; 5(7):621-628. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009 Mar; 10:R25. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeHudsonalphaRnaSeqViewRPKM RPKM ENCODE HudsonAlpha RNA-seq Expression wgEncodeHudsonalphaRnaSeqRPKMRep2JurkatCellPapBow10R1x25 Jurk none pA+ S2 Jurkat RnaSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 315 Myers HudsonAlpha SL317 cell bow10 1x25 2 longPolyA wgEncodeHudsonalphaRnaSeqRPKMRep2JurkatCellPapBow10R1x25 None RPKM T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC.
(PMID: 68013) Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell bowtie v0.10.0 Single 25 nt reads Poly(A)+ RNA longer than 200 nt ENCODE HudsonAlpha RNA-seq PolyA+ Jurkat w/None Rep 2 1x25 RPKM Expression wgEncodeHudsonalphaRnaSeqRPKMRep1JurkatCellPapBow10R1x25 Jurk none pA+ S1 Jurkat RnaSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 315 Myers HudsonAlpha SL316 cell bow10 1x25 1 longPolyA wgEncodeHudsonalphaRnaSeqRPKMRep1JurkatCellPapBow10R1x25 None RPKM T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell bowtie v0.10.0 Single 25 nt reads Poly(A)+ RNA longer than 200 nt ENCODE HudsonAlpha RNA-seq PolyA+ Jurkat w/None Rep 1 1x25 RPKM Expression wgEncodeHudsonalphaRnaSeqRPKMRep2A549CellPapErng3R1x36Etoh02 A549 etoh pA+ S2 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 314 Myers HudsonAlpha SL333 (Control) cell erng3 1x36 2 longPolyA wgEncodeHudsonalphaRnaSeqRPKMRep2A549CellPapErng3R1x36Etoh02 EtOH_0.02pct RPKM epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." 
- ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 0.02% Ethanol (Myers) ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/EtOH 0.02pct Rep 2 1x36 RPKM Expression wgEncodeHudsonalphaRnaSeqRPKMRep2A549CellPapErng3R1x36Dexa A549 dex pA+ S2 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 313 Myers HudsonAlpha SL332 cell erng3 1x36 2 longPolyA MACS wgEncodeHudsonalphaRnaSeqRPKMRep2A549CellPapErng3R1x36Dexa DEX_100nM RPKM epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." - ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 100 nM Dexamethasone (Myers) ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/DEX 100nM Rep 2 1x36 RPKM Expression wgEncodeHudsonalphaRnaSeqRPKMRep1A549CellPapErng3R1x36Etoh02 A549 etoh pA+ S1 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 314 Myers HudsonAlpha SL331 (Control) cell erng3 1x36 1 longPolyA wgEncodeHudsonalphaRnaSeqRPKMRep1A549CellPapErng3R1x36Etoh02 EtOH_0.02pct RPKM epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." 
- ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 0.02% Ethanol (Myers) ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/EtOH 0.02pct Rep 1 1x36 RPKM Expression wgEncodeHudsonalphaRnaSeqRPKMRep1A549CellPapErng3R1x36Dexa A549 dex pA+ S1 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 313 Myers HudsonAlpha SL330 cell erng3 1x36 1 longPolyA MACS wgEncodeHudsonalphaRnaSeqRPKMRep1A549CellPapErng3R1x36Dexa DEX_100nM RPKM epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." - ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 100 nM Dexamethasone (Myers) ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/DEX 100nM Rep 1 1x36 RPKM Expression wgEncodeHudsonalphaRnaSeqViewAligns Aligns ENCODE HudsonAlpha RNA-seq Expression wgEncodeHudsonalphaRnaSeqAlignsRep2JurkatCellPapBow10R1x25 Jurk none pA+ A2 Jurkat RnaSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 315 Myers HudsonAlpha SL317 cell bow10 1x25 2 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep2JurkatCellPapBow10R1x25 None Alignments T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. 
(PMID: 68013) Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell bowtie v0.10.0 Single 25 nt reads Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ Jurkat w/None Rep 2 1x25 Aligns Expression wgEncodeHudsonalphaRnaSeqAlignsRep1JurkatCellPapBow10R1x25 Jurk none pA+ A1 Jurkat RnaSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 315 Myers HudsonAlpha SL316 cell bow10 1x25 1 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep1JurkatCellPapBow10R1x25 None Alignments T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell bowtie v0.10.0 Single 25 nt reads Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ Jurkat w/None Rep 1 1x25 Aligns Expression wgEncodeHudsonalphaRnaSeqAlignsRep2A549CellPapErng3R1x36Etoh02 A549 etoh pA+ A2 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 314 Myers HudsonAlpha SL333 (Control) cell erng3 1x36 2 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep2A549CellPapErng3R1x36Etoh02 EtOH_0.02pct Alignments epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." 
- ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 0.02% Ethanol (Myers) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/EtOH 0.02pct Rep 2 1x36 Aligns Expression wgEncodeHudsonalphaRnaSeqAlignsRep2A549CellPapErng3R1x36Dexa A549 dex pA+ A2 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 313 Myers HudsonAlpha SL332 cell erng3 1x36 2 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep2A549CellPapErng3R1x36Dexa DEX_100nM Alignments epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." - ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 100 nM Dexamethasone (Myers) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/DEX 100nM Rep 2 1x36 Aligns Expression wgEncodeHudsonalphaRnaSeqAlignsRep1A549CellPapErng3R1x36Etoh02 A549 etoh pA+ A1 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 314 Myers HudsonAlpha SL331 (Control) cell erng3 1x36 1 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep1A549CellPapErng3R1x36Etoh02 EtOH_0.02pct Alignments epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." 
- ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 0.02% Ethanol (Myers) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/EtOH 0.02pct Rep 1 1x36 Aligns Expression wgEncodeHudsonalphaRnaSeqAlignsRep1A549CellPapErng3R1x36Dexa A549 dex pA+ A1 A549 RnaSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-06 313 Myers HudsonAlpha SL330 cell erng3 1x36 1 longPolyA wgEncodeHudsonalphaRnaSeqAlignsRep1A549CellPapErng3R1x36Dexa DEX_100nM Alignments epithelial cell line derived from a lung carcinoma tissue. (PMID: 175022), "This line was initiated in 1972 by D.J. Giard, et al. through explant culture of lung carcinomatous tissue from a 58-year-old caucasian male." - ATCC, newly promoted to tier 2: not in 2011 analysis Sequencing analysis of RNA expression Myers Myers - Hudson Alpha Institute for Biotechnology Whole cell erange v3.0 Single 36 nt reads Poly(A)+ RNA longer than 200 nt 1 h with 100 nM Dexamethasone (Myers) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE HudsonAlpha RNA-seq PolyA+ A549 w/DEX 100nM Rep 1 1x36 Aligns Expression est Human ESTs Human ESTs Including Unspliced mRNA and EST Description This track shows alignments between human expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. NOTE: As of April, 2007, we no longer include GenBank sequences that contain the following URL as part of the record: http://fulllength.invitrogen.com Some of these entries are the result of alignment to pseudogenes, followed by "correction" of the EST to match the genomic sequence. 
It is therefore not the sequence of the actual EST and makes it appear that the EST is transcribed. Invitrogen no longer sells the clones. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. 
This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, click here. Several types of alignment gap may also be colored; for more information, click here. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. 
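The retention rule above can be sketched in a few lines. This is a minimal illustration and not the UCSC pipeline code: the function name `filter_alignments`, the `(locus, percent_identity)` tuple layout, and the sample values are hypothetical, and "within 0.5% of the best" is interpreted here as an absolute difference in percent identity.

```python
def filter_alignments(alignments, min_identity=96.0, near_best=0.5):
    """Keep the alignments of one EST that are close to its best alignment.

    alignments: list of (locus, percent_identity) tuples (hypothetical layout).
    An alignment is kept when its identity is at least `min_identity` and is
    no more than `near_best` percentage points below the best identity.
    """
    if not alignments:
        return []
    best = max(identity for _, identity in alignments)
    return [
        (locus, identity)
        for locus, identity in alignments
        if identity >= min_identity and best - identity <= near_best
    ]


# Hypothetical EST with four candidate alignments.
hits = [("chr1:1000", 99.2), ("chr5:2000", 98.9), ("chr9:3000", 97.0), ("chrX:4000", 95.0)]
kept = filter_alignments(hits)
print(kept)
```

With these sample values only the first two alignments survive: the third is above 96% but more than 0.5 points below the best, and the fourth fails the 96% floor.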
Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002 Apr;12(4):656-64. darned Human RNA Editing Human RNA Editing from the DAtabase of RNa EDiting mRNA and EST Description This track provides information on RNA nucleotides that are edited after transcription and their corresponding genomic coordinates. Only post-transcriptional editing that results in small changes to the identity of a nucleic acid is included in this track; other RNA processing, such as splicing or methylation, is not included. The track contains information on A-to-I (adenosine-to-inosine) and C-to-U (cytidine-to-uridine) editing, which result from deamination by ADAR and APOBEC enzymes, respectively. Most of the data in this track are on A-to-I editing, which is known to be highly abundant in humans. Display Track items are colored depending on their occurrence within RNA transcripts: Dark Green: 5' UTR Blue: CDS Red: Intron Deep Pink: 3' UTR Black: Other (exon/intron status is unclear or unknown) Methods The data were obtained from several research papers on RNA editing and were mapped to the reference genome. More information can be obtained from the DARNED database. References: Kiran A, Baranov PV. DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics. 2010 Jul 15;26(14):1772-6. PMID: 20547637 illuminaProbes Illumina WG-6 Alignments of Illumina WG-6 3.0 Probe Set Expression Description This track displays the probes from the Illumina WG-6 3.0 BeadChip.
The WG-6 BeadChip contains probes for the following set of RNA transcripts:

Probe source                                         Number of probes   Number of unique probe sources
RefSeq NM (well-established coding transcript)       27,454             22,435
RefSeq XM (provisional coding transcript)            7,870              7,518
RefSeq NR (well-established non-coding transcript)   446                358
RefSeq XR (provisional non-coding transcript)        196                190
UniGene ESTs                                         12,837             12,837
TOTAL                                                48,803             43,338

Display The track shows the location of the probes on the genome after the RNAs they correspond to were all aligned to the genome using BLAT. Alignment scores range from 0 to 1000, where 1000 is a perfect score. In the display, darker browns are for higher-scoring alignments. Click on a probe track item to see detailed information about that probe ID. View the base-by-base alignment for that probe by clicking the "View Alignment" link on the details page. Methods The probe set was collected from the NCBI GEO (Gene Expression Omnibus), and the 43,338 RNA sequences were collected from GenBank using NCBI's EUtils interface to Entrez. These RNAs were aligned to the genome using BLAT, and 43,224 of them aligned well to 46,432 locations on the genome. The single best alignment was used, except in 1,789 cases where the RNA mapped equally well to two or more locations. The probes were then aligned to their respective RNAs using BLAT, and if a good alignment resulted, the probe was then mapped through to the genome using the combination of the probe-on-RNA and the RNA-on-genome alignments. Of the 48,803 original probes, 40,852 map well through this procedure to 44,163 locations on the genome.
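The "mapping through" step in the Methods above combines two alignments: probe on RNA, then RNA on genome. The sketch below shows one simple way to do such a projection and is not the procedure UCSC actually ran. The function name `project_through` and the `(rna_start, genome_start, size)` block layout are hypothetical, and the sketch assumes a plus-strand RNA alignment made of ungapped blocks and a probe that aligns to the RNA without gaps.

```python
def project_through(rna_blocks, probe_start, probe_end):
    """Project the RNA interval [probe_start, probe_end) onto the genome.

    rna_blocks: list of (rna_start, genome_start, size) ungapped blocks from a
    plus-strand RNA-on-genome alignment (hypothetical, PSL-like layout).
    Returns one genomic (start, end) piece per block the probe overlaps.
    """
    pieces = []
    for rna_start, genome_start, size in rna_blocks:
        # Overlap between the probe interval and this block, in RNA coordinates.
        lo = max(probe_start, rna_start)
        hi = min(probe_end, rna_start + size)
        if lo < hi:
            offset = genome_start - rna_start
            pieces.append((lo + offset, hi + offset))
    return pieces


# Hypothetical RNA aligned as two exons; the probe spans the exon junction.
blocks = [(0, 1000, 100), (100, 5000, 150)]
pieces = project_through(blocks, 80, 130)
print(pieces)
```

A probe that crosses an exon junction in the RNA comes back as two genomic pieces, which is why a probe can map to a spliced location even though the probe sequence itself would not align contiguously to the genome.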
nestedRepeats Interrupted Rpts Fragments of Interrupted Repeats Joined by RepeatMasker ID Variation and Repeats Description This track shows joined fragments of interrupted repeats extracted from the output of the RepeatMasker program which screens DNA sequences for interspersed repeats and low complexity DNA sequences using the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. The detailed annotations from RepeatMasker are in the RepeatMasker track. This track shows fragments of original repeat insertions which have been interrupted by insertions of younger repeats or through local rearrangements. The fragments are joined using the ID column of RepeatMasker output. Note that this track was created using a version of RepeatMasker from Nov. 2005 along with Repbase Update 9.11. In hg18, there is also an Intr Rpts 3.2.7 track which was created in 2009 using a newer version of RepeatMasker and Repbase Update. All of the hg18 tracks are based upon this original track and not upon the newer Intr Rpts 3.2.7 track. Display Conventions and Configuration In pack or full mode, each interrupted repeat is displayed as boxes (fragments) joined by horizontal lines, labeled with the repeat name. If all fragments are on the same strand, arrows are added to the horizontal line to indicate the strand. In dense or squish mode, labels and arrows are omitted and in dense mode, all items are collapsed to fit on a single row. Items are shaded according to the average identity score of their fragments. Usually, the shade of an item is similar to the shades of its fragments unless some fragments are much more diverged than others. The score displayed above is the average identity score, clipped to a range of 50% - 100% and then mapped to the range 0 - 1000 for shading in the browser. 
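The clip-and-rescale step described above can be written out as a short worked example. The text only says the clipped score is "mapped" to 0 - 1000, so the linear rescaling used here is an assumption, and the function name `shading_score` is hypothetical.

```python
def shading_score(avg_identity_pct):
    """Clip an average identity (in percent) to [50, 100], then rescale to 0-1000."""
    clipped = min(100.0, max(50.0, avg_identity_pct))
    # Assumption: a linear map, so 50% -> 0 and 100% -> 1000.
    return round((clipped - 50.0) / 50.0 * 1000)


for pct in (40.0, 75.0, 87.5, 100.0):
    print(pct, shading_score(pct))
```

Under this assumption, any repeat whose fragments average 50% identity or less receives the lightest shade, and the full shading range is spent on the 50% - 100% interval where most repeat annotations fall.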
Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. nestedRepeatsRM327 Intr Rpts 3.2.7 Fragments of Interrupted Repeats Joined by RepeatMasker ID (RM version 3.2.7) Variation and Repeats Description This track shows joined fragments of interrupted repeats extracted from the output of a more recent version (3.2.7, Jan. 2009) of the RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences using the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka, J. (2000) in the References section below. The detailed annotations from RepeatMasker are in the RepMask 3.2.7 track. This track shows fragments of original repeat insertions which have been interrupted by insertions of younger repeats or through local rearrangements. The fragments are joined using the ID column of RepeatMasker output. 
Interrupted repeats from the original RepeatMasker run have been kept in the Interrupted Rpts track in order to avoid disrupting any analyses performed on the original run's results. Display Conventions and Configuration In pack or full mode, each interrupted repeat is displayed as boxes (fragments) joined by horizontal lines, labeled with the repeat name. If all fragments are on the same strand, then arrows are added to the horizontal line to indicate strand. In dense or squish mode, labels and arrows are omitted, and in dense mode, all items are collapsed to fit on a single row. Items are shaded according to the average identity score of their fragments. Usually, the shade of an item is similar to the shades of its fragments, unless some fragments are much more diverged than others. The score displayed above is the average identity score, clipped to a range of 50% - 100%, and then mapped to the range 0 - 1000 for shading in the browser. Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit, AFA, Hubley, R and Green, P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2007. Repbase Update is described in Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6): 657-63. Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 
1996 Dec;6(6):743-8. encodeRna Known+Pred RNA Known and Predicted RNA Transcription in the ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows the locations of known and predicted non-protein-coding RNA genes and pseudogenes that fall within the ENCODE regions. It contains all information in Sean Eddy's RNA Genes track for these regions, combined with computational predictions generated by Jakob Skou Pedersen's EvoFold algorithm. In addition to the fields contained in the RNA Genes track, this track also includes ENCODE-related fields describing overlap with transcribed regions and repeats. Feature types in this annotation include: tRNA: transfer RNA (or pseudogene) rRNA: ribosomal RNA (or pseudogene) scRNA: small cytoplasmic RNA (or pseudogene) snRNA: small nuclear RNA (or pseudogene) snoRNA: small nucleolar RNA (or pseudogene) miRNA: microRNA (or pseudogene) misc_RNA: miscellaneous other RNA, such as Xist (or pseudogene) "-": unknown RNA Display Conventions and Configuration The locations of the RNA genes and pseudogenes are represented by blocks in the graphical display, color-coded as follows: Black: region is Repeatmasked. Green: region is transcribed. Red: region is from the RNA Genes track and is not transcribed. Blue: region is an EvoFold prediction and is not transcribed. The display may be filtered to show only those items with unnormalized scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page. Methods The RNA Genes track was supplemented with EvoFold predictions and filtered to include only those items that lie within the ENCODE regions. Regions that are at least 10 percent Repeatmasked are flagged because no transcriptional data is available for them.
A region is considered transcribed if at least 10 percent overlaps with any Affymetrix transcribed fragment (transfrag), derived from six microarray experiments, or Yale transcriptionally-active region (TAR), derived from 15 microarray experiments. In these cases, each array from which the overlapped transfrags and TARs were derived is listed. EvoFold is a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures. The method makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist both of a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score comparing a phylo-SCFG that models the constrained evolution of stem-pairing regions with one that models only unpaired regions. Two sets of EvoFold predictions are included in this track. The first, labeled EvoFold, contains predictions based on the conserved elements of an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and Fugu assemblies. The second set of predictions, TBA23_EvoFold, was based on the conserved elements of the 23-way TBA alignments present in the ENCODE regions. When a pair of these predictions overlap, only the EvoFold prediction is shown. Credits These data were kindly provided by Sean Eddy at Washington University, Jakob Skou Pedersen at UC Santa Cruz, and the ENCODE Consortium. This annotation track was generated by Matt Weirauch. References Knudsen, B. and J.J. Hein. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6), 446-54 (1999). Pedersen, J.S., Bejerano, G. and Haussler, D. Identification and classification of conserved RNA secondary structures in the human genome. (In preparation).
laminB1 LaminB1 (Tig3) NKI LaminB1 DamID Map (log2-ratio scores, Tig3 cells) Regulation Description Please see the NKI Nuc Lamina "super-track" link above for description and methods. laminB1Super NKI Nuc Lamina NKI Nuclear Lamina Associated Domains (LaminB1 DamID) Regulation Overview Model of chromosome organization in interphase, summarizing the main results presented in this paper. Large, discrete chromosomal domains are dynamically associated (double arrows) with the nuclear lamina, and demarcated by putative insulator elements that include CTCF binding sites, promoters that are oriented away from the lamina, and CpG islands (Fig. S1, Guelen et al., 2008). The architecture of human chromosomes in interphase nuclei is still largely unknown. Microscopy studies have indicated that specific regions of chromosomes are located in close proximity to the nuclear lamina (NL, a dense fibrillar network associated with the inner face of the nuclear envelope). This has led to the idea that certain genomic elements may be attached to the NL, which may contribute to the spatial organization of chromosomes inside the nucleus. This track represents a high-resolution map of genome-NL interactions in human Tig3 lung fibroblasts, as determined by the DamID technique. NKI LaminB1 track The LaminB1 track shows a high resolution map of the interaction sites of the entire genome with Lamin B1, (a key NL component) in human fibroblasts. This map shows that genome-lamina interactions occur through more than 1,300 sharply defined large domains 0.1-10 megabases in size. Microscopy evidence indicates that most of these domains are preferentially located at nuclear periphery. These lamina associated domains (LADs) are characterized by low gene-expression levels, indicating that LADs represent a repressive chromatin environment. 
The borders of LADs are demarcated by the insulator protein CTCF, by promoters that are oriented away from LADs, or by CpG islands, suggesting possible mechanisms of LAD confinement. Taken together, these results demonstrate that the human genome is divided into large, discrete domains that are units of chromosome organization within the nucleus (see Guelen et al., 2008). NKI LADs track The LADs track shows Lamina Associated Domains, or LADs, based on a genome-wide DamID profile of LaminB1 (above). For the definition of LADs, the full-genome lamin B1 DamID data set was binarized by setting tiling array probes with positive DamID log ratios to 1 and otherwise to -1. Next, a two-step algorithm was used to identify LADs. First, sharp transitions were identified with a sliding edge filter, which calculates the difference in average binary values in two windows of 99 neighbouring probes immediately left and right of a queried probe. The cutoff for this difference was chosen such that the number of edges detected in randomly permuted data sets was less than 5% of the number of edges detected in the original lamin B1 data set. Second, pairs of adjacent 'left' and 'right' edges were identified that together enclosed a region of arbitrary size with at least 70% of the enclosed probes reporting a positive log2 ratio. A total of 1,344 regions fulfilled these criteria and were termed LADs. In 20 randomly permuted data sets, fewer than 13 domains were identified by the same criteria. Note that there are also lamin-B1-positive domains flanked by one or two gradual or irregular transitions. Because it is difficult to define the borders of such domains precisely, these 'fuzzy' domains are not analyzed here (see Guelen et al., 2008). Display Conventions and Configuration The LaminB1 wiggle track values range from -6.602 to 5.678 and were normalized to have a median of 0 and a standard deviation of 1.037.
The default vertical viewing range for the wiggle track was chosen from -2 to 2 because this is roughly +/- 2 standard deviations. For an example region see genomic location: chr4:35,000,001-45,000,000 (Fig 1, Guelen et al., 2008). Methods The DamID technique was applied to generate a high-resolution map of NL interactions for the entire human genome. DamID is based on targeted adenine methylation of DNA sequences that interact in vivo with a protein of interest. DamID was performed with lentiviral transduction as described (Guelen et al., 2008). In short, a fusion protein consisting of Escherichia coli DNA adenine methyltransferase (Dam) fused to human LaminB1 was introduced into cultured Tig3 human lung fibroblasts. Dam methylates adenines in the sequence GATC, a mark absent in most eukaryotes. Here, the LaminB1-Dam fusion protein incorporates in the nuclear lamina, as verified by immunofluorescence staining. Hence, the sequences near the nuclear lamina are marked with a unique methylation tag. The adenine methylation pattern was detected with genomic tiling arrays. Unfused Dam was used as a reference (http://research.nki.nl/vansteensellab/DamID.htm). The data shown are the log2-ratio of LaminB1-Dam fusion protein over Dam-only. Sample labelling and hybridizations were performed by NimbleGen Inc., on a set of 8 custom-designed oligonucleotide arrays, with a median probe spacing of ~750 bp. All probes recognize unique (non-repetitive) sequences. The raw data was log2 transformed and loess normalized. Between array median/scale normalization was based on 6979 probes common to all arrays. Replicate arrays were averaged and the full data set normalized to genome-wide median. Verification The data are based on two independent biological replicates. Fluorescence in situ hybridization microscopy confirmed that most of the LaminB1 associated regions are preferentially located at the nuclear periphery. 
The array platform and the raw and normalized data have been deposited at the NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE8854. Credits The data for this track were generated by Lars Guelen, Ludo Pagie, and Bas van Steensel at the Van Steensel Lab, Netherlands Cancer Institute. References Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008 Jun 12;453(7197):948-51. PMID: 18463634 ctgPos Map Contigs Physical Map Contigs Mapping and Sequencing Description This track shows the locations of human contigs on the physical map. The underlying data is derived from the NCBI seq_contig.md file that accompanies this assembly. All contigs are "+" oriented in the assembly. Methods For human genome reference sequences dated April 2003 and later, the individual chromosome sequencing centers are responsible for preparing the assembly of their chromosomes in AGP format. The files provided by these centers are checked and validated at NCBI, and form the basis for the seq_contig.md file that defines the physical map contigs. For more information on the human genome assembly process, see The NCBI Handbook. wgEncodeMapability Mappability Mappability or Uniqueness of Reference Genome Mapping and Sequencing Description These tracks display the level of sequence uniqueness of the reference NCBI36/hg18 genome assembly. They were generated using different window sizes, and high signal will be found in areas where the sequence is unique. Methods The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. To generate the track, every 36-mer in the genome was marked as "unique" if the most similar 36-mer elsewhere in the genome has at most 2 mismatches, and as "non-unique" otherwise.
Position X in the alignable track is marked by 1 if >50% of the bases in [X-200,X+200] are "unique" and by 0 otherwise. Every point in the alignable track has a corresponding position in each of the ChIP signal tracks. The Broad alignability track was generated for the ENCODE project as a tool for development of the Broad Histone tracks. The Duke uniqueness tracks display how unique each sequence on the positive strand is, for sequences starting at a particular base and of a particular length. Thus, the 20 bp track reflects the uniqueness of all 20-base sequences, with the score assigned to the first base of the sequence. Scores are normalized to between 0 and 1, with 1 representing a completely unique sequence and 0 indicating that the sequence occurs more than 4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates that the sequence occurs exactly twice; likewise, 0.33 indicates three occurrences and 0.25 four. The Duke uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin tracks. The Duke excluded regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Duke/UNC/UTA's Open Chromatin tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke excluded regions track was generated for the ENCODE project. The Rosetta uniqueness track uses sequence 'tiles' of 35 bp. Each tile was aligned to the genome using the BWA aligner. Tiles that align uniquely and perfectly in hg18 receive a p-value of 1e-37, while those that align perfectly in multiple locations receive a p-value of 0. For each tile, the oligo midpoint coordinate was recorded along with the -log_10 p-value: 37 (unambiguous) to 0 (ambiguous). The Rosetta uniqueness track was generated independently of the ENCODE project.
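The Duke uniqueness scoring described above (1 for a unique sequence, 0.5 for two occurrences, 0.33 for three, 0.25 for four, 0 for more than four) amounts to 1/n with a cutoff. The sketch below applies it to a toy string and is not the Duke pipeline: the function name `uniqueness_scores` is hypothetical, and it counts occurrences only on the forward strand of the toy sequence, whereas the real tracks were computed genome-wide.

```python
from collections import Counter


def uniqueness_scores(sequence, k, max_count=4):
    """Score every k-mer start position: 1/n for n occurrences, 0 if n > max_count.

    Simplifying assumption: occurrences are counted on the forward strand of
    `sequence` only. The score is assigned to the first base of each k-mer.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[km] if counts[km] <= max_count else 0.0 for km in kmers]


toy = "ACGTACGTTT"
scores = uniqueness_scores(toy, 4)
print(scores)
```

In the toy sequence the 4-mer ACGT occurs twice, so both of its start positions score 0.5, while every other 4-mer is unique and scores 1.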
The UMass uniqueness track displays a uniqueness signal for each base which represents the sum of both plus and minus strand 15-mer occurrences of that particular 5'->3' (plus strand) sequence throughout the genome. Scores are normalized between 0 and 1 by calculating ( 1 / N ) where N is the number of genome wide occurrences of the 15-mer starting at position X. A score of 1 represents a single genome wide occurrence of that 15-mer. A 0.5 would represent either 2 plus strand occurrences or 1 plus and 1 minus strand occurrence, and so on. Ratios are rounded to 3 significant digits. Therefore a 0.000 would represent > 2000 occurrences. A 0 is reserved for a given 15-mer that is either not assembled or contains at least one N at position X. The UMass uniqueness track was generated for the ENCODE project. The CRG Alignability tracks display how uniquely k-mer sequences align to a region of the genome. To generate the data, the GEM-mappability program has been employed. The method is equivalent to mapping sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts to produce these tracks) back to the genome using the GEM mapper aligner (up to 2 mismatches were allowed in this case). For each window, a mapability score was computed (S = 1/(number of matches found in the genome): S=1 means one match in the genome, S=0.5 is two matches in the genome, and so on). The CRG Alignability tracks were generated independently of the ENCODE project, in the framework of the GEM (GEnome Multitool) project. Credits The Broad alignability track was created by the Broad Institute. Data generation and analysis was supported by funds from the NHGRI (the ENCODE project), the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute. 
The Duke uniqueness and Duke excluded regions tracks were created by Terry Furey and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP); and Stefan Graf at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support. The Rosetta uniqueness track was created by John Castle, at Rosetta Inpharmatics (Merck), with assistance from Melissa Cline at UCSC. The UMass uniqueness track was created by Bryan Lajoie in Job Dekker's Lab at the University of Massachusetts Medical School. Funding Support: NIH grant HG003143 to JD. Keck Distinguished Young Scholar Award to JD. This track was generated as part of the ENCODE project funded by the NHGRI. The CRG Alignability track was created by Thomas Derrien and Paolo Ribeca in Roderic Guigo's lab at the Centre for Genomic Regulation (CRG), Barcelona, Spain. Thomas Derrien was supported by funds from NHGRI for the ENCODE project, while Paolo Ribeca was funded by a Consolider grant CDS2007-00050 from the Spanish Ministerio de Educación y Ciencia. References Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377. Data Release Policy Data users may freely use all data in this track. ENCODE labs that contributed annotations have exempted the data displayed here from the ENCODE data release policy restrictions. wgEncodeMapabilityViewUUNIQ UMass Uniqueness Mappability or Uniqueness of Reference Genome Mapping and Sequencing wgEncodeUmassMapabilityUniq15 Umass Uniq 15 Mapability ENCODE July 2009 Freeze 326 Dekker UMass-Dekker wgEncodeUmassMapabilityUniq15 Uniqueness Short Read Mappability Dekker Dekker - University of Massachusetts Displays how unique is each sequence on the positive strand starting at a particular base and of a particular length.
The score of 1 represents completely unique, 0.5 occurs exactly twice, 0.33 three times, 0.25 four times and 0 occurs > 4 times (excluding chrN_random and alternative haplotypes). The Duke uniqueness tracks were generated for the ENCODE project. Mappability - ENCODE UMass Uniqueness at 15bp Mapping and Sequencing wgEncodeMapabilityViewRUNIQ Rosetta Uniqueness Mappability or Uniqueness of Reference Genome Mapping and Sequencing uniqueness Rosetta Uniq 35 Mappability - Rosetta Uniqueness 35-mer Alignment (BWA/MAQ, unique alignment=37) Mapping and Sequencing Description This track shows unique regions in the genome, as identified by the genomic alignment of 35-nucleotide segments. Methods Sequence 'tiles' of 35 nucleotides in length were generated from the genome. Each tile was aligned to the genome using the BWA aligner. Tiles that align uniquely and perfectly in hg18 receive a p-value of 1e-37, while those that align perfectly in multiple locations receive a p-value of 0. For each tile, the oligo midpoint coordinate was recorded along with the -log_10 p-value: 37 (unambiguous) to 0 (ambiguous). Credits This track was created by John Castle at Rosetta (Merck), with assistance from Melissa Cline at UCSC. References (to appear) wgEncodeMapabilityViewDUNIQ Duke Uniqueness Mappability or Uniqueness of Reference Genome Mapping and Sequencing wgEncodeDukeUniqueness35bp Duke Uniq 35 Mapability ENCODE Nov 2008 Freeze 325 Crawford Duke 1.0 - 4 or less wgEncodeDukeUniqueness35bp Uniqueness Short Read Mappability Crawford Crawford - Duke University Displays how unique is each sequence on the positive strand starting at a particular base and of a particular length. The score of 1 represents completely unique, 0.5 occurs exactly twice, 0.33 three times, 0.25 four times and 0 occurs > 4 times (excluding chrN_random and alternative haplotypes). The Duke uniqueness tracks were generated for the ENCODE project. 
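The Duke scoring rule described above can be illustrated with a small sketch. It assumes the genome-wide occurrence count of each sequence has already been computed; the function name `duke_uniqueness_score` is hypothetical and is not taken from the Duke pipeline.

```python
def duke_uniqueness_score(occurrences: int) -> float:
    """Map an occurrence count to the score scale described for the Duke tracks.

    1 -> 1.0, 2 -> 0.5, 3 -> 0.33, 4 -> 0.25, more than 4 -> 0.
    """
    if occurrences < 1:
        # A sequence read off the reference occurs at least once.
        raise ValueError("occurrence count must be at least 1")
    if occurrences > 4:
        return 0.0
    # Two decimal places reproduces the 0.33 listed for three occurrences.
    return round(1.0 / occurrences, 2)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, duke_uniqueness_score(n))
```

The cutoff at four occurrences is what distinguishes this scale from a plain 1/N score, which stays non-zero for any finite count.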
Mappability - ENCODE Duke Uniqueness of 35bp sequences Mapping and Sequencing wgEncodeDukeUniqueness24bp Duke Uniq 24 Mapability ENCODE Nov 2008 Freeze 324 Crawford Duke 1.0 - 4 or less wgEncodeDukeUniqueness24bp Uniqueness Short Read Mappability Crawford Crawford - Duke University Displays how unique is each sequence on the positive strand starting at a particular base and of a particular length. The score of 1 represents completely unique, 0.5 occurs exactly twice, 0.33 three times, 0.25 four times and 0 occurs > 4 times (excluding chrN_random and alternative haplotypes). The Duke uniqueness tracks were generated for the ENCODE project. Mappability - ENCODE Duke Uniqueness of 24bp sequences Mapping and Sequencing wgEncodeDukeUniqueness20bp Duke Uniq 20 Mapability ENCODE Nov 2008 Freeze 323 Crawford Duke 1.0 - 4 or less wgEncodeDukeUniqueness20bp Uniqueness Short Read Mappability Crawford Crawford - Duke University Displays how unique is each sequence on the positive strand starting at a particular base and of a particular length. The score of 1 represents completely unique, 0.5 occurs exactly twice, 0.33 three times, 0.25 four times and 0 occurs > 4 times (excluding chrN_random and alternative haplotypes). The Duke uniqueness tracks were generated for the ENCODE project. 
Mappability - ENCODE Duke Uniqueness of 20bp sequences Mapping and Sequencing wgEncodeMapabilityViewXR Duke Excluded Regions Mappability or Uniqueness of Reference Genome Mapping and Sequencing wgEncodeDukeRegionsExcluded Excluded Regions Mapability ENCODE Nov 2008 Freeze 322 Crawford Duke satellite_rna_chrM_500.bed.20080925 wgEncodeDukeRegionsExcluded Excludable Short Read Mappability Crawford Crawford - Duke University Genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling, problematic regions for short sequence tag signal detection (such as satellites and rRNA genes) Mappability - ENCODE Duke Excluded Regions Mapping and Sequencing wgEncodeMapabilityViewCRGMAP CRG GEM Alignability Mappability or Uniqueness of Reference Genome Mapping and Sequencing wgEncodeCrgMapabilityAlign100mer CRG Align 100 Mapability ENCODE January 2010 Freeze 317 Gingeras CRG-Guigo wgEncodeCrgMapabilityAlign100mer Alignability Short Read Mappability Gingeras Guigo - CGR, Barcelona Displays how uniquely k-mer sequences align to a region of the genome. The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - CRG GEM Alignability of 100mers with no more than 2 mismatches Mapping and Sequencing wgEncodeCrgMapabilityAlign75mer CRG Align 75 Mapability ENCODE January 2010 Freeze 321 Gingeras CRG-Guigo wgEncodeCrgMapabilityAlign75mer Alignability Short Read Mappability Gingeras Guigo - CGR, Barcelona Displays how uniquely k-mer sequences align to a region of the genome. The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. 
Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - CRG GEM Alignability of 75mers with no more than 2 mismatches Mapping and Sequencing wgEncodeCrgMapabilityAlign50mer CRG Align 50 Mapability ENCODE January 2010 Freeze 320 Gingeras CRG-Guigo wgEncodeCrgMapabilityAlign50mer Alignability Short Read Mappability Gingeras Guigo - CGR, Barcelona Displays how uniquely k-mer sequences align to a region of the genome. The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - CRG GEM Alignability of 50mers with no more than 2 mismatches Mapping and Sequencing wgEncodeCrgMapabilityAlign40mer CRG Align 40 Mapability ENCODE January 2010 Freeze 319 Gingeras CRG-Guigo wgEncodeCrgMapabilityAlign40mer Alignability Short Read Mappability Gingeras Guigo - CGR, Barcelona Displays how uniquely k-mer sequences align to a region of the genome. The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - CRG GEM Alignability of 40mers with no more than 2 mismatches Mapping and Sequencing wgEncodeCrgMapabilityAlign36mer CRG Align 36 Mapability ENCODE January 2010 Freeze 318 Gingeras CRG-Guigo wgEncodeCrgMapabilityAlign36mer Alignability Short Read Mappability Gingeras Guigo - CGR, Barcelona Displays how uniquely k-mer sequences align to a region of the genome. 
The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - CRG GEM Alignability of 36mers with no more than 2 mismatches Mapping and Sequencing wgEncodeMapabilityViewALN Broad Alignability Mappability or Uniqueness of Reference Genome Mapping and Sequencing wgEncodeBroadMapabilityAlign36mer Broad Align 36 Mapability ENCODE July 2009 Freeze 316 Bernstein Broad wgEncodeBroadMapabilityAlign36mer Alignability Short Read Mappability Bernstein Bernstein - Broad Institute Displays how uniquely k-mer sequences align to a region of the genome. The GEM mapper (GEnome Multitool, CRG) maps sliding windows of k-mers (where k has been set to 36, 40, 50, 75 or 100 nts) allowing up to 2 mismatches. Mappability scores were computed as S = 1/(number of matches found in the genome). The CRG Alignability tracks were generated independently of the ENCODE project. Mappability - ENCODE Broad Alignability of 36mers with no more than 2 mismatches Mapping and Sequencing mgcFullMrna MGC Genes Mammalian Gene Collection Full ORF mRNAs Genes and Gene Predictions Description This track shows alignments of human mRNAs from the Mammalian Gene Collection (MGC) having full-length open reading frames (ORFs) to the genome. The goal of the Mammalian Gene Collection is to provide researchers with unrestricted access to sequence-validated full-length protein-coding cDNA clones for human, mouse, and rat genes. Display Conventions and Configuration The track follows the display conventions for gene prediction tracks. An optional codon coloring feature is available for quick validation and comparison of gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. 
For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Methods GenBank human MGC mRNAs identified as having full-length ORFs were aligned against the genome using blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 1% of the best and at least 95% base identity with the genomic sequence were kept. Credits The human MGC full-length mRNA track was produced at UCSC from mRNA sequence data submitted to GenBank by the Mammalian Gene Collection project. References Mammalian Gene Collection project references. Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 jaxQtlMapped MGI Mouse QTL MGI Mouse Quantitative Trait Loci Coarsely Mapped to Human Phenotype and Disease Associations Description This track shows Mouse quantitative trait loci (QTLs) from Mouse Genome Informatics (MGI) at the Jackson Laboratory that have been coarsely mapped by UCSC to the Human genome using stringently filtered cross-species alignments. A quantitative trait locus (QTL) is a polymorphic locus that contains alleles which differentially affect the expression of a continuously distributed phenotypic trait. Usually a QTL is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci. To map the Mouse QTLs to Human, UCSC's chained and netted blastz alignments of Mouse to Human were filtered to retain only those with minimum length of 20,000 bases in both Mouse and Human, and minimum score of 10,000. This removed many valid-but-short alignments. This choice was made because QTLs in general are extremely large and approximate regions. 
After the alignment filtering, UCSC's liftOver program was used to map Mouse regions to Human via the filtered alignments. For the purpose of cross-species mapping, MGI QTLs were divided into two categories: QTLs whose genomic coordinates span the entire confidence interval (often several million bases), and QTLs for which only the STS marker with the peak score was given, resulting in genomic coordinates for very small regions (most less than 300 bases). QTLs in the latter set were so small as to make mapping impossible in many cases, so their coordinates were padded by 50,000 bases before and after, for a total size of approximately 100,000 bases, a conservative proxy for the unknown confidence interval. The two categories of QTL are displayed in subtracks: MGI Mouse QTL for the unmodified QTLs and MGI Mouse QTL Padded for the single-marker QTLs that were padded to 100,000 bases. To get a sense of how many genomic rearrangements between Mouse and Human are in the region of a particular Mouse QTL, you may want to view the Human Nets track in the Mouse Feb. 2006 (NCBI36/mm8) genome browser. In the position/search box, enter the name of the Mouse QTL of interest. Credits Thanks to MGI at the Jackson Laboratory, and Bob Sinclair in particular, for providing these data. jaxQtlPadded MGI Mouse QTL Padded MGI Mouse QTL Peak-Score Markers Padded to 100k and Coarsely Mapped to Human Phenotype and Disease Associations jaxQtlAsIs MGI Mouse QTL MGI Mouse QTLs Coarsely Mapped to Human Phenotype and Disease Associations microsat Microsatellite Microsatellites - Di-nucleotide and Tri-nucleotide Repeats Variation and Repeats Description This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats and tend to be highly polymorphic in the population. 
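The microsatellite definition above (perfect repeats of period 2 or 3 with at least 15 copies) can be sketched with a regular expression. This is a simplified stand-in for illustration only: the track itself is derived from Tandem Repeats Finder output, and the function name `find_perfect_microsatellites` is hypothetical.

```python
import re


def find_perfect_microsatellites(seq, min_copies=15):
    """Return (start, end, unit, copies) for perfect di/tri-nucleotide repeats.

    Coordinates are 0-based, end-exclusive. Homopolymer units such as "AA"
    are skipped because their true period is 1, not 2 or 3.
    """
    seq = seq.upper()
    hits = []
    for period in (2, 3):
        # One captured unit followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if len(set(unit)) == 1:
                continue
            copies = (m.end() - m.start()) // period
            hits.append((m.start(), m.end(), unit, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "AC" * 15 + "TT"
    print(find_perfect_microsatellites(example))
```

Because the pattern requires exact copies, a single mismatch or indel splits a repeat, which corresponds to the "100% identity and no indels" criterion.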
Methods The data shown in this track are a subset of the Simple Repeats track, selecting only those repeats of period 2 and 3, with 100% identity and no indels and with at least 15 copies of the repeat. The Simple Repeats track is created using the Tandem Repeats Finder. For more information about this program, see Benson (1999). Credits Tandem Repeats Finder was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 denisovaModernHumans Modern Human Seq Alignments of Sequence Reads from 7 Humans Denisova Assembly and Analysis Description The Modern Human Seq track shows human sequence reads of seven individuals mapped to the human genome. The purpose of this track is to put the divergence of the Denisova genome into perspective with regard to present-day humans. Methods DNA was obtained for each of seven individuals from the CEPH-Human Genome Diversity Panel (HGDP): HGDP00456 (Mbuti), HGDP00998 (Karitiana Native American), HGDP00665 (Sardinia), HGDP00491 (Bougainville Melanesian), HGDP00711 (Cambodian), HGDP01224 (Mongolian) and HGDP00551 (Papuan). Each library was sequenced on the Illumina Genome Analyzer IIx using 2x101 + 7 cycles on one flow cell according to the manufacturer's instructions for multiplex sequencing. The paired-end reads were aligned using the Burrows-Wheeler Aligner to the human sequence (NCBI36/hg18). Download the Modern Human Seq track data sets from the Genome Browser downloads server. References Briggs A.W., Stenzel U., Meyer M., Krause J., Kircher M., Pääbo S. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 2009 Dec 22:38(6) e87. Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010 Dec 23;468:1053-1060. 
Credits This track was produced at UCSC using data generated by the Max Planck Institute for Evolutionary Anthropology. bamMMS14Mongolian Mongolian Seq Mongolian (HGDP01224) Sequence Reads Denisova Assembly and Analysis bamMMS10NativeAmerican Native American Native American (HGDP00998) Sequence Reads Denisova Assembly and Analysis bamMMS13Cambodian Cambodian Seq Cambodian (HGDP00711) Sequence Reads Denisova Assembly and Analysis bamMMS11Sardinian Sardinian Seq Sardinian (HGDP00665) Sequence Reads Denisova Assembly and Analysis bamMMS16Papuan Papuan Seq Papuan (HGDP00551) Sequence Reads Denisova Assembly and Analysis bamMMS12Melanesian Melanesian Seq Melanesian (HGDP00491) Sequence Reads Denisova Assembly and Analysis bamMMS9MbutiPygmy Mbuti Pygmy Mbuti Pygmy (HGDP00456) Sequence Reads Denisova Assembly and Analysis ntModernHumans Modern Human Seq Alignments of Sequence Reads from 5 Modern Humans Neandertal Assembly and Analysis Description The Modern Human Seq track shows human sequence reads of five individuals mapped to the human genome. The purpose of this track is to put the divergence of the Neandertal genomes into perspective with regard to present-day humans. Display Conventions and Configuration The sequence reads (query sequences) from each of the five individuals are contained in separate subtracks. Use the checkboxes to select which individuals will be displayed in the browser. Click and drag the sample name to reorder the subtracks. The order in which the subtracks appear in the subtrack list will be the order in which they display in the browser. The query sequences in the SAM/BAM alignment representation are normalized to the + strand of the reference genome (see the SAM Format Specification for more information on the SAM/BAM file format). If a query sequence was originally the reverse of what has been stored and aligned, it will have the following flag: (0x10) Read is on '-' strand. BAM/SAM alignment representations also have tags. 
Some tags are predefined and others (those beginning with X, Y or Z) are defined by the aligner or data submitter. The following is a list of the tags associated with this track. For this track, those starting with X are specific to the Burrows-Wheeler Aligner (BWA). XT: Type: Unique/Repeat/N/Mate-sw NM: Number of nucleotide differences (i.e. edit distance to the reference sequence) SM: Mapping quality if the read is mapped as a single read rather than as a read pair AM: Smaller single-end mapping quality of the two reads in a pair X0: Number of best hits X1: Number of suboptimal hits found by BWA XM: Number of mismatches in the alignment XO: Number of gap opens XG: Number of gap extensions MD: String for mismatching positions in the format of [0-9]+(([ACGTN]|\^[ACGTN]+)[0-9]+)* The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Display Read Names: By default, read names are not displayed. To display the read names, select the check box next to "Display read names". Attempt to join paired end reads by name: When checked (default), reads with the same name will be joined into pairs for display, with a line drawn between them. Minimum alignment quality: Excludes alignments with quality less than the given number. The default is 0. Color track by bases: By default, mismatching bases are highlighted in the display. Change the selection to "item bases" to see all base values from the query sequence, or "OFF" to ignore query sequence. Click here for additional information. Alignment Gap/Insertion Display Options: Click here for help with these options. Additional coloring modes: Other aspects of the alignments can be displayed in color or grayscale. Color by strand: Alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue. 
Grayscale: Items are shaded according to the chosen method: alignment quality, base qualities, or unpaired ends. The alignment qualities of individual items are shaded on a scale of 0 (lightest) to 99 (darkest). Base qualities are shaded on a scale of 0 (lightest) to 40 (darkest). When "unpaired ends" is selected, items that were paired in sequencing but whose mate was not mapped are colored gray, while singletons and properly paired items are black. Alignment quality is the default. Methods The genomes of a San individual from Southern Africa (HGDP01029), a Yoruba individual from West Africa (HGDP00927), a Han Chinese individual (HGDP00778), an individual from Papua New Guinea (HGDP00542), and a French individual (HGDP00521) from Western Europe were sequenced to 4- to 6-fold coverage on the Illumina GAII platform. These sequences were aligned to the human reference genome (NCBI36/hg18) using the Burrows-Wheeler Aligner (BWA). Reads with an alignment quality of less than 30 were not included in these data. Those with an alignment quality greater than or equal to 30 were analyzed using a similar approach to that used for the Neandertal data. Credits This track was produced at UCSC using data generated by Ed Green. References Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A Draft Sequence of the Neandertal Genome. Science. 2010 7 May;328(5979):710-22. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. 
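The strand flag (0x10) and the alignment-quality cutoff of 30 described for this track can be illustrated on plain SAM text. The sketch below reads the standard tab-separated SAM columns (FLAG is column 2, MAPQ is column 5); the function names and the example records are hypothetical and are not taken from the track data.

```python
REVERSE_STRAND = 0x10  # SAM flag bit: read is on the '-' strand


def parse_alignment(sam_line):
    """Extract name, flag, mapping quality and strand from one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "name": fields[0],
        "flag": flag,
        "mapq": int(fields[4]),
        "strand": "-" if flag & REVERSE_STRAND else "+",
    }


def passes_quality(record, min_mapq=30):
    """Keep alignments whose quality is greater than or equal to the cutoff."""
    return record["mapq"] >= min_mapq


if __name__ == "__main__":
    # Made-up records for illustration only.
    lines = [
        "read1\t16\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t0\tchr1\t200\t12\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for line in lines:
        rec = parse_alignment(line)
        print(rec["name"], rec["strand"], passes_quality(rec))
```

Testing the flag with a bitwise AND rather than equality matters because other bits (for example the paired-read bits) can be set on the same record.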
bamMMS8 French Seq French (HGDP00521) Sequence Reads Neandertal Assembly and Analysis bamMMS5 Papuan Seq Papuan (HGDP00542) Sequence Reads Neandertal Assembly and Analysis bamMMS4 Han Seq Han (HGDP00778) Sequence Reads Neandertal Assembly and Analysis bamMMS6 Yoruba Seq Yoruba (HGDP00927) Sequence Reads Neandertal Assembly and Analysis bamMMS7 San Seq San (HGDP01029) Sequence Reads Neandertal Assembly and Analysis nscan N-SCAN N-SCAN Gene Predictions Genes and Gene Predictions Description This track shows gene predictions using the N-SCAN gene structure prediction software provided by the Computational Genomics Lab at Washington University in St. Louis, MO, USA. Methods N-SCAN N-SCAN combines biological-signal modeling in the target genome sequence along with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions. Human N-SCAN uses mouse (mm7) as the informant and iterative pseudogene masking. N-SCAN PASA-EST N-SCAN PASA-EST combines EST alignments into N-SCAN. Similar to the conservation sequence models in TWINSCAN, separate probability models are developed for EST alignments to genomic sequence in exons, introns, splice sites and UTRs, reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more accurate than N-SCAN while retaining the ability to discover novel genes to which no ESTs align. In N-SCAN PASA-EST, cDNA sequences were clustered using the PASA program beforehand. PASA, the Program to Assemble Spliced Alignments, was created by Brian Haas at TIGR. 
The algorithm assembles clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. The PASA clusters were used as 'EST' sequences in N-SCAN PASA-EST. The resulting gene models were updated with the input PASA clusters using the assembly tool of the PASA pipeline. These updates consist of automatically generated alternative splices, UTR features and sometimes merging of two gene models. In addition, PASA assigned open reading frames to clusters that did not overlap a gene prediction, but that did contain a full length cDNA, and output them as 'novel genes'. Note that PASA does not use any cDNA annotation from input but assigns the ORF itself. No manual annotation was performed to generate any of the gene models. The high accuracy of the set is in part due to the large number of available ESTs and full length cDNAs. Credits Thanks to Michael Brent's Computational Genomics Group at Washington University St. Louis for providing these data. Special thanks for this implementation of N-SCAN to Aaron Tenney in the Brent lab, and Robert Zimmermann, currently at Max F. Perutz Laboratories in Vienna, Austria. References Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93. Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17(90001):S140-8. van Baren MJ, Brent MR. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res. 2006 May;16(5):678-85. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. 
Nucleic Acids Res 2003 Oct 1;31(19):5654-66. nscanGene N-SCAN N-SCAN Gene Predictions Genes and Gene Predictions Description This track shows gene predictions using the N-SCAN gene structure prediction software provided by the Computational Genomics Lab at Washington University in St. Louis, MO, USA. Methods N-SCAN combines biological-signal modeling in the target genome sequence along with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions. Human N-SCAN uses mouse (mm7) as the informant and iterative pseudogene masking. Credits Thanks to Michael Brent's Computational Genomics Group at Washington University St. Louis for providing this data. Special thanks for this implementation of N-SCAN to Aaron Tenney in the Brent lab, and Robert Zimmermann, currently at Max F. Perutz Laboratories in Vienna, Austria. References Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93. Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17(90001)S140-8. van Baren MJ, Brent MR. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res. 2006 May;16(5):678-85. 
nscanPasaGene N-SCAN PASA-EST N-SCAN PASA-EST Gene Predictions Genes and Gene Predictions ntSeqContigs Neandertal Cntgs Neandertal Sequence Contigs Generated by Genotype Caller Neandertal Assembly and Analysis Description The Neandertal Sequence Contigs track shows consensus contigs called (after duplicate reads from each library were merged) from overlapping, non-redundant reads that passed mapping and base quality criteria. Display Conventions and Configuration The contigs (query sequences) from each of the six samples are contained in separate subtracks. Use the checkboxes to select which samples will be displayed in the browser. Click and drag the sample name to reorder the subtracks. The order in which the subtracks appear in the subtrack list will be the order in which they display in the browser. The query sequences in the SAM/BAM alignment representation are normalized to the + strand of the reference genome (see the SAM Format Specification for more information on the SAM/BAM file format). If a query sequence was originally the reverse of what has been stored and aligned, it will have the following flag: (0x10) Read is on '-' strand. BAM/SAM alignment representations also have tags. Some tags are predefined and others (those beginning with X, Y or Z) are defined by the aligner or data submitter. The following tag is associated with this track: AS: Alignment score generated by aligner The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Display Read Names: By default, read names are not displayed. To display the read names, select the check box next to "Display read names". Minimum alignment quality: Excludes alignments with quality less than the given number. The default is 0. Color track by bases: By default, mismatching bases are highlighted in the display. 
Change the selection to "item bases" to see all base values from the query sequence, or "OFF" to ignore query sequence. Click here for additional information. Alignment Gap/Insertion Display Options: Click here for help with these options. Additional coloring modes: Other aspects of the alignments can be displayed in color or grayscale. Color by strand: Alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue. Grayscale: Items are shaded according to the chosen method: alignment quality or base qualities. The alignment qualities of individual items are shaded on a scale of 0 (lightest) to 99 (darkest). Base qualities are shaded on a scale of 0 (lightest) to 40 (darkest). Alignment quality is the default. Methods All Neandertal sequence reads from each of the six samples were aligned to the human (hg18) genome using the short read aligner/mapper ANFO. To reduce the effects of sequencing error, the alignments of Neandertal reads to the human and chimpanzee reference genomes were used to construct human-based and chimpanzee-based consensus "minicontigs". To generate the consensus, uniquely placed, overlapping alignments were selected (ANFO MAPQ ≥ 90) and these were merged into a single multi-sequence alignment using the common reference genome sequence. At each position in the resulting alignment, for each observed base, and for each possible original base: i) The likelihood of the observation was calculated, ii) the likely length of single-stranded overhangs was estimated, and iii) the potential for ancient DNA damage using the Briggs-Johnson model was considered (Briggs et al. 2007). If most observations in a given position showed a gap, the consensus became a gap; otherwise the base with the highest quality score (calculated by dividing each likelihood by the total likelihood) was used as the consensus. 
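The final step of the consensus call described above (gap if most observations are gaps, otherwise the base whose likelihood is the largest share of the total) can be sketched as follows. This is a rough illustration under stated assumptions: the per-base likelihoods are taken as given inputs, the overhang and damage modelling is not reproduced, "most" is read as "more than half", and the function name `call_consensus` is hypothetical.

```python
def call_consensus(likelihoods, gap_count, n_observations):
    """Pick a consensus symbol and its quality for one alignment column.

    likelihoods: dict mapping each candidate base to its likelihood.
    Returns ("-", None) for a gap, else (base, likelihood / total likelihood).
    """
    # Assumption: "most observations" means strictly more than half.
    if n_observations > 0 and gap_count > n_observations / 2:
        return "-", None
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("total likelihood must be positive")
    best = max(likelihoods, key=likelihoods.get)
    # Quality is the winning likelihood divided by the total likelihood.
    return best, likelihoods[best] / total


if __name__ == "__main__":
    column = {"A": 6.0, "C": 2.0, "G": 1.0, "T": 1.0}
    print(call_consensus(column, gap_count=0, n_observations=5))
    print(call_consensus(column, gap_count=4, n_observations=5))
```

Under this normalization, two bases with similar likelihoods each receive a quality near 0.5, so neither scores highly.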
At the current coverage, heterozygous sites will appear as low quality bases with the second base (not shown) having a similar likelihood to the consensus base. Likewise, heterozygous indels are included only by chance or may show up as stretches of low quality bases. Credits This track was produced at UCSC using data generated by Ed Green. Reference Briggs AW, Stenzel U, Johnson PL, Green RE, Kelso J, Prüfer K, Meyer M, Krause J, Ronan MT, Lachmann M et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A. 2007 Sep 11;104(37):14616-21. PMID: 17715061; PMC: PMC1976210 Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178 bamVi33dot26 Vi33.26 Contigs Vi33.26 Contigs Neandertal Assembly and Analysis bamVi33dot25 Vi33.25 Contigs Vi33.25 Contigs Neandertal Assembly and Analysis bamVi33dot16 Vi33.16 Contigs Vi33.16 Contigs Neandertal Assembly and Analysis bamSid1253 Sid1253 Contigs Sid1253 Contigs Neandertal Assembly and Analysis bamMez1 Mez1 Contigs Mez1 Contigs Neandertal Assembly and Analysis bamFeld1 Feld1 Contigs Feld1 Contigs Neandertal Assembly and Analysis bamAll All Contigs Generated from All Data Neandertal Assembly and Analysis ntMito Neandertal Mito Neandertal Mitochondrial Sequence (Vi33.16, 2008) Neandertal Assembly and Analysis Description This track shows the alignment of a complete Neandertal mitochondrial sequence to a modern human mitochondrial sequence. Note: the mitochondrion used as the genome browser reference sequence "chrM" in hg18 and hg19 is NC_001807, which has been deprecated. Future human genome browsers will use the revised Cambridge Reference Sequence (rCRS) NC_012920. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. Mismatching bases are highlighted as described here. 
Several types of alignment gap may also be colored; for more information, click here. Methods DNA was extracted from a 38,000-year-old bone and sequenced using methods described in Green, et al. The Neandertal mitochondrial sequence (NC_011137) was downloaded from GenBank and aligned to chrM (NC_001807) using BLAT. Reference Green RE, Malaspinas AS, Krause J, Briggs AW, Johnson PL, Uhler C, Meyer M, Good JM, Maricic T, Stenzel U et al. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell. 2008 Aug 8;134(3):416-26. ntSeqReads Neandertal Seq Neandertal Sequence Reads Neandertal Assembly and Analysis Description The Neandertal Seq track shows Neandertal sequence reads mapped to the human genome. The Neandertal sequence was generated from six Neandertal fossils found in Croatia, Germany, Spain and Russia. Display Conventions and Configuration The sequence reads (query sequences) from each of the six samples are contained in separate subtracks. Use the checkboxes to select which samples will be displayed in the browser. Click and drag the sample name to reorder the subtracks. The order in which the subtracks appear in the subtrack list will be the order in which they display in the browser. The query sequences in the SAM/BAM alignment representation are normalized to the + strand of the reference genome (see the SAM Format Specification for more information on the SAM/BAM file format). If a query sequence was originally the reverse of what has been stored and aligned, it will have the following flag: (0x10) Read is on '-' strand. BAM/SAM alignment representations also have tags. Some tags are predefined and others (those beginning with X, Y or Z) are defined by the aligner or data submitter. The following tag is associated with this track: AS: Alignment score generated by aligner The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. 
Display Read Names: By default, read names are not displayed. To display the read names, select the check box next to "Display read names". Minimum alignment quality: Excludes alignments with quality less than the given number. The default is 0. Color track by bases: By default, mismatching bases are highlighted in the display. Change the selection to "item bases" to see all base values from the query sequence, or "OFF" to ignore query sequence. Click here for additional information. Alignment Gap/Insertion Display Options: Click here for help with these options. Additional coloring modes: Other aspects of the alignments can be displayed in color or grayscale. Color by strand: Alignments on the reverse strand are colored dark red; alignments on the forward strand are colored dark blue. Grayscale: Items are shaded according to the chosen method: alignment quality or base qualities. The alignment qualities of individual items are shaded on a scale of 0 (lightest) to 99 (darkest). Base qualities are shaded on a scale of 0 (lightest) to 40 (darkest). Alignment quality is the default. Methods The Neandertal sequence was generated from six Neandertal fossils. Vi33.16 (54.1% genome coverage), Vi33.25 (46.6%) and Vi33.26 (45.2%) were discovered in the Vindija cave in Croatia. Feld1 (0.1%) is from the Neandertal type specimen from the Neander Valley in Germany, Sid1253 (0.1%) is from El Sidron cave in Asturias, Spain, and Mez1 (2%) is from Mezmaiskaya Cave in the northern Caucasus, Russia. To increase the fraction of endogenous Neandertal DNA in the sequencing libraries, restriction enzymes were used to deplete libraries of microbial DNA. This was done by identifying Neandertal sequencing reads whose best alignment was to a primate sequence, and selecting enzymes that would differentially cut non-primate fragments. These enzymes all contained CpG dinucleotides in their recognition sequences, reflecting the particularly low abundance of this dinucleotide in mammalian DNA.
Sequencing was carried out on the 454 FLX and Titanium platforms and the Illumina GA. Neandertal reads were mapped to the human genome (hg18) using a custom mapper called ANFO. This custom alignment program was developed to take into account the characteristics of ancient DNA. Following the observation and implementation by Briggs et al., ANFO uses different substitution matrices for DNA thought to be double-stranded versus single-stranded and changes between them if doing so affords a better score. Credits This track was produced at UCSC using data generated by Ed Green. References Briggs AW, Stenzel U, Johnson PL, Green RE, Kelso J, Prüfer K, Meyer M, Krause J, Ronan MT, Lachmann M et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A. 2007 Sep 11;104(37):14616-21. PMID: 17715061; PMC: PMC1976210 Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. 
PMID: 20448178 bamSLVi33dot26 Vi33.26 Sequence Vi33.26 Sequence Reads Neandertal Assembly and Analysis bamSLVi33dot25 Vi33.25 Sequence Vi33.25 Sequence Reads Neandertal Assembly and Analysis bamSLVi33dot16 Vi33.16 Sequence Vi33.16 Sequence Reads Neandertal Assembly and Analysis bamSLSid1253 Sid1253 Sequence Sid1253 Sequence Reads Neandertal Assembly and Analysis bamSLMez1 Mez1 Sequence Mez1 Sequence Reads Neandertal Assembly and Analysis bamSLFeld1 Feld1 Sequence Feld1 Sequence Reads Neandertal Assembly and Analysis wgEncodeNhgriBip NHGRI Bi-Pro Bip ENCODE July 2009 Freeze 2009-04-27 2010-01-27 607 Elnitski NHGRI-Elnitski wgEncodeNhgriBip Bidirectional Promoters Elnitski Elnitski - National Human Genome Research Institute ENCODE NHGRI Elnitski Bidirectional Promoters Regulation Description Bidirectional promoters are the regulatory regions that fall between pairs of genes, where the 5' ends of the genes within a pair are positioned in close proximity to one another. This spacing facilitates the initiation of transcription of both genes, creating two transcription forks that advance in opposite directions. The formal definition of a bidirectional promoter requires that the transcription initiation sites are separated by no more than 1,000 bp from one another. Using these criteria we have comprehensively annotated the human and mouse genomes for the presence of bidirectional promoters, using in silico approaches. The identification of these promoters is contingent upon the presence of adjacent, oppositely oriented pairs of genes, because few distinguishing features are available to uniquely identify bidirectional promoters de novo. Genomic annotations used for our identification phase include: A) UCSC known genes annotations (items with score=800). B) GenBank mRNA annotations (score=600). C) spliced ESTs (score=400). The annotations for protein coding genes (A) are strongly supported and therefore provide a high quality dataset for mapping bidirectional promoters. 
In contrast, bidirectional promoters supported by spliced ESTs (C) alone have varying levels of evidence, ranging from one characterized transcript to hundreds of them. For this reason, the mRNA annotation (B) from GenBank provides a stringent level of validation for the start sites of the EST transcripts. As a large class of regulatory sequences, bidirectional promoters exemplify a rich source of unexplored biological information in the human genome. When compared to the mouse genome, these promoters are identifiable as truly orthologous locations, being maintained in regions of conserved synteny (including both genes and the intervening promoter region) that have undergone no rearrangements since the last common ancestor of mammals, and in some cases fish. We use this approach to annotate orthologous bidirectional promoters in nonhuman species until genomic annotations become available. Methods Assigning Orthologous Regions A multi-stage approach to mapping orthology at bidirectional promoters was developed. Orthology assignments are strongest in coding regions. Therefore we began by mapping single human genes regulated by bidirectional promoters from the Known Genes annotations onto the mouse genome. Orthology assignments were determined using the "chains and nets" data from the UCSC Human Genome Browser mysql tables. Chains in the Genome Browser represent sequences of gapless aligned blocks. Nets provide a hierarchical ordering of those chains. Level 1 chains contain the longest, best-scoring sequence chains that span any selected region. Subsequent levels in the net represent the results of rearrangements, duplications, insertions and deletions that may have disrupted the presence of conserved synteny derived from an ancestral sequence. Confirming Orthologous Genes After determining the orthology assignments using the UCSC chains and nets data, we used the Known Gene annotations or spliced ESTs to search the identity of genes within the corresponding region. 
Known Genes represent protein-coding genes and therefore can be verified by chains and nets alignments, followed by confirmation of protein identity in both species. Spliced ESTs carry less descriptive information than protein coding genes and therefore were validated in the second species by their presence in an orthologous region, showing conserved synteny of the two genes within a pair, and meeting the criterion of less than 1,000 bp of intergenic distance between those transcripts. Our method for mapping bidirectional promoters in spliced EST datasets is described in more detail in a previous publication. If the program verified evidence for orthology and conserved-syntenic gene arrangement, then the orthologous bidirectional promoter was confirmed. After orthologous assignments were confirmed for pairs of human genes, the reciprocal assignments were analyzed from mouse to human. Currently, orthologous bidirectional promoter regions (identified using UCSC Known Genes) have been mapped in the human, chimp, macaque, mouse, rat, dog and cow genomes. Credits These data were produced by Mary Q. Yang in the Elnitski lab at NHGRI, NIH. (contact: elnitski@mail.nih.gov) References Piontkivska H, Yang MQ, Larkin DM, Lewin HA, Reecy J, Elnitski L. Cross-species mapping of bidirectional promoters enables prediction of unannotated 5' UTRs and identification of species-specific transcripts. BMC Genomics. 2009 Apr 24;10:189. PMID: 19393065; PMC: PMC2688522 Yang MQ, Elnitski LL. A computational study of bidirectional promoters in the human genome. Springer Lecture Series: Notes in Bioinformatics 2007. Yang MQ, Elnitski L. Orthology of Bidirectional Promoters Enables Use of a Multiple Class Predictor for Discriminating Functional Elements in the Human Genome. Proceedings of the 2007 International Conference on Bioinformatics & Computational Biology. pp. 218-228. 2007.
ISBN: 1-60132-042-6. Yang MQ, Koehly LM, Elnitski LL. Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and ovarian cancer genes. PLoS Comput Biol. 2007 Apr 20;3(4):e72. PMID: 17447839; PMC: PMC1853124 Yang MQ, Taylor J, Elnitski L. Comparative analyses of bidirectional promoters in vertebrates. BMC Bioinformatics. 2008 May 28;9 Suppl 6:S9. PMID: 18541062; PMC: PMC2423431 Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the tablemetadata as dateUnrestricted and on the download page. The full data release policy for ENCODE is available here. encodeIndels NHGRI DIPs NHGRI Deletion/Insertion Polymorphisms in ENCODE regions Pilot ENCODE Comparative Genomics and Variation Description This track shows deletion/insertion polymorphisms (DIPs). In packed and full modes, the sequence variation is shown to the left of the DIP. The naming convention "-/sequence" is used for deletions; "sequence/-" is used for insertions. The details page shows the name of the trace used to define the polymorphism, the quality score, and the strand on which the trace aligns to the reference sequence. The quality score reflects the minimum PHRED quality value over the entire range of the DIP within the trace, plus 5 flanking bases. PHRED quality scores are expressed as log probabilities using the formula: Q = -10 * log10(Pe) where Pe is the estimated probability of an error at that base. PHRED quality scores typically vary from 0 to 40, where 0 indicates complete uncertainty about the base and 40 implies odds of 10,000 to 1 that the base is correct. Sometimes a PHRED value of 50 or higher is used to denote finished sequence. A color gradient is used to distinguish quality scores in the browser display: brighter shading indicates higher scores. 
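The PHRED relationship above can be sketched in a few lines of Python. This is only an illustration of the formula Q = -10 * log10(Pe) and of taking a minimum over a DIP plus its flanks; the function names, the 0-based half-open coordinates, and the reading of "plus 5 flanking bases" as five bases on each side are assumptions, not part of the track's actual pipeline.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Convert an error probability Pe into a PHRED score: Q = -10 * log10(Pe)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Invert the formula: Pe = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def dip_quality(trace_quals, start, end, flank=5):
    """Minimum PHRED value over trace positions [start, end) plus flanking bases.

    Assumption: `flank` bases are taken on each side of the DIP, clipped to the
    ends of the trace. The source only says "plus 5 flanking bases".
    """
    lo = max(0, start - flank)
    hi = min(len(trace_quals), end + flank)
    return min(trace_quals[lo:hi])


if __name__ == "__main__":
    print(phred_from_error(0.0001))   # error odds of 1 in 10,000
    print(error_from_phred(20))       # error probability at Q20
    print(dip_quality([30, 30, 12, 40, 40, 40, 40, 40, 40, 40], 7, 8))
```

Because the score is a minimum, a single low-quality base anywhere in the window caps the reported value for the whole DIP.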
The "Trace Pos" value on the details page indicates the 3' position of the DIP within the trace. The alleles are reported relative to the "+" strand of the reference sequence; however, the trace may actually align to the "-" strand. When viewing the chromatogram using the URL provided, if the trace aligned to the "-" strand, the DIP bases in the trace will be the reverse complement of the variant allele given. Methods All human trace data from NCBI's trace archive were aligned to hg17 with ssahaSNP, followed by ssahaDIP post-processing to detect deletion/insertion polymorphisms. DIPs within ENCODE regions were extracted. Verification For verification, 500k traces from the mouse whole genome shotgun (WGS) sequencing effort were compared to mm6 using ssahaSNP and ssahaDIP. Because mm6 and these traces are from the same mouse strain, C57BL/6J, the DIP rate should be very low. Applying a quality threshold of Q23, the detected DIP rate was one DIP per 140k Neighborhood Quality Standard (NQS) bases. This level was ten-fold lower than the SNP rate for the same data set using ssahaSNP, which has been validated as having a 5% false positive rate. The detected DIP rate for human traces against hg17 is one DIP per 12k NQS bases, indicating a false positive rate of 12k/140k, or about 8%. Further validation experiments are in progress. Credits All analyses were performed by Jim Mullikin using ssahaSNP and ssahaDIP. The trace data were contributed to the trace archive by many sequencing centers. References Ning Z, Cox AJ, Mullikin JC. SSAHA: A fast search method for large DNA databases. Genome Res. 2001 Oct;11(10):1725-9. The International SNP Map Working Group. A map of human genome sequence variation containing 1.4 million single nucleotide polymorphisms. Nature. 2001 Feb 15;409(6822):928-33.
wgEncodeNhgriNre NHGRI NRE ENCODE NHGRI Elnitski Negative Regulatory Elements Regulation Description Silencers and enhancer-blockers (EBs) are cis-acting, negative regulatory elements (NREs) that control interactions between promoters and enhancers. Although relatively uncharacterized in terms of biological mechanisms, these elements are likely to be abundant in the genome. Examples of NREs include silencers, which decrease expression of a gene under their regulation, and enhancer-blocking (EB) elements, which prevent the action of an enhancer on a promoter when placed between the two, but not otherwise. Examples of NREs are extremely limited. EB and silencer assays typically require integration of the reporter gene into the chromatin of genomic DNA. The process is too laborious to use in large-scale analyses. To improve scalability, the system uses non-integrated recombinant plasmids, which have been shown to support EB activity. Silencing and EB function in genomic sequences were evaluated by developing a transient transfection assay suitable for high-throughput screening. To ensure that the identified elements are capable of overcoming the effects of strong enhancers, the assay utilizes an enhancer from the human beta-globin locus control region. Known as DNase I hypersensitive site II (HS2), this enhancer is functional in multiple cell lines, with multiple promoters. The presence of HS2 provides a large window of expression to reliably measure loss-of-function effects. This track was developed as part of the ENCODE project. It currently is limited to chromosome 7. Methods To assess transient expression, 4 x 10^5 K562 cells and 6 x 10^4 HeLa or 293T cells were transfected with 0.4 micrograms of test DNA and 4.0 or 40 nanograms of pRL-Tk Renilla plasmid for lipofection (using the reagent TFX-50) or electroporation (using the Amaxa 96-well nucleofector II). Both approaches used Renilla luciferase as a control for transfection efficiency.
Luciferase expression was measured in a 96-well plate format with detection of luminescence using the dual luciferase "Stop and Glo" procedure from Promega. Measurements were recorded on a Berthold plate-reader luminometer. The average expression level from three replicate transfections was normalized to the Renilla luciferase co-transfection control. This value was further normalized to the average expression level from three normalized replicates of the promoter-only plasmid to yield a "fold" enhancement measurement. Each construct that produced a silencing phenotype was re-sequenced to confirm the integrity of the plasmid. Credits These data were produced by the Elnitski lab at NHGRI, NIH. (contact: elnitski@mail.nih.gov) References Petrykowska HM, Vockley CM, Elnitski L. Detection and characterization of silencers and enhancer-blockers in the greater CFTR locus. Genome Res. 2008 Aug;18(8):1238-46. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeNhgriNreViewSI Silencers ENCODE NHGRI Elnitski Negative Regulatory Elements Regulation wgEncodeNhgriNRESilencersK562Sv40 K562 SV40 Si K562 NRE ENCODE Nov 2008 Freeze 2008-11-25 2009-08-25 328 Elnitski NHGRI-Elnitski wgEncodeNhgriNRESilencersK562Sv40 Silencers leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Negative Regulatory Elements Elnitski Elnitski - National Human Genome Research Institute ENCODE NHGRI Elnitski NRE Silencers (SV40 prom.
in K562 cells) Regulation wgEncodeNhgriNRESilencersK562Ggamma K562 gamma Si K562 NRE ENCODE Nov 2008 Freeze 2008-11-25 2009-08-25 327 Elnitski NHGRI-Elnitski wgEncodeNhgriNRESilencersK562Ggamma Silencers leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Negative Regulatory Elements Elnitski Elnitski - National Human Genome Research Institute ENCODE NHGRI Elnitski NRE Silencers (Ggamin prom. in K562 cells) Regulation wgEncodeNhgriNreViewEB Enhancer Blockers ENCODE NHGRI Elnitski Negative Regulatory Elements Regulation wgEncodeNhgriNREEnhancerBlockersK562Sv40 K562 SV40 EB K562 NRE ENCODE Nov 2008 Freeze 2008-11-25 2009-08-25 328 Elnitski NHGRI-Elnitski wgEncodeNhgriNREEnhancerBlockersK562Sv40 Enhancer_Blockers leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Negative Regulatory Elements Elnitski Elnitski - National Human Genome Research Institute ENCODE NHGRI Elnitski NRE Enhancer Blockers (SV40 prom. in K562 cells) Regulation wgEncodeNhgriNREEnhancerBlockersK562Ggamma K562 gamma EB K562 NRE ENCODE Nov 2008 Freeze 2008-11-25 2009-08-25 327 Elnitski NHGRI-Elnitski wgEncodeNhgriNREEnhancerBlockersK562Ggamma Enhancer_Blockers leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Negative Regulatory Elements Elnitski Elnitski - National Human Genome Research Institute ENCODE NHGRI Elnitski NRE Enhancer Blockers (Ggam prom. in K562 cells) Regulation laminB1Lads NKI LADs (Tig3) NKI LADs (Lamina Associated Domains, Tig3 cells) Regulation Description Please see the NKI Nuc Lamina "super-track" link above for description and methods. 
numtSeq NumtS Sequence Human NumtS mitochondrial sequence Variation and Repeats Description and display conventions NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most credited hypothesis concerning their generation suggests that in the presence of mutagenic agents or under stress conditions fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can also derive from duplication of genomic fragments. NumtS may be a cause of contamination during human mtDNA sequencing, and hence false evidence of low-level heteroplasmy has frequently been reported. The Bioinformatics group chaired by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation annotating more than 500 Human NumtS. To allow the scientific community to access the compilation and to perform comparative genomic analyses that include the NumtS data, the group has designed the Human NumtS tracks described below. The NumtS tracks show the High-scoring Segment Pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome. "NumtS (Nuclear mitochondrial Sequences)" Track The "NumtS mitochondrial sequences" track shows the mapping of the HSPs returned by BlastN on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, thus allowing quick cross-referencing between the NumtS genomic contexts. "NumtS assembled" Track The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track that fulfill the following conditions: the orientation of their alignments must be concordant; the distance between them must be less than 2 kb, on the mitochondrial genome as well as on the nuclear genome.
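The two assembly conditions can be illustrated with a short Python sketch. It is not the procedure used for the track, which relied on spreadsheet interpolation and manual inspection; the `Hsp` record, its field names, the same-chromosome requirement, and the greedy left-to-right chaining are all assumptions made for illustration. The sketch also ignores the circularity of the mitochondrial genome and any exception made for intervening repetitive elements.

```python
from dataclasses import dataclass

MAX_GAP = 2000  # "less than 2 kb", applied to both genomes


@dataclass
class Hsp:
    """Hypothetical record for one BlastN HSP (0-based, half-open coordinates)."""
    chrom: str
    nuc_start: int
    nuc_end: int
    mt_start: int
    mt_end: int
    strand: str  # "+" or "-"


def gap(a_start, a_end, b_start, b_end):
    """Distance between two intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def can_join(a: Hsp, b: Hsp) -> bool:
    """Condition 1: concordant orientation. Condition 2: gap < 2 kb on both genomes."""
    return (
        a.chrom == b.chrom  # assumption: only HSPs on one chromosome are merged
        and a.strand == b.strand
        and gap(a.nuc_start, a.nuc_end, b.nuc_start, b.nuc_end) < MAX_GAP
        and gap(a.mt_start, a.mt_end, b.mt_start, b.mt_end) < MAX_GAP
    )


def assemble(hsps):
    """Greedily chain HSPs sorted by nuclear position into assembled groups."""
    groups = []
    for h in sorted(hsps, key=lambda h: (h.chrom, h.nuc_start)):
        if groups and can_join(groups[-1][-1], h):
            groups[-1].append(h)
        else:
            groups.append([h])
    return groups


if __name__ == "__main__":
    demo = [
        Hsp("chr1", 1000, 1500, 100, 600, "+"),
        Hsp("chr1", 2000, 2400, 900, 1300, "+"),   # joins the first HSP
        Hsp("chr1", 2500, 2900, 1400, 1800, "-"),  # opposite strand: new group
        Hsp("chr2", 100, 400, 5000, 5300, "+"),    # different chromosome
    ]
    print([len(g) for g in assemble(demo)])
```

Comparing each HSP only with the last member of the current group keeps the sketch simple; a curated assembly would need to consider non-adjacent HSPs as well.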
Exceptions for the second condition arise when a long repetitive element is present between two HSPs. "NumtS on mitochondrion" Track The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided. "NumtS on mitochondrion with chromosome placement" Track The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colours assigned to each human chromosome on the UCSC genome browser. No shading is provided here. For every item, a link pointing to the nuclear mapping is provided. Methods NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the Human Genome hg18 build and the human mitochondrial reference sequence (rCRS, AC: NC_012920), with the e-value threshold fixed at 1e-03. The assembly of the HSPs was performed with spreadsheet interpolation and manual inspection. Verification NumtS predicted in silico were validated by carrying out PCR amplification and sequencing on blood-extracted DNA of a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands, and submitted to the EMBL databank. Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. Our analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered.
The analysis showed that 541 NumtS (at least 30bp for each one) had been sequenced in such samples. Credits These data were provided by Domenico Simone and Marcella Attimonelli at Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer designing was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations has been performed by Domenico Simone. References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64. PMID: 18451855; PMC: PMC2424287 Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267. PMID: 18522722; PMC: PMC2447851 Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser. BMC Genomics. 2011 Oct 20;12:517. PMID: 22013967; PMC: PMC3228558 numtSMitochondrionChrPlacement NumtS chr colored Human NumtS on Mitochondrion with Chromosome Placement Variation and Repeats Description and display conventions NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most credited hypothesis concerning their generation suggests that in presence of mutagenic agents or under stress conditions fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can derive from duplication of genomic fragments. NumtS may be cause of contamination during human mtDNA sequencing and hence frequent false low heteroplasmic evidences have been reported. 
The Bioinformatics group chaired by M.Attimonelli (Bari, Italy) has produced the RHNumtS compilation annotating more than 500 Human NumtS. To allow the scientific community to access to the compilation and to perform genomics comparative analyses inclusive of the NumtS data, the group has designed the Human NumtS tracks below described. The NumtS tracks show the High Score Pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome. "NumtS (Nuclear mitochondrial Sequences)" Track The "NumtS mitochondrial sequences" track shows the mapping of the HSPs returned by BlastN on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, thus allowing a fast cross among the NumtS genomic contexts. "NumtS assembled" Track The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track fulfilling the following conditions: the orientation of their alignments must be concordant. the distance between them must be less than 2 kb, on the mitochondrial genome as well as on the nuclear genome. Exceptions for the second condition arise when a long repetitive element is present between two HSPs. "NumtS on mitochondrion" Track The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided. 
"NumtS on mitochondrion with chromosome placement" Track The "NumtS on mitochondrion with chromosome placement" shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colours assigned to each human chromosome on the UCSC genome browser. No shading is here provided. For every item, a link pointing to the nuclear mapping is provided. Methods NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of of the Human Genome hg18 build and the human mitochondrial reference sequence (rCRS, AC: NC_012920), fixing the e-value threshold to 1e-03. The assembling of the HSPs was performed with spreadsheet interpolation and manual inspection. Verification NumtS predicted in silico were validated by carrying out PCR amplification and sequencing on blood-extracted DNA of a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands, and submitted to the EMBL databank. Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. Our analysis has been carried on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008), have been considered. The analysis showed that 541 NumtS (at least 30bp for each one) had been sequenced in such samples. Credits These data were provided by Domenico Simone and Marcella Attimonelli at Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer designing was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations has been performed by Domenico Simone. 
References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64. PMID: 18451855; PMC: PMC2424287 Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267. PMID: 18522722; PMC: PMC2447851 Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser. BMC Genomics. 2011 Oct 20;12:517. PMID: 22013967; PMC: PMC3228558 numtSMitochondrion NumtS on mitochon Human NumtS on Mitochondrion Variation and Repeats Description and display conventions NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most credited hypothesis concerning their generation suggests that in presence of mutagenic agents or under stress conditions fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can derive from duplication of genomic fragments. NumtS may be cause of contamination during human mtDNA sequencing and hence frequent false low heteroplasmic evidences have been reported. The Bioinformatics group chaired by M.Attimonelli (Bari, Italy) has produced the RHNumtS compilation annotating more than 500 Human NumtS. To allow the scientific community to access to the compilation and to perform genomics comparative analyses inclusive of the NumtS data, the group has designed the Human NumtS tracks below described. The NumtS tracks show the High Score Pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome. 
"NumtS (Nuclear mitochondrial Sequences)" Track The "NumtS mitochondrial sequences" track shows the mapping of the HSPs returned by BlastN on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, thus allowing a fast cross among the NumtS genomic contexts. "NumtS assembled" Track The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track fulfilling the following conditions: the orientation of their alignments must be concordant. the distance between them must be less than 2 kb, on the mitochondrial genome as well as on the nuclear genome. Exceptions for the second condition arise when a long repetitive element is present between two HSPs. "NumtS on mitochondrion" Track The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided. "NumtS on mitochondrion with chromosome placement" Track The "NumtS on mitochondrion with chromosome placement" shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colours assigned to each human chromosome on the UCSC genome browser. No shading is here provided. For every item, a link pointing to the nuclear mapping is provided. Methods NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of of the Human Genome hg18 build and the human mitochondrial reference sequence (rCRS, AC: NC_012920), fixing the e-value threshold to 1e-03. The assembling of the HSPs was performed with spreadsheet interpolation and manual inspection. 
Verification NumtS predicted in silico were validated by PCR amplification and sequencing of blood-extracted DNA from a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands and submitted to the EMBL databank. Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered. The analysis showed that 541 NumtS (at least 30 bp for each one) had been sequenced in these samples. Credits These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer design was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone. References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64. PMID: 18451855; PMC: PMC2424287 Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267. PMID: 18522722; PMC: PMC2447851 Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser. BMC Genomics. 2011 Oct 20;12:517.
PMID: 22013967; PMC: PMC3228558 numtSAssembled NumtS assembled Human NumtS Assembled Variation and Repeats Description and display conventions NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most widely credited hypothesis for their origin is that, in the presence of mutagenic agents or under stress conditions, fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can also derive from duplication of genomic fragments. NumtS can contaminate human mtDNA sequencing, and false reports of low-level heteroplasmy have therefore been frequent. The Bioinformatics group chaired by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation, which annotates more than 500 human NumtS. To give the scientific community access to the compilation and to support comparative genomic analyses that include the NumtS data, the group designed the Human NumtS tracks described below. The NumtS tracks show the High-scoring Segment Pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome. "NumtS (Nuclear mitochondrial Sequences)" Track The "NumtS mitochondrial sequences" track shows the mapping of the HSPs returned by BlastN on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, allowing quick cross-navigation between the genomic contexts of each NumtS. "NumtS assembled" Track The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track that fulfil the following conditions: (1) the orientation of their alignments must be concordant; (2) the distance between them must be less than 2 kb, on the mitochondrial genome as well as on the nuclear genome.
Exceptions to the second condition arise when a long repetitive element is present between two HSPs. "NumtS on mitochondrion" Track The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided. "NumtS on mitochondrion with chromosome placement" Track The "NumtS on mitochondrion with chromosome placement" track shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colours assigned to each human chromosome on the UCSC genome browser. No shading is provided here. For every item, a link pointing to the nuclear mapping is provided. Methods NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of the human genome (hg18 build) and the human mitochondrial reference sequence (rCRS, AC: NC_012920), with the e-value threshold set to 1e-03. The assembling of the HSPs was performed with spreadsheet interpolation and manual inspection. Verification NumtS predicted in silico were validated by PCR amplification and sequencing of blood-extracted DNA from a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands and submitted to the EMBL databank. Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. The analysis was carried out on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008) were considered.
The analysis showed that 541 NumtS (at least 30 bp for each one) had been sequenced in these samples. Credits These data were provided by Domenico Simone and Marcella Attimonelli at the Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer design was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations was performed by Domenico Simone. References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64. PMID: 18451855; PMC: PMC2424287 Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267. PMID: 18522722; PMC: PMC2447851 Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser. BMC Genomics. 2011 Oct 20;12:517. PMID: 22013967; PMC: PMC3228558 numtS NumtS Human NumtS Variation and Repeats Description and display conventions NumtS (Nuclear mitochondrial sequences) are mitochondrial fragments inserted in nuclear genomic sequences. The most widely credited hypothesis for their origin is that, in the presence of mutagenic agents or under stress conditions, fragments of mtDNA escape from mitochondria, reach the nucleus and insert into chromosomes during break repair, although NumtS can also derive from duplication of genomic fragments. NumtS can contaminate human mtDNA sequencing, and false reports of low-level heteroplasmy have therefore been frequent.
The Bioinformatics group chaired by M. Attimonelli (Bari, Italy) has produced the RHNumtS compilation, which annotates more than 500 human NumtS. To give the scientific community access to the compilation and to support comparative genomic analyses that include the NumtS data, the group designed the Human NumtS tracks described below. The NumtS tracks show the High-scoring Segment Pairs (HSPs) obtained by aligning the mitochondrial reference genome (NC_012920) with the hg18 release of the human genome. "NumtS (Nuclear mitochondrial Sequences)" Track The "NumtS mitochondrial sequences" track shows the mapping of the HSPs returned by BlastN on the nuclear genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the mitochondrial mapping is provided, allowing quick cross-navigation between the genomic contexts of each NumtS. "NumtS assembled" Track The "NumtS assembled" track shows items obtained by assembling HSPs annotated in the "NumtS" track that fulfil the following conditions: (1) the orientation of their alignments must be concordant; (2) the distance between them must be less than 2 kb, on the mitochondrial genome as well as on the nuclear genome. Exceptions to the second condition arise when a long repetitive element is present between two HSPs. "NumtS on mitochondrion" Track The "NumtS on mitochondrion" track shows the mapping of the HSPs on the mitochondrial genome. The shading of the items reflects the similarity returned by BlastN, and the direction of the arrows is concordant with the strand of the alignment. For every item, a link pointing to the nuclear mapping is provided.
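The two conditions given for the "NumtS assembled" track (concordant orientation, and less than 2 kb between HSPs on both the nuclear and the mitochondrial genome) can be illustrated with a minimal sketch. This is not the authors' procedure, which is described as spreadsheet interpolation plus manual inspection. The `Hsp` record, its field names, and the greedy left-to-right chaining are assumptions made for illustration; the exception for long repetitive elements and the circularity of the mitochondrial genome are not modelled.

```python
from dataclasses import dataclass

MAX_GAP = 2000  # "less than 2 kb", applied on both genomes


@dataclass
class Hsp:
    """Hypothetical record for one BlastN HSP (half-open coordinates)."""
    chrom: str
    n_start: int  # nuclear start
    n_end: int    # nuclear end
    m_start: int  # mitochondrial start
    m_end: int    # mitochondrial end
    strand: str   # '+' or '-'


def gap(a_start, a_end, b_start, b_end):
    """Distance between two intervals; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def can_join(a, b, max_gap=MAX_GAP):
    """Condition 1: same orientation. Condition 2: close on both genomes."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and gap(a.n_start, a.n_end, b.n_start, b.n_end) < max_gap
        and gap(a.m_start, a.m_end, b.m_start, b.m_end) < max_gap
    )


def assemble(hsps):
    """Greedily chain HSPs, sorted by nuclear position, into groups."""
    groups = []
    for h in sorted(hsps, key=lambda h: (h.chrom, h.n_start)):
        if groups and can_join(groups[-1][-1], h):
            groups[-1].append(h)
        else:
            groups.append([h])
    return groups


example = [
    Hsp("chr1", 1000, 1500, 100, 600, "+"),
    Hsp("chr1", 2000, 2400, 700, 1100, "+"),    # joins the first HSP
    Hsp("chr1", 2500, 2900, 1200, 1600, "-"),   # opposite strand: new group
    Hsp("chr1", 10000, 10500, 1700, 2200, "+"),
]
group_sizes = [len(g) for g in assemble(example)]
```

Each HSP is compared only with the last member of the current group, so the result depends on sort order; a real curation pass would still need manual review of borderline cases.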
"NumtS on mitochondrion with chromosome placement" Track The "NumtS on mitochondrion with chromosome placement" shows the mapping of the HSPs on the mitochondrial genome, but the items are coloured according to the colours assigned to each human chromosome on the UCSC genome browser. No shading is here provided. For every item, a link pointing to the nuclear mapping is provided. Methods NumtS mappings were obtained by running Blast2seq (program: BlastN) between each chromosome of of the Human Genome hg18 build and the human mitochondrial reference sequence (rCRS, AC: NC_012920), fixing the e-value threshold to 1e-03. The assembling of the HSPs was performed with spreadsheet interpolation and manual inspection. Verification NumtS predicted in silico were validated by carrying out PCR amplification and sequencing on blood-extracted DNA of a healthy individual of European origin. PCR amplification was successful for 275 NumtS and provided amplicons of the expected length. All PCR fragments were sequenced on both strands, and submitted to the EMBL databank. Furthermore, 541 NumtS were validated by merging NumtS nuclear coordinates with HapMap annotations. Our analysis has been carried on eight HapMap individuals (NA18517, NA18507, NA18956, NA19240, NA18555, NA12878, NA19129, NA12156). For each sample, clones with a single best concordant placement (according to the fosmid end-sequence-pair analysis described in Kidd et al., 2008), have been considered. The analysis showed that 541 NumtS (at least 30bp for each one) had been sequenced in such samples. Credits These data were provided by Domenico Simone and Marcella Attimonelli at Department of Biochemistry and Molecular Biology "Ernesto Quagliariello" (University of Bari, Italy). Primer designing was carried out by Francesco Calabrese and Giuseppe Mineccia. PCR validation was carried out by Martin Lang, Domenico Simone and Giuseppe Gasparre. Merging with HapMap annotations has been performed by Domenico Simone. 
References Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008 May 1;453(7191):56-64. PMID: 18451855; PMC: PMC2424287 Lascaro D, Castellana S, Gasparre G, Romeo G, Saccone C, Attimonelli M. The RHNumtS compilation: features and bioinformatics approaches to locate and quantify Human NumtS. BMC Genomics. 2008 Jun 3;9:267. PMID: 18522722; PMC: PMC2447851 Simone D, Calabrese FM, Lang M, Gasparre G, Attimonelli M. The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser. BMC Genomics. 2011 Oct 20;12:517. PMID: 22013967; PMC: PMC3228558 knownGeneOld3 Old UCSC Genes Previous Version of UCSC Genes Genes and Gene Predictions Description The Old UCSC Genes track shows genes from the previous version of the UCSC Genes build. This is similar to the current version but without explicitly including CCDS proteins. Additionally not as much data were available in Genbank, RefSeq, or UniProt for this older version. Read the description of how the current version of the UCSC Genes track was built. omimAvSnp OMIM Alleles OMIM Allelic Variant Phenotypes Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. 
Please see OMIM for downloads. OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the allelic variants in the Online Mendelian Inheritance in Man (OMIM) database that have associated dbSNP identifiers. Note: The latest OMIM annotation contains many variants found only in dbSNP build 132, which is available only for the GRCh37/hg19 assembly. This (hg18) track was built with dbSNP build 130 annotations and is therefore missing many entries. We strongly encourage users to use the GRCh37/hg19 OMIM Alleles track instead of this one if possible. Display Conventions and Configuration Genomic positions of OMIM allelic variants are marked by solid blocks, which appear as tick marks when zoomed out. The details page for each variant displays the allelic variant description, the amino acid replacement, and the dbSNP identifier, with a link to that variant's details page in the "SNPs (130)" track.
The descriptions of OMIM entries are shown on the main browser display when Full display mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Methods This track was constructed as follows: The OMIM allelic variant data file mimAV.txt was obtained from OMIM and loaded into the MySQL table omimAv. The genomic position for each allelic variant in omimAv with an associated dbSnp identifier was obtained from the snp130 table. The OMIM AV identifiers and their corresponding genomic positions from dbSNP were then loaded into the omimAvSnp table. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM®). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. Epub 2008 Oct 8. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. omimLocation OMIM Cyto Loci OMIM Cytogenetic Loci Phenotypes - Gene Unknown Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. 
OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the cytogenetic locations of phenotype entries in the Online Mendelian Inheritance in Man (OMIM) database for which the gene is unknown. Display Conventions and Configuration Cytogenetic locations of OMIM entries are displayed as solid blocks. The entries are colored according to the OMIM phenotype map key of associated disorders: Lighter Green for phenotype map key 1 OMIM records - the disorder has been placed on the map based on its association with a gene, but the underlying defect is not known. Light Green for phenotype map key 2 OMIM records - the disorder has been placed on the map by linkage; no mutation has been found. Dark Green for phenotype map key 3 OMIM records - the molecular basis for the disorder is known; a mutation has been found in the gene.
Purple for phenotype map key 4 OMIM records - a contiguous gene deletion or duplication syndrome; multiple genes are deleted or duplicated causing the phenotype. Gene symbols and disease information, when available, are displayed on the details pages. The descriptions of OMIM entries are shown on the main browser display when Full display mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Items displayed can be filtered according to phenotype map key on the track controls page. Methods This track was constructed as follows: The data file genemap.txt from OMIM was loaded into the MySQL table omimGeneMap. Entries in genemap.txt having disorder info were parsed and loaded into the omimPhenotype table. The phenotype map keys (the numbers (1)(2)(3)(4) from the disorder columns) were placed into a separate field. The cytogenetic location data (from the location column in omimGeneMap) were parsed and converted into genomic start and end positions based on the cytoBand table. These genomic positions, together with the corresponding OMIM IDs, were loaded into the omimLocation table. All entries with no associated phenotype map key and all OMIM gene entries as reported in the "OMIM Genes" track were then excluded from the omimLocation table. Data Access Because OMIM has only allowed Data queries within individual chromosomes, no download files are available from the Genome Browser. Full genome datasets can be downloaded directly from the OMIM Downloads page. All genome-wide downloads are freely available from OMIM after registration. If you need the OMIM data in exactly the format of the UCSC Genome Browser, for example if you are running a UCSC Genome Browser local installation (a partial "mirror"), please create a user account on omim.org and contact OMIM via https://omim.org/contact. Send them your OMIM account name and request access to the UCSC Genome Browser 'entitlement'. 
They will then grant you access to a MySQL/MariaDB data dump that contains all UCSC Genome Browser OMIM tables. UCSC offers queries within chromosomes from Table Browser that include a variety of filtering options and cross-referencing other datasets using our Data Integrator tool. UCSC also has an API that can be used to retrieve data in JSON format from a particular chromosome range. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. PMID: 18842627; PMC: PMC2686440 Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. PMID: 15608251; PMC: PMC539987 omimGene2 OMIM Genes OMIM Gene Phenotypes - Dark Green Can Be Disease-causing Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. 
OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the genomic positions of all gene entries in the Online Mendelian Inheritance in Man (OMIM) database. Display Conventions and Configuration Genomic locations of OMIM gene entries are displayed as solid blocks. The entries are colored according to the associated OMIM phenotype map key (if any): Lighter Green for phenotype map key 1 OMIM records - the disorder has been placed on the map based on its association with a gene, but the underlying defect is not known. Light Green for phenotype map key 2 OMIM records - the disorder has been placed on the map by linkage; no mutation has been found. Dark Green for phenotype map key 3 OMIM records - the molecular basis for the disorder is known; a mutation has been found in the gene. Purple for phenotype map key 4 OMIM records - a contiguous gene deletion or duplication syndrome; multiple genes are deleted or duplicated causing the phenotype.
Light Gray for Others - no associated OMIM phenotype map key info available. Gene symbol, phenotype, and inheritance information, when available, are displayed on the details page for an item, and links to related RefSeq Genes and UCSC Genes are given. The descriptions of the OMIM entries are shown on the main browser display when mousing over each entry. Mode of inheritance abbreviations: Autosomal Dominant = AD; Autosomal Recessive = AR; Digenic Dominant = DD; Digenic Recessive = DR; Isolated Cases = IC; Mitochondrial = Mi; Multifactorial = Mu; Pseudoautosomal Dominant = PADom; Pseudoautosomal Recessive = PARec; Somatic Mosaicism = SomMos; Somatic Mutation = SMu; X-Linked = XL; X-Linked Dominant = XLD; X-Linked Recessive = XLR; Y-Linked = YL. Brackets, "[ ]", before the phenotype name indicate "nondiseases," mainly genetic variations that lead to apparently abnormal laboratory test values (e.g., dysalbuminemic euthyroidal hyperthyroxinemia). Braces, "{ }", indicate mutations that contribute to susceptibility to multifactorial disorders (e.g., diabetes, asthma) or to susceptibility to infection (e.g., malaria). Question marks, "?", indicate that the relationship between the phenotype and gene is provisional. More details about this relationship are provided in the comment field of the map and in the gene and phenotype OMIM entries. Methods The mappings displayed in this track are based on OMIM gene entries, their Entrez Gene IDs, and the corresponding RefSeq Gene locations: The data file genemap.txt from OMIM was loaded into the MySQL table omimGeneMap. The data file mim2gene.txt from OMIM was processed and loaded into the MySQL table omim2gene. Entries in genemap.txt having disorder info were parsed and loaded into the omimPhenotype table.
For each OMIM gene in the omim2gene table, the Entrez Gene ID was used to get the corresponding RefSeq Gene ID via the refLink table, and the RefSeq ID was used to get the genomic location from the refGene table.* The OMIM gene IDs and corresponding RefSeq Gene locations were loaded into the omimGene2 table, the primary table for this track. *The locations in the refGene table are from alignments of RefSeq Genes to the reference genome using BLAT. Data Updates This track is automatically updated once a week from OMIM data. The most recent update time is shown at the top of the track documentation page. Data Access Because OMIM has only allowed Data queries within individual chromosomes, no download files are available from the Genome Browser. Full genome datasets can be downloaded directly from the OMIM Downloads page. All genome-wide downloads are freely available from OMIM after registration. If you need the OMIM data in exactly the format of the UCSC Genome Browser, for example if you are running a UCSC Genome Browser local installation (a partial "mirror"), please create a user account on omim.org and contact OMIM via https://omim.org/contact. Send them your OMIM account name and request access to the UCSC Genome Browser "entitlement". They will then grant you access to a MySQL/MariaDB data dump that contains all UCSC Genome Browser OMIM tables. UCSC offers queries within chromosomes from Table Browser that include a variety of filtering options and cross-referencing other datasets using our Data Integrator tool. UCSC also has an API that can be used to retrieve data in JSON format from a particular chromosome range. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information. 
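As a rough illustration of the chromosome-range API query mentioned above, the sketch below builds a request URL for the UCSC REST API `getData/track` endpoint and, optionally, fetches it. The helper names `build_track_url` and `fetch_track` are hypothetical. Whether the hg18 `omimGene2` table is actually served through the API, and what fields the JSON response carries, are assumptions that should be checked against the API documentation. The coordinates are taken from the MTOR example row in this document.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one chromosome range.

    Arguments are joined with semicolons, following the style of the
    published UCSC API examples.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track(url, timeout=30):
    """Fetch the URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Range taken from the MTOR example row (chr1 11166591 11322608).
url = build_track_url("hg18", "omimGene2", "chr1", 11166591, 11322608)
```

Calling `fetch_track(url)` would return the decoded response as a Python dictionary; because the query is limited to a single chromosome range, it stays within the per-chromosome restriction described above.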
Example: Retrieve phenotype, Mode of Inheritance, and other OMIM data within a range Go to Table Browser, make sure the right dataset is selected: group: Phenotype and Literature, track: OMIM Genes, table: omimGene2. Define region of interest by entering coordinates or a gene symbol into the "Position" textbox, such as chr1:11,166,591-11,322,608 or MTOR, or upload a list. Format your data by setting the "Output format" dropdown to "selected fields from primary and related Tables" and click get output. This brings up the data field and linked table selection page. Select chrom, chromStart, chromEnd, and name from omimGene2 table. Then select the related tables omim2gene and omimPhenotype and click allow selection from check tables. This brings up the fields of the linked tables, where you can select approvedGeneSymbol, omimID, description, omimPhenotypeMapKey, and inhMode. Click on the get output to proceed to the results page: chr1 11166591 11322608 601231 Gene: MTOR, Synonyms: FRAP1, SKS, Phenotypes: Smith-Kingsmore syndrome, AD, 3; Focal cortical dysplasia, type II, somatic, 3 For a quick link to pre-fill these options, click this session link. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. PMID: 18842627; PMC: PMC2686440 Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. PMID: 15608251; PMC: PMC539987 wgEncodeChromatinMap Open Chromatin ENCODE Open Chromatin, Duke/UNC/UT Regulation Description These tracks display evidence of open chromatin in multiple cell types from the Duke/UNC/UT-Austin/EBI ENCODE group. 
Open chromatin was identified using two independent and complementary methods: DNaseI hypersensitivity (HS) and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE), combined with chromatin immunoprecipitation (ChIP) for select regulatory factors. Each method was verified by two detection platforms: Illumina (formerly Solexa) sequencing by synthesis, and high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen. DNaseI HS data: DNaseI is an enzyme that has long been used to map general chromatin accessibility, and DNaseI "hyperaccessibility" or "hypersensitivity" is a feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, silencers, insulators, promoters, locus control regions and novel elements. DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome. FAIRE data: FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) is a method to isolate and identify nucleosome-depleted regions of the genome. FAIRE was initially discovered in yeast and subsequently shown to identify active regulatory elements in human cells (Giresi et al., 2007). Although less well-characterized than DNase, FAIRE also appears to identify functional regulatory elements that include enhancers, silencers, insulators, promoters, locus control regions and novel elements. DNA fragments isolated by FAIRE are 100-200 bp in length, with the average length being 140 bp. ChIP data: ChIP (Chromatin Immunoprecipitation) is a method to identify the specific location of proteins that are directly or indirectly bound to genomic DNA. By identifying the binding location of sequence-specific transcription factors, general transcription machinery components, and chromatin factors, ChIP can help in the functional annotation of the open chromatin regions identified by DNaseI HS mapping and FAIRE.
Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. Chromatin data displayed here represents a continuum of signal intensities. The Crawford lab recommends setting the "Data view scaling: auto-scale" option when viewing signal data in full mode. In general, for each experiment in each of the cell types, the Open Chromatin tracks contain the following views: Peaks Regions of enriched signal in either DNaseI HS, FAIRE, or ChIP experiments. Peaks were called based on signals created using F-Seq, a software program developed at Duke (Boyle et al., 2008b). Significant regions were determined by performing ROC analysis of sequence data using data from the 1% ENCODE arrays, and determining a cut-off value at approximately the 95% sensitivity level. The solid vertical line in the peak represents the point with highest signal. ENCODE Peaks tables contain a p-value for statistical significance. For these data, this was determined by fitting the data to a gamma distribution. Peaks (Zinba) Enriched regions for FAIRE data were called using ZINBA (Zero Inflated Negative Binomial Algorithm). ZINBA is a flexible statistical method that uses a generalized linear model to select genomic windows with enriched sequence counts after adjusting for relevant confounding factors such as mappability, GC content, and copy number alterations. Significant regions are selected using the set of standardized residuals below a false discovery rate (qvalue) threshold. Peaks were further refined using a shape detection algorithm to identify local maxima and boundaries of the Signal (Base Overlap) data within each significant region. Signal (F-Seq Density) Density graph (wiggle) of signal enrichment calculated using F-Seq for the combined set of sequences from all replicates. 
F-Seq employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). This method does not look at fixed-length windows but rather weights the contributions of nearby sequences according to their distance from the base being scored. It only considers sequences aligned 4 or fewer times in the genome, and uses an alignability background model to try to correct for regions where sequences cannot be aligned. For the K562, HepG2 and HeLa-S3 cell types, where there is an abnormal karyotype, a model to try to correct for amplifications and deletions was also used. No control data were used in the creation of these annotations. Signal (Base Overlap) An alternative version of the Signal (F-Seq Density) track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended in the following way: for DNase, the read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA; for FAIRE and ChIP, the sequence is extended to a fragment length of 134 bp from the 5' aligned end, representing the approximate average fragment length. The score at each base pair represents the number of extended fragments that overlap the base pair. Alignments Mappings of short reads to the genome (currently only available for download). Additional data that were used to generate these tracks are located in the ENCODE Mappability track: Uniqueness The Duke uniqueness tracks were used to identify regions of unique sequence for different tag lengths. The tracks also identify regions where high-throughput sequence tags cannot be mapped. Excluded Regions The Duke excluded regions track was used to identify problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). These regions of the genome were excluded from the Open Chromatin tracks. 
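The read-extension rule given for the Signal (Base Overlap) view can be illustrated with a minimal Python sketch. This is not the pipeline's actual code. The function names, the `(five_prime, strand)` read representation, and the 0-based coordinates are hypothetical, and treating "5 bp in both directions" as an 11 bp window that includes the cut base is an assumption. Only the 5 bp flank and the 134 bp fragment length come from the description.

```python
from collections import Counter

# Values from the track description.
DNASE_FLANK = 5        # bp added on each side of the DNase cut site
FRAGMENT_LENGTH = 134  # bp, approximate average FAIRE/ChIP fragment length


def extended_interval(five_prime, strand, assay):
    """Return the half-open interval [start, end) covered by one extended read.

    five_prime is the 0-based coordinate of the read's 5' aligned end;
    strand is "+" or "-"; assay is "dnase", "faire" or "chip".
    """
    if assay not in ("dnase", "faire", "chip"):
        raise ValueError(f"unknown assay: {assay}")
    if assay == "dnase":
        # Assumption: the window includes the cut base itself (11 bp total).
        start, end = five_prime - DNASE_FLANK, five_prime + DNASE_FLANK + 1
    elif strand == "+":
        # Plus-strand fragments extend rightward from the 5' end.
        start, end = five_prime, five_prime + FRAGMENT_LENGTH
    else:
        # Minus-strand fragments extend leftward from the 5' end.
        start, end = five_prime - FRAGMENT_LENGTH + 1, five_prime + 1
    return max(start, 0), end  # clip at the start of the chromosome


def base_overlap_signal(reads, assay):
    """Count, for each base, how many extended fragments overlap it."""
    signal = Counter()
    for five_prime, strand in reads:
        start, end = extended_interval(five_prime, strand, assay)
        for pos in range(start, end):
            signal[pos] += 1
    return signal
```

Calling `base_overlap_signal([(100, "+"), (300, "-")], "faire")` returns a `Counter` keyed by position. Positions not covered by any fragment are absent and read back as 0. A production implementation would more likely use difference arrays or interval trees than a per-base loop.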
Methods Cells were grown according to the approved ENCODE cell culture protocols. DNaseI hypersensitive sites were isolated using methods called DNase-seq or DNase-chip (Boyle et al., 2008a, Crawford et al., 2006). Briefly, cells were lysed with NP40, and intact nuclei were digested with optimal levels of DNaseI enzyme. DNaseI-digested ends were captured from three different DNase concentrations, and material was sequenced using Illumina (Solexa) sequencing. DNase-seq data were verified using material that was hybridized to NimbleGen Human ENCODE tiling arrays (1% of the genome). Multiple independent growths (replicates) were compared to verify the reproducibility of the data. A more detailed protocol is available here. FAIRE was performed (Giresi et al., 2007) by cross-linking proteins to DNA using 1% formaldehyde solution, and the complex was sheared using sonication. Phenol/chloroform extractions were performed to remove DNA fragments cross-linked to protein. The DNA recovered in the aqueous phase was hybridized to NimbleGen Human ENCODE tiling arrays (1% of the genome) and sequenced using a Solexa sequencing system. The ENCODE array data were used to verify the accuracy of the sequencing data, and multiple independent growths (replicates) were compared to assess the reproducibility of the data. A more detailed protocol is available here. Also see Giresi et al., 2009. To perform ChIP, proteins were cross-linked to DNA in vivo using 1% formaldehyde solution (Bhinge et al., 2007, ENCODE Project Consortium, 2007). Cross-linked chromatin was sheared by sonication and immunoprecipitated using a specific antibody against the protein of interest. After reversal of the cross-links, the immunoprecipitated DNA was used to identify the genomic location of transcription factor binding. 
This was accomplished by Solexa sequencing of the ends of the immunoprecipitated DNA (ChIP-seq), as well as labeling and hybridization of the immunoprecipitated DNA to NimbleGen Human ENCODE tiling arrays (1% of the genome) along with the input DNA as reference (ChIP-chip). The ENCODE array data were used to verify the accuracy of the sequencing data, and multiple independent growths (replicates) were compared to assess the reproducibility of the data. A more detailed protocol is available here. ENCODE Array data were normalized using the Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched on size to these peaks that were devoid of any significant signal were also created to allow for ROC analysis. Sequences from each experiment were aligned to the genome using Maq (Li et al., 2008) and those that aligned to 4 or fewer locations were retained. Other sequences were also filtered based on their alignment to problematic regions (such as satellites and rRNA genes). The resulting digital signal was converted to a continuous wiggle track using F-Seq, which employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). Discrete DNase HS, FAIRE, and ChIP sites (peaks) were identified from DNase/FAIRE/ChIP-seq using F-Seq by setting a Parzen cutoff based on ROC curve analysis using peaks and non-peaks identified from DNase/FAIRE/ChIP-chip using NimbleGen Human ENCODE tiling arrays (1% of the genome). Input data were generated for GM12878, K562, HeLa-S3, HepG2, and HUVEC. These were used directly to create a control/background model used for F-Seq when generating signal annotations and subsequently peaks for these cell lines. These models are meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data are not being generated directly for other cell lines. 
Instead, a general background model was derived from the five Input data sets. This should provide corrections for sequencing biases and alignment artifacts, but not for cell-type-specific copy number changes. Release Notes This is Release 3 (Mar 2010) of this track, which includes 18 new cell line or cell/treatment experiments. In addition, a number of new experiments were added to existing cell lines. Almost all Peaks have been called anew using improved cut-offs and p-values. Finally, a second type of peak called using a ZINBA algorithm has been provided for several of the FAIRE-seq experiments. For all new versions of previously-released data, the affected database tables and files include 'V2' or 'V3' in the name, and metadata is marked with "submittedDataVersion=V", followed by a number and reason for replacement. Previous versions of these files are available for download from the FTP site. Credits These data and annotations were created by a collaboration of multiple institutions (contact: Terry Furey): Duke University's Institute for Genome Sciences & Policy (IGSP): Alan Boyle, Lingyun Song, Terry Furey (now at UNC), and Greg Crawford University of North Carolina at Chapel Hill: Paul Giresi and Jason Lieb University of Texas at Austin: Zheng Liu, Ryan McDaniell, Bum-Kyu Lee, and Vishy Iyer European Bioinformatics Institute: Paul Flicek, Damian Keefe, and Ewan Birney University of Cambridge, Department of Oncology and CR-UK Cambridge Research Institute (CRI): Stefan Graf We thank NHGRI for ENCODE funding support. References Bhinge AA, Kim J, Euskirchen GM, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6. Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22. Boyle AP, Guinney J, Crawford GE, and Furey TS. 
F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8. Buck MJ, Nobel AB, Lieb JD. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 2005;6(11):R97. Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods. 2006 Jul;3(7):503-9. Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007 Jun;17(6):877-85. Giresi PG, Lieb JD. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods. 2009 Jul;48(3):233-9. Li H, Ruan J, and Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008 Nov;18(11):1851-8. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. 
wgEncodeChromatinMapViewZinba Zinba Peaks ENCODE Open Chromatin, Duke/UNC/UT Regulation wgEncodeUncFAIREseqZinbaNhek NHEK FAIRE Zn NHEK FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 558 Crawford UNC wgEncodeUncFAIREseqZinbaNhek None peaksZinba epidermal keratinocytes FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in NHEK cells) Regulation wgEncodeUncFAIREseqZinbaK562 K562 FAIRE Zn K562 FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 531 Crawford UNC wgEncodeUncFAIREseqZinbaK562 None peaksZinba leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in K562 cells) Regulation wgEncodeUncFAIREseqZinbaHuvec HUVEC FAIRE Zn HUVEC FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 549 Crawford UNC wgEncodeUncFAIREseqZinbaHuvec None peaksZinba umbilical vein endothelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in HUVEC cells) Regulation wgEncodeUncFAIREseqZinbaHepg2 HepG2 FAIRE Zn HepG2 FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 546 Crawford UNC wgEncodeUncFAIREseqZinbaHepg2 None peaksZinba hepatocellular carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in HepG2 cells) Regulation wgEncodeUncFAIREseqZinbaHelas3 HeLa FAIRE Zn HeLa-S3 FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 544 Crawford UNC wgEncodeUncFAIREseqZinbaHelas3 None peaksZinba cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in HeLa-S3 cells) Regulation wgEncodeUncFAIREseqZinbaH1hesc H1es 
FAIRE Zn H1-hESC FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 557 Crawford UNC wgEncodeUncFAIREseqZinbaH1hesc None peaksZinba embryonic stem cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in H1-hESC cells) Regulation wgEncodeUncFAIREseqZinbaGm12878 GM12878 FAIRE Zn GM12878 FaireSeq ENCODE June 2010 Freeze 2010-03-24 2010-12-24 533 Crawford UNC wgEncodeUncFAIREseqZinbaGm12878 None peaksZinba B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina ENCODE Open Chromatin, UNC FAIRE-seq Zinba Peaks (in GM12878 cells) Regulation wgEncodeChromatinMapViewSIG Signal (F-Seq Density) ENCODE Open Chromatin, Duke/UNC/UT Regulation wgEncodeUtaustinChIPseqSignalProgfib ProgFib Input Input ProgFib ChipSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-22 593 Crawford UT-A fseq v 1.82 input wgEncodeUtaustinChIPseqSignalProgfib Signal fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in ProgFib cells) Regulation wgEncodeUtaChIPseqSignalProgfibPol2 ProgFib Pol2 Sig Pol2 ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 606 Crawford UT-A fseq v 1.82, iff_FB0167P exp wgEncodeUtaChIPseqSignalProgfibPol2 Signal RNA Polymerase II fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in ProgFib cells) Regulation wgEncodeUtaChIPseqSignalProgfibCtcf ProgFib CTCF FD CTCF ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-02 2010-10-02 600 Crawford UT-A fseq v 1.82, iff_FB0167P exp 
wgEncodeUtaChIPseqSignalProgfibCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in ProgFib cells) Regulation wgEncodeDukeDNaseSeqSignalProgfib ProgFib DNase FD ProgFib DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 576 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalProgfib Signal fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in ProgFib cells) Regulation wgEncodeUncFAIREseqSignalPanislets PanIsle FAIRE FD PanIslets FaireSeq ENCODE Sep 2009 Freeze 2009-10-14 2010-07-14 573 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalPanislets Signal pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. 
Transplantation 79, 917 (2005) FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in PanIslets cells) Regulation wgEncodeDukeDNaseSeqSignalPanislets PanIs DNase FD PanIslets DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 575 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalPanislets Signal pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. Transplantation 79, 917 (2005) DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in PanIslets cells) Regulation wgEncodeUtaChIPseqSignalNhekCtcf NHEK CTCF FD CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 559 Crawford UT-A fseq v 1.82, iff_generic_female exp wgEncodeUtaChIPseqSignalNhekCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
epidermal keratinocytes Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in NHEK cells) Regulation wgEncodeUncFAIREseqSignalNhek NHEK FAIRE FD NHEK FaireSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 558 Crawford UNC fseq v 1.82, iff_generic_female wgEncodeUncFAIREseqSignalNhek Signal epidermal keratinocytes FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in NHEK cells) Regulation wgEncodeDukeDNaseSeqSignalNhek NHEK DNase FD NHEK DnaseSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 553 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalNhek Signal epidermal keratinocytes DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in NHEK cells) Regulation wgEncodeUncFAIREseqSignalNhbe NHBE FAIRE FD NHBE FaireSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 604 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalNhbe Signal bronchial epithelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in NHBE cells) Regulation wgEncodeDukeDNaseSeqSignalMyometr Myometr DNase FD Myometr DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 603 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalMyometr Signal myometrial cells DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in Myometr cells) Regulation wgEncodeDukeDNaseSeqSignalMelano Melano DNase FD Melano DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 602 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalMelano Signal epidermal melanocytes DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke 
DNase-seq F-Seq Density Signal (in Melano cells) Regulation wgEncodeDukeDNaseSeqSignalMedullo Medullo DNase FD Medullo DnaseSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 574 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalMedullo Signal medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in Medullo cells) Regulation wgEncodeUtaustinChIPseqSignalMcf7 MCF7 Input Sig Input MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-22 2010-09-22 589 Crawford UT-A fseq v 1.82 input wgEncodeUtaustinChIPseqSignalMcf7 Signal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in MCF-7 cells) Regulation wgEncodeUtaChIPseqSignalMcf7Cmyc MCF7 cMyc FD c-Myc MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 599 Crawford UT-A fseq v 1.82, iff_MCF7 exp wgEncodeUtaChIPseqSignalMcf7Cmyc Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in MCF-7 cells) Regulation wgEncodeUtaChIPseqSignalMcf7Ctcf MCF7 CTCF FD CTCF MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 598 Crawford UT-A fseq v 1.82, iff_MCF7 exp wgEncodeUtaChIPseqSignalMcf7Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in MCF-7 cells) Regulation wgEncodeDukeDNaseSeqSignalMcf7 MCF7 DNase FD MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 579 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalMcf7 Signal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in MCF-7 cells) Regulation wgEncodeUncFAIREseqSignalLhsrAndro LHSR FAIRE FD LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 591 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalLhsrAndro androgen Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). 
FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in LHSR/androgen cells) Regulation wgEncodeDukeDNaseSeqSignalLhsrAndro LHSR DNase FD LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 582 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalLhsrAndro androgen Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). DNaseI HS Sequencing Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in LHSR/androgen cells) Regulation wgEncodeUncFAIREseqSignalLhsr LHSR FAIRE FD LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 590 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalLhsr Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). 
FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in LHSR cells) Regulation wgEncodeDukeDNaseSeqSignalLhsr LHSR DNase FD LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 578 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalLhsr Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in LHSR cells) Regulation wgEncodeUtaustinChIPseqSignalK562Input K562 Input FD Input K562 ChipSeq ENCODE Feb 2009 Freeze 2008-12-05 2009-08-05 529 Crawford UT-A F-Seq 1.0 input wgEncodeUtaustinChIPseqSignalK562Input Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in K562 cells) Regulation wgEncodeUtaChIPseqSignalK562Pol2 K562 Pol2 FD Pol2 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 555 Crawford UT-A fseq v 1.82, iff_K562 exp wgEncodeUtaChIPseqSignalK562Pol2 Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in K562 cells) Regulation wgEncodeUtaChIPseqSignalK562CmycV2 K562 c-Myc FD c-Myc K562 ChipSeq ENCODE Sep 2009 Freeze 2009-09-08 2009-02-27 2009-11-27 536 Crawford UT-A fseq v 1.82, iff_K562 exp wgEncodeUtaChIPseqSignalK562CmycV2 Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in K562 cells) Regulation wgEncodeUtaChIPseqSignalK562CtcfV2 K562 CTCF FD CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-09-08 2009-02-27 2009-11-27 535 Crawford UT-A fseq v 1.82, iff_K562 exp wgEncodeUtaChIPseqSignalK562CtcfV2 Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in K562 cells) Regulation wgEncodeUncFAIREseqSignalK562V2 K562 FAIRE FD K562 FaireSeq ENCODE Sep 2009 Freeze 2009-10-02 2009-02-26 2009-11-26 531 Crawford UNC fseq v 1.82, iff_K562 wgEncodeUncFAIREseqSignalK562V2 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in K562 cells) Regulation wgEncodeDukeDNaseSeqSignalK562V2 K562 DNase FD K562 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-08 2009-02-26 2009-11-26 530 Crawford Duke fseq v 1.82, iff_K562 wgEncodeDukeDNaseSeqSignalK562V2 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in K562 cells) Regulation wgEncodeUtaustinChIPseqSignalHuvecInput HUVEC Input FD Input HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-24 550 Crawford UT-A fseq v 1.82, iff_HUVEC input wgEncodeUtaustinChIPseqSignalHuvecInput Signal umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in HUVEC cells) Regulation wgEncodeUtaChIPseqSignalHuvecPol2 HUVEC Pol2 FD Pol2 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 552 Crawford UT-A fseq v 1.82, iff_HUVEC exp wgEncodeUtaChIPseqSignalHuvecPol2 Signal RNA Polymerase II umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in HUVEC cells) Regulation wgEncodeUtaChIPseqSignalHuvecCmyc HUVEC c-Myc FD c-Myc HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-01 2010-07-01 561 Crawford UT-A fseq v 1.82, iff_HUVEC exp wgEncodeUtaChIPseqSignalHuvecCmyc Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in HUVEC cells) Regulation wgEncodeUtaChIPseqSignalHuvecCtcf HUVEC CTCF FD CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-25 2010-06-25 551 Crawford UT-A fseq v 1.82, iff_HUVEC exp wgEncodeUtaChIPseqSignalHuvecCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in HUVEC cells) Regulation wgEncodeUncFAIREseqSignalHuvec HUVEC FAIRE FD HUVEC FaireSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-24 549 Crawford UNC fseq v 1.82, iff_HUVEC wgEncodeUncFAIREseqSignalHuvec Signal umbilical vein endothelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqSignalHuvec HUVEC DNase FD HUVEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-24 548 Crawford Duke fseq v 1.82, iff_HUVEC wgEncodeDukeDNaseSeqSignalHuvec Signal umbilical vein endothelial cells DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqSignalHsmmt HSMMt DNase FD HSMMtube DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 585 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalHsmmt Signal skeletal muscle myotubes differentiated from the HSMM cell line DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HSMMtube cells) Regulation wgEncodeDukeDNaseSeqSignalHsmm HSMM DNase FD HSMM DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 584 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalHsmm Signal skeletal muscle myoblasts DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HSMM cells) Regulation wgEncodeUtaustinChIPseqSignalHepg2Input HepG2 Input FD Input HepG2 ChipSeq ENCODE Feb 2009 Freeze 2009-03-13 2009-12-13 538 Crawford UT-A F-Seq 1.81 input wgEncodeUtaustinChIPseqSignalHepg2Input Signal hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University 
of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in HepG2 cells) Regulation wgEncodeUtaChIPseqSignalHepg2Pol2 HepG2 Pol2 FD Pol2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 554 Crawford UT-A fseq v 1.82, iff_HepG2 exp wgEncodeUtaChIPseqSignalHepg2Pol2 Signal RNA Polymerase II hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in HepG2 cells) Regulation wgEncodeUtaChIPseqSignalHepg2CmycV2 HepG2 c-Myc FD c-Myc HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2009-03-22 2009-12-22 545 Crawford UT-A fseq v 1.82, iff_HepG2 exp wgEncodeUtaChIPseqSignalHepg2CmycV2 Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in HepG2 cells) Regulation wgEncodeUtaChIPseqSignalHepg2CtcfV2 HepG2 CTCF FD CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2009-03-21 2009-12-21 543 Crawford UT-A fseq v 1.82, iff_HepG2 exp wgEncodeUtaChIPseqSignalHepg2CtcfV2 Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in HepG2 cells) Regulation wgEncodeUncFAIREseqSignalHepg2V2 HepG2 FAIRE FD HepG2 FaireSeq ENCODE Sep 2009 Freeze 2009-09-28 2009-04-17 2010-01-17 546 Crawford UNC fseq v 1.82, iff_HepG2 wgEncodeUncFAIREseqSignalHepg2V2 Signal hepatocellular carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in HepG2 cells) Regulation wgEncodeDukeDNaseSeqSignalHepg2V2 HepG2 DNase FD HepG2 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-28 2009-03-11 2009-12-11 537 Crawford Duke fseq v 1.82, iff_HepG2 wgEncodeDukeDNaseSeqSignalHepg2V2 Signal hepatocellular carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HepG2 cells) Regulation wgEncodeUncFAIREseqSignalHelas3Ifng4h HeLa FAIRE Sig HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 588 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalHelas3Ifng4h IFNg4h Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in HeLa-S3/IFNg4h cells) Regulation wgEncodeUncFAIREseqSignalHelas3Ifna4h HeLa FAIRE FD HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 587 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalHelas3Ifna4h IFNa4h Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 4 hours of 500 U/ml Interferon alpha (Crawford) Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in HeLa-S3/IFNa4h cells) Regulation wgEncodeDukeDNaseSeqSignalHelas3Ifna4h HeLa DNase FD HeLa-S3 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 577 
Crawford Duke fseq v 1.82, iff_HelaS3 wgEncodeDukeDNaseSeqSignalHelas3Ifna4h IFNa4h Signal cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HeLa-S3/IFNa4h cells) Regulation wgEncodeUtaustinChIPseqSignalHelas3Input HeLa Input FD Input HeLa-S3 ChipSeq ENCODE Feb 2009 Freeze 2009-03-16 2009-12-16 539 Crawford UT-A F-Seq 1.81 input wgEncodeUtaustinChIPseqSignalHelas3Input Signal cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqSignalHelas3Pol2 HeLa Pol2 FD Pol2 HeLa-S3 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 597 Crawford UT-A fseq v 1.82, iff_HelaS3 exp wgEncodeUtaChIPseqSignalHelas3Pol2 Signal RNA Polymerase II cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqSignalHelas3CmycV2 HeLa c-Myc FD c-Myc HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-05 2009-03-21 2009-12-21 542 Crawford UT-A fseq v 1.82, iff_HelaS3 exp wgEncodeUtaChIPseqSignalHelas3CmycV2 Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqSignalHelas3CtcfV2 HeLa CTCF FD CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-09-10 2009-03-21 2009-12-21 541 Crawford UT-A fseq v 1.82, iff_HelaS3 exp wgEncodeUtaChIPseqSignalHelas3CtcfV2 Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in HeLa-S3 cells) Regulation wgEncodeUncFAIREseqSignalHelas3V2 HeLa FAIRE FD HeLa-S3 FaireSeq ENCODE Sep 2009 Freeze 2009-10-02 2009-03-22 2009-12-22 544 Crawford UNC fseq v 1.82, iff_HelaS3 wgEncodeUncFAIREseqSignalHelas3V2 Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqSignalHelas3V2 HeLa DNase FD HeLa-S3 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-10 2009-03-21 2009-12-21 540 Crawford Duke fseq v 1.82, iff_HelaS3 wgEncodeDukeDNaseSeqSignalHelas3V2 Signal cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqSignalH9es H9ES DNase FD H9ES DnaseSeq ENCODE Jan 2010 Freeze 2009-12-24 2010-09-23 594 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalH9es Signal embryonic stem cell (hESC) H9 DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in H9ES cells) Regulation wgEncodeUtaChIPseqSignalH1hescPol2 H1-hESC Pol2 FD Pol2 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 563 Crawford UT-A fseq v 1.82, iff_generic_male exp wgEncodeUtaChIPseqSignalH1hescPol2 Signal RNA Polymerase II embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in H1-hESC cells) Regulation wgEncodeUtaChIPseqSignalH1hescCmyc H1-hESC cMyc FD c-Myc H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 596 Crawford UT-A fseq v 1.82, 
iff_generic_male exp wgEncodeUtaChIPseqSignalH1hescCmyc Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in H1-hESC cells) Regulation wgEncodeUtaChIPseqSignalH1hescCtcf H1-hESC CTCF FD CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-10-01 2010-07-01 560 Crawford UT-A fseq v 1.82, iff_generic_male exp wgEncodeUtaChIPseqSignalH1hescCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in H1-hESC cells) Regulation wgEncodeUncFAIREseqSignalH1hesc H1-hESC FAIRE FD H1-hESC FaireSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 557 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalH1hesc Signal embryonic stem cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in H1-hESC cells) Regulation wgEncodeDukeDNaseSeqSignalH1hesc H1-hESC DNase FD H1-hESC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 556 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalH1hesc Signal embryonic stem cells DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in H1-hESC cells) Regulation wgEncodeUtaChIPseqSignalGm19240Ctcf GM19240 CTCF FD CTCF GM19240 ChipSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 572 Crawford UT-A fseq v 1.82, iff_generic_female exp wgEncodeUtaChIPseqSignalGm19240Ctcf Signal CTCF zinc finger 
transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM19240 cells) Regulation wgEncodeDukeDNaseSeqSignalGm19240 GM19240 DNase FD GM19240 DnaseSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 568 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalGm19240 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM19240 cells) Regulation wgEncodeUtaChIPseqSignalGm19239Ctcf GM19239 CTCF FD CTCF GM19239 ChipSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 571 Crawford UT-A fseq v 1.82, iff_generic_male exp wgEncodeUtaChIPseqSignalGm19239Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances.
B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM19239 cells) Regulation wgEncodeUncFAIREseqSignalGm19239 GM19239 FAIRE FD GM19239 FaireSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 580 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalGm19239 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in GM19239 cells) Regulation wgEncodeDukeDNaseSeqSignalGm19239 GM19239 DNase FD GM19239 DnaseSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 567 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalGm19239 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM19239 cells) Regulation wgEncodeUtaChIPseqSignalGm19238Ctcf GM19238 CTCF FD CTCF GM19238 ChipSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 570 Crawford UT-A fseq v 1.82, iff_generic_female exp wgEncodeUtaChIPseqSignalGm19238Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances.
B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM19238 cells) Regulation wgEncodeDukeDNaseSeqSignalGm19238 GM19238 DNase FD GM19238 DnaseSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 566 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalGm19238 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM19238 cells) Regulation wgEncodeUncFAIREseqSignalGm18507 GM18507 FAIRE FD GM18507 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 586 Crawford UNC fseq v 1.82, iff_generic_male wgEncodeUncFAIREseqSignalGm18507 Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in GM18507 cells) Regulation wgEncodeDukeDNaseSeqSignalGm18507 GM18507 DNase FD GM18507 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 581 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalGm18507 Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM18507 cells) Regulation wgEncodeUtaChIPseqSignalGm12892Ctcf GM12892 CTCF FD CTCF GM12892 ChipSeq ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 562 Crawford UT-A fseq v 1.82, iff_generic_female exp wgEncodeUtaChIPseqSignalGm12892Ctcf Signal CTCF zinc finger transcription factor.
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM12892 cells) Regulation wgEncodeDukeDNaseSeqSignalGm12892 GM12892 DNase FD GM12892 DnaseSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 565 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalGm12892 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM12892 cells) Regulation wgEncodeUtaChIPseqSignalGm12891Ctcf GM12891 CTCF FD CTCF GM12891 ChipSeq ENCODE Sep 2009 Freeze 2009-10-06 2010-07-06 569 Crawford UT-A fseq v 1.82, iff_generic_male exp wgEncodeUtaChIPseqSignalGm12891Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM12891 cells) Regulation wgEncodeDukeDNaseSeqSignalGm12891 GM12891 DNase FD GM12891 DnaseSeq ENCODE Sep 2009 Freeze 2009-10-03 2010-07-03 564 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalGm12891 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM12891 cells) Regulation wgEncodeUtaustinChIPseqSignalGm12878Input GM12878 Input FD Input GM12878 ChipSeq ENCODE Nov 2008 Freeze 2008-11-07 2009-07-07 528 Crawford UT-A 1.0 input wgEncodeUtaustinChIPseqSignalGm12878Input Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in GM12878 cells) Regulation wgEncodeUtaChIPseqSignalGm12878Pol2 GM12878 Pol2 FD Pol2 GM12878 ChipSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-22 592 Crawford UT-A fseq v 1.82, iff_GM12878 exp wgEncodeUtaChIPseqSignalGm12878Pol2 Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in GM12878 cells) Regulation wgEncodeUtaChIPseqSignalGm12878Cmyc GM12878 c-Myc FD c-Myc GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-09-08 2010-06-08 547 Crawford UT-A fseq v 1.82, iff_GM12878 exp wgEncodeUtaChIPseqSignalGm12878Cmyc Signal transcription factor;
c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in GM12878 cells) Regulation wgEncodeUtaChIPseqSignalGm12878CtcfV2 GM12878 CTCF FD CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-09-04 2009-02-24 2009-11-24 532 Crawford UT-A fseq v 1.82, iff_GM12878 exp wgEncodeUtaChIPseqSignalGm12878CtcfV2 Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Signal ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM12878 cells) Regulation wgEncodeUncFAIREseqSignalGm12878V2 GM12878 FAIRE FD GM12878 FaireSeq ENCODE Sep 2009 Freeze 2009-09-08 2009-02-25 2009-11-25 533 Crawford UNC fseq v 1.82, iff_GM12878 wgEncodeUncFAIREseqSignalGm12878V2 Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Signal ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqSignalGm12878V2 GM12878 DNase FD GM12878 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-02 2009-02-27 2009-11-27 534 Crawford Duke DNaseHS, fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalGm12878V2 Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Crawford Crawford - Duke University Signal
ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqSignalFibrop FibroP DNase FD FibroP DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 605 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalFibrop Signal fibroblasts taken from individuals with Parkinson's disease, AG20443, AG08395 and AG08396 were pooled for this sample DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in FibroP cells) Regulation wgEncodeDukeDNaseSeqSignalFibrobl Fibrobl DNase FD Fibrobl DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 583 Crawford Duke fseq v 1.82, iff_GM12878 wgEncodeDukeDNaseSeqSignalFibrobl Signal child fibroblast DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in Fibrobl cells) Regulation wgEncodeDukeDNaseSeqSignalChorion Chorion DNase FD Chorion DnaseSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 595 Crawford Duke fseq v 1.82, iff_generic_female wgEncodeDukeDNaseSeqSignalChorion Signal chorion cells (outermost of two fetal membranes), fetal membranes were collected from women who underwent planned cesarean delivery at term, before labor and without rupture of membranes. 
DNaseI HS Sequencing Crawford Crawford - Duke University Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in Chorion cells) Regulation wgEncodeDukeDNaseSeqSignalAosmcSerumfree AoSMC DNase FD AoSMC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 601 Crawford Duke fseq v 1.82, iff_generic_male wgEncodeDukeDNaseSeqSignalAosmcSerumfree serum_free_media Signal aortic smooth muscle cells DNaseI HS Sequencing Crawford Crawford - Duke University Grown with growth factors, then switched to media that contains no FBS for 36 hours (Crawford) Signal ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in AoSMC cells) Regulation wgEncodeChromatinMapViewSIGBO Signal (Base Overlap) ENCODE Open Chromatin, Duke/UNC/UT Regulation wgEncodeUtaChIPseqBaseOverlapSignalProgfibPol2 ProgFib Pol2 BO Pol2 ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 606 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalProgfibPol2 Base_Overlap_Signal RNA Polymerase II fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in ProgFib cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalProgfibCtcf ProgFib CTCF BO CTCF ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-02 2010-10-02 600 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalProgfibCtcf Base_Overlap_Signal CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in ProgFib cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalProgfib ProgFib DNase BO ProgFib DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 576 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalProgfib Base_Overlap_Signal fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
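The Base Overlap Signal computation described above (extend each aligned read 5 bp in both directions from its 5' aligned end, then count the extended fragments covering each base) can be sketched as follows. This is a minimal illustration, not the baseAlignCounts.pl script itself: the function name base_overlap_signal, the 0-based coordinates, the inclusive 11-bp window, and the optional chrom_length clipping are all assumptions made for the example.

```python
from collections import Counter


def base_overlap_signal(five_prime_ends, extend=5, chrom_length=None):
    """Return a Counter mapping base position -> number of covering fragments.

    five_prime_ends: iterable of 0-based positions of each read's 5' aligned end
                     (hypothetical input format; the real pipeline reads alignments).
    extend:          bases added on each side of the 5' end (5 bp in the description).
    chrom_length:    optional sequence length used to clip windows at the far end.
    """
    counts = Counter()
    for pos in five_prime_ends:
        # Assumed window: [pos - extend, pos + extend], inclusive on both sides.
        start = max(0, pos - extend)
        end = pos + extend
        if chrom_length is not None:
            end = min(end, chrom_length - 1)
        for base in range(start, end + 1):
            counts[base] += 1
    return counts


if __name__ == "__main__":
    # Two reads whose 5' ends lie 2 bp apart; their windows overlap on bases 7..15.
    signal = base_overlap_signal([10, 12])
    for base in sorted(signal):
        print(base, signal[base])
```

A per-base loop keeps the sketch easy to follow; for genome-scale data a difference array (add 1 at the window start, subtract 1 just past the window end, then take a running sum) would give the same counts with far less work per read.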
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in ProgFib cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalPanislets PanIsle FAIRE BO PanIslets FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-14 2010-07-14 573 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalPanislets Base_Overlap_Signal pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. Transplantation 79, 917 (2005) FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in PanIslets cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalPanislets PanIs DNase BO PanIslets DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 575 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalPanislets Base_Overlap_Signal pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. Transplantation 79, 917 (2005) DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. 
This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in PanIslets cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalNhekCtcf NHEK CTCF BO CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-09-30 2010-06-30 559 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalNhekCtcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in NHEK cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalNhek NHEK FAIRE BO NHEK FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-30 2010-06-30 558 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalNhek Base_Overlap_Signal epidermal keratinocytes FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. 
This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in NHEK cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalNhek NHEK DNase BO NHEK DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-29 2010-06-29 553 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalNhek Base_Overlap_Signal epidermal keratinocytes DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in NHEK cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalNhbe NHBE FAIRE BO NHBE FaireSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 604 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalNhbe Base_Overlap_Signal bronchial epithelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in NHBE cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalMyometr Myometr DNase BO Myometr DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 603 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalMyometr Base_Overlap_Signal myometrial cells DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in Myometr cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalMelano Melano DNase BO Melano DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 602 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalMelano Base_Overlap_Signal epidermal melanocytes DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in Melano cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalMedullo Medullo DNase BO Medullo DnaseSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 574 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalMedullo Base_Overlap_Signal medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in Medullo cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalMcf7Cmyc MCF7 cMyc BO c-Myc MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 599 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalMcf7Cmyc Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in MCF-7 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalMcf7Ctcf MCF7 CTCF BO CTCF MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 598 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalMcf7Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in MCF-7 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalMcf7 MCF7 DNase BO MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 579 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalMcf7 Base_Overlap_Signal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. 
The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in MCF-7 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalLhsrAndro LHSR FAIRE BO LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 591 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalLhsrAndro androgen Base_Overlap_Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in LHSR/androgen cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalLhsrAndro LHSR DNase BO LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 582 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalLhsrAndro androgen Base_Overlap_Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). 
DNaseI HS Sequencing Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in LHSR/androgen cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalLhsr LHSR FAIRE BO LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 590 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalLhsr Base_Overlap_Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in LHSR cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalLhsr LHSR DNase BO LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 578 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalLhsr Base_Overlap_Signal prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in LHSR cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalK562Pol2 K562 Pol2 BO Pol2 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-11-06 2009-09-29 2010-06-29 555 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalK562Pol2 Base_Overlap_Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. 
For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in K562 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalK562CmycV2 K562 c-Myc BO c-Myc K562 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-02-27 2009-11-27 536 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalK562CmycV2 Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in K562 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalK562CtcfV2 K562 CTCF BO CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-02-27 2009-11-27 535 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalK562CtcfV2 Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in K562 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalK562V2 K562 FAIRE BO K562 FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2008-12-09 2009-08-09 531 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalK562V2 Base_Overlap_Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in K562 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalK562V2 K562 DNase BO K562 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-03 2008-12-09 2009-08-09 530 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalK562V2 Base_Overlap_Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in K562 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHuvecPol2 HUVEC Pol2 BO Pol2 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-11-06 2009-09-28 2010-06-28 552 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHuvecPol2 Base_Overlap_Signal RNA Polymerase II umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in HUVEC cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHuvecCmyc HUVEC c-Myc BO c-Myc HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-11-06 2009-10-01 2010-07-01 561 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHuvecCmyc Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in HUVEC cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHuvecCtcf HUVEC CTCF BO CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-09-25 2010-06-25 551 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHuvecCtcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. 
The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in HUVEC cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalHuvec HUVEC FAIRE BO HUVEC FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-24 2010-06-24 549 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalHuvec Base_Overlap_Signal umbilical vein endothelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHuvec HUVEC DNase BO HUVEC DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-24 2010-06-24 548 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHuvec Base_Overlap_Signal umbilical vein endothelial cells DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHsmmt HSMMt DNase BO HSMMtube DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 585 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHsmmt Base_Overlap_Signal skeletal muscle myotubes differentiated from the HSMM cell line DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HSMMtube cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHsmm HSMM DNase BO HSMM DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 584 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHsmm Base_Overlap_Signal skeletal muscle myoblasts DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HSMM cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHepg2Pol2 HepG2 Pol2 BO Pol2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-11-07 2009-09-29 2010-06-29 554 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHepg2Pol2 Base_Overlap_Signal RNA Polymerase II hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in HepG2 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHepg2CmycV2 HepG2 c-Myc BO c-Myc HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-03-22 2009-12-22 545 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHepg2CmycV2 Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in HepG2 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHepg2CtcfV2 HepG2 CTCF BO CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-03-21 2009-12-21 543 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHepg2CtcfV2 Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in HepG2 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalHepg2V2 HepG2 FAIRE BO HepG2 FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-04-17 2010-01-17 546 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalHepg2V2 Base_Overlap_Signal hepatocellular carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in HepG2 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHepg2V2 HepG2 DNase BO HepG2 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-03-11 2009-12-11 537 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHepg2V2 Base_Overlap_Signal hepatocellular carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HepG2 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalHelas3Ifng4h HeLa FAIRE BO HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 588 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalHelas3Ifng4h IFNg4h Base_Overlap_Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in HeLa-S3/IFNg4h cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalHelas3Ifna4h HeLa FAIRE BO HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 587 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalHelas3Ifna4h IFNa4h Base_Overlap_Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 4 hours of 500 U/ml Interferon alpha (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in HeLa-S3/IFNa4h cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHelas3Ifna4h HeLa DNase BO HeLa-S3 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 577 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHelas3Ifna4h IFNa4h Base_Overlap_Signal cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HeLa-S3/IFNa4h cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHelas3Pol2 HeLa Pol2 BO Pol2 HeLa-S3 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 597 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHelas3Pol2 Base_Overlap_Signal RNA Polymerase II cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHelas3CmycV2 HeLa c-Myc BO c-Myc HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-03-21 2009-12-21 542 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHelas3CmycV2 Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalHelas3CtcfV2 HeLa CTCF BO CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-03-21 2009-12-21 541 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalHelas3CtcfV2 Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in HeLa-S3 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalHelas3V2 HeLa FAIRE BO HeLa-S3 FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-03-22 2009-12-22 544 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalHelas3V2 Base_Overlap_Signal cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalHelas3V2 HeLa DNase BO HeLa-S3 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-07 2009-03-21 2009-12-21 540 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalHelas3V2 Base_Overlap_Signal cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalH9es H9ES DNase BO H9ES DnaseSeq ENCODE Jan 2010 Freeze 2009-12-24 2010-09-23 594 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalH9es Base_Overlap_Signal embryonic stem cell (hESC) H9 DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in H9ES cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalH1hescPol2 H1-hESC Pol2 BO Pol2 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-11-06 2009-10-02 2010-07-02 563 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalH1hescPol2 Base_Overlap_Signal RNA Polymerase II embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in H1-hESC cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalH1hescCmyc H1-hESC cMyc BO c-Myc H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 596 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalH1hescCmyc Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in H1-hESC cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalH1hescCtcf H1-hESC CTCF BO CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-10-01 2010-07-01 560 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalH1hescCtcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in H1-hESC cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalH1hesc H1-hESC FAIRE BO H1-hESC FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-30 2010-06-30 557 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalH1hesc Base_Overlap_Signal embryonic stem cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in H1-hESC cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalH1hesc H1-hESC DNase BO H1-hESC DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-09-30 2010-06-30 556 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalH1hesc Base_Overlap_Signal embryonic stem cells DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in H1-hESC cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm19240Ctcf GM19240 CTCF BO CTCF GM19240 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-10-06 2010-07-06 572 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm19240Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. 
The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM19240 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm19240 GM19240 DNase BO GM19240 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-06 2010-07-06 568 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm19240 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM19240 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm19239Ctcf GM19239 CTCF BO CTCF GM19239 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-10-06 2010-07-06 571 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm19239Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. 
This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM19239 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalGm19239 GM19239 FAIRE BO GM19239 FaireSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 580 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalGm19239 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in GM19239 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm19239 GM19239 DNase BO GM19239 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-06 2010-07-06 567 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm19239 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. 
For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM19239 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm19238Ctcf GM19238 CTCF BO CTCF GM19238 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-10-06 2010-07-06 570 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm19238Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM19238 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm19238 GM19238 DNase BO GM19238 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-06 2010-07-06 566 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm19238 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM19238 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalGm18507 GM18507 FAIRE BO GM18507 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 586 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalGm18507 Base_Overlap_Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in GM18507 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm18507 GM18507 DNase BO GM18507 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 581 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm18507 Base_Overlap_Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM18507 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm12892Ctcf GM12892 CTCF BO CTCF GM12892 ChipSeq ENCODE Sep 2009 Freeze 2009-11-07 2009-10-02 2010-07-02 562 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm12892Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. 
For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM12892 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm12892 GM12892 DNase BO GM12892 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-06 2010-07-06 565 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm12892 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM12892 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm12891Ctcf GM12891 CTCF BO CTCF GM12891 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-10-06 2010-07-06 569 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm12891Ctcf Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM12891 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm12891 GM12891 DNase BO GM12891 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-10-03 2010-07-03 564 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm12891 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. 
ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM12891 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm12878Pol2 GM12878 Pol2 BO Pol2 GM12878 ChipSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-22 592 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm12878Pol2 Base_Overlap_Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in GM12878 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm12878Cmyc GM12878 c-Myc BO c-Myc GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-11-06 2009-09-08 2010-06-08 547 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm12878Cmyc Base_Overlap_Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. 
The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in GM12878 cells) Regulation wgEncodeUtaChIPseqBaseOverlapSignalGm12878CtcfV2 GM12878 CTCF BO CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-02-24 2009-11-24 532 Crawford UT-A baseAlignCounts.pl v 1 exp wgEncodeUtaChIPseqBaseOverlapSignalGm12878CtcfV2 Base_Overlap_Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM12878 cells) Regulation wgEncodeUncFAIREseqBaseOverlapSignalGm12878V2 GM12878 FAIRE BO GM12878 FaireSeq ENCODE Sep 2009 Freeze 2009-11-02 2009-02-25 2009-11-25 533 Crawford UNC baseAlignCounts.pl v 1 wgEncodeUncFAIREseqBaseOverlapSignalGm12878V2 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. 
This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalGm12878V2 GM12878 DNase BO GM12878 DnaseSeq ENCODE Sep 2009 Freeze 2009-11-03 2009-02-27 2009-11-27 534 Crawford Duke DNaseHS, baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalGm12878V2 Base_Overlap_Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalFibrop FibroP DNase BO FibroP DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 605 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalFibrop Base_Overlap_Signal fibroblasts taken from individuals with Parkinson's disease, AG20443, AG08395 and AG08396 were pooled for this sample DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. 
For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in FibroP cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalFibrobl Fibrobl DNase BO Fibrobl DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 583 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalFibrobl Base_Overlap_Signal child fibroblast DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in Fibrobl cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalChorion Chorion DNase BO Chorion DnaseSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 595 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalChorion Base_Overlap_Signal chorion cells (outermost of two fetal membranes), fetal membranes were collected from women who underwent planned cesarean delivery at term, before labor and without rupture of membranes. DNaseI HS Sequencing Crawford Crawford - Duke University An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. 
The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in Chorion cells) Regulation wgEncodeDukeDNaseSeqBaseOverlapSignalAosmcSerumfree AoSMC DNase BO AoSMC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 601 Crawford Duke baseAlignCounts.pl v 1 wgEncodeDukeDNaseSeqBaseOverlapSignalAosmcSerumfree serum_free_media Base_Overlap_Signal aortic smooth muscle cells DNaseI HS Sequencing Crawford Crawford - Duke University Grown with growth factors, then switched to media that contains no FBS for 36 hours (Crawford) An alternative version of the F-Seq Density Signal track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA. The score at each base pair represents the number of extended fragments that overlap the base pair. ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in AoSMC cells) Regulation wgEncodeChromatinMapViewPeaks Peaks ENCODE Open Chromatin, Duke/UNC/UT Regulation wgEncodeUtaChIPseqPeaksProgfibPol2 ProgFib Pol2 Pk Pol2 ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 606 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksProgfibPol2 Peaks RNA Polymerase II fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in ProgFib cells) Regulation wgEncodeUtaChIPseqPeaksProgfibCtcf ProgFib CTCF Pk CTCF ProgFib ChipSeq ENCODE Jan 2010 Freeze 2010-01-02 2010-10-02 600 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksProgfibCtcf Peaks CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in ProgFib cells) Regulation wgEncodeDukeDNaseSeqPeaksProgfib ProgFib DNase Pk ProgFib DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 576 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksProgfib Peaks fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in ProgFib cells) Regulation wgEncodeUncFAIREseqPeaksPanislets PanIsle FAIRE Pk PanIslets FaireSeq ENCODE Sep 2009 Freeze 2009-10-14 2010-07-14 573 Crawford UNC Lieb Lab peaks wgEncodeUncFAIREseqPeaksPanislets Peaks pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. 
Transplantation 79, 917 (2005) FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in PanIslets cells) Regulation wgEncodeDukeDNaseSeqPeaksPanislets PanIs DNase Pk PanIslets DnaseSeq ENCODE Jan 2010 Freeze 2009-12-17 2010-09-17 575 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksPanislets Peaks pancreatic islets from 2 donors, the sources of these primary cells are cadavers from National Disease Research Interchange (NDRI) and another sample isolated as in Bucher, P. et al., Assessment of a novel two-component enzyme preparation for human islet isolation and transplantation. Transplantation 79, 917 (2005) DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in PanIslets cells) Regulation wgEncodeUtaChIPseqPeaksNhekCtcfV2 NHEK CTCF Pk CTCF NHEK ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-30 2010-06-30 559 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksNhekCtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
epidermal keratinocytes Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in NHEK cells) Regulation wgEncodeUncFAIREseqPeaksNhekV2 NHEK FAIRE Pk NHEK FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-30 2010-06-30 558 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksNhekV2 Peaks epidermal keratinocytes FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in NHEK cells) Regulation wgEncodeDukeDNaseSeqPeaksNhekV2 NHEK DNase Pk NHEK DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-29 2010-06-29 553 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksNhekV2 Peaks epidermal keratinocytes DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in NHEK cells) Regulation wgEncodeUncFAIREseqPeaksNhbe NHBE FAIRE Pk NHBE FaireSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 604 Crawford UNC p-value cutoff: 0.01 wgEncodeUncFAIREseqPeaksNhbe Peaks bronchial epithelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in NHBE cells) Regulation wgEncodeDukeDNaseSeqPeaksMyometr Myometr DNase Pk Myometr DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 603 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksMyometr Peaks myometrial cells DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in Myometr cells) Regulation wgEncodeDukeDNaseSeqPeaksMelano Melano DNase Pk Melano DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 602 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksMelano Peaks epidermal melanocytes DNaseI HS Sequencing Crawford 
Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in Melano cells) Regulation wgEncodeDukeDNaseSeqPeaksMedullo Medullo DNase Pk Medullo DnaseSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 574 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksMedullo Peaks medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in Medullo cells) Regulation wgEncodeUtaChIPseqPeaksMcf7Cmyc MCF7 cMyc Pk c-Myc MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 599 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksMcf7Cmyc Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in MCF-7 cells) Regulation wgEncodeUtaChIPseqPeaksMcf7Ctcf MCF7 CTCF Pk CTCF MCF-7 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 598 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksMcf7Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in MCF-7 cells) Regulation wgEncodeDukeDNaseSeqPeaksMcf7 MCF7 DNase Pk MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 579 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksMcf7 Peaks mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in MCF-7 cells) Regulation wgEncodeUncFAIREseqPeaksLhsrAndro LHSR FAIRE Pk LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 591 Crawford UNC p-value cutoff: 0.1 wgEncodeUncFAIREseqPeaksLhsrAndro androgen Peaks prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). 
FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in LHSR/androgen cells) Regulation wgEncodeDukeDNaseSeqPeaksLhsrAndro LHSR DNase Pk LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 582 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksLhsrAndro androgen Peaks prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). DNaseI HS Sequencing Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in LHSR/androgen cells) Regulation wgEncodeUncFAIREseqPeaksLhsr LHSR FAIRE Pk LHSR FaireSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-23 590 Crawford UNC p-value cutoff: 0.1 wgEncodeUncFAIREseqPeaksLhsr Peaks prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). 
FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in LHSR cells) Regulation wgEncodeDukeDNaseSeqPeaksLhsr LHSR DNase Pk LHSR DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 578 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksLhsr Peaks prostate epithelial cells (PrEC), multiple human donors, all of whom are HIV-1, Hepatitis B and Hepatitis C negative, treatment: to create LHSR, cells were infected with amphotropic retroviruses encoding the SV40 large T antigen (L), the telomerase catalytic subunit hTERT (H), the SV40 small T antigen (S) and an oncogenic allele of H-ras (R). DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in LHSR cells) Regulation wgEncodeUtaChIPseqPeaksK562Pol2V2 K562 Pol2 Pk Pol2 K562 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-29 2010-06-29 555 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksK562Pol2V2 Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in K562 cells) Regulation wgEncodeUtaChIPseqPeaksK562CmycV3 K562 c-Myc Pk c-Myc K562 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 536 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksK562CmycV3 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in K562 cells) Regulation wgEncodeUtaChIPseqPeaksK562CtcfV3 K562 CTCF Pk CTCF K562 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 535 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksK562CtcfV3 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in K562 cells) Regulation wgEncodeUncFAIREseqPeaksK562V3 K562 FAIRE Pk K562 FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-04-20 2010-01-20 531 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksK562V3 Peaks leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in K562 cells) Regulation wgEncodeDukeDNaseSeqPeaksK562V3 K562 DNase Pk K562 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 530 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksK562V3 Peaks leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in K562 cells) Regulation wgEncodeUtaChIPseqPeaksHuvecPol2V2 HUVEC Pol2 Pk Pol2 HUVEC ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-28 2010-06-28 552 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHuvecPol2V2 Peaks RNA Polymerase II umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in HUVEC cells) Regulation wgEncodeUtaChIPseqPeaksHuvecCmycV2 HUVEC c-Myc Pk c-Myc HUVEC ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-01 2010-07-01 561 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHuvecCmycV2 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in HUVEC cells) Regulation wgEncodeUtaChIPseqPeaksHuvecCtcfV2 HUVEC CTCF Pk CTCF HUVEC ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-25 2010-06-25 551 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHuvecCtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
umbilical vein endothelial cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in HUVEC cells) Regulation wgEncodeUncFAIREseqPeaksHuvecV2 HUVEC FAIRE Pk HUVEC FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-24 2010-06-24 549 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksHuvecV2 Peaks umbilical vein endothelial cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqPeaksHuvecV2 HUVEC DNase Pk HUVEC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-24 2010-06-24 548 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksHuvecV2 Peaks umbilical vein endothelial cells DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HUVEC cells) Regulation wgEncodeDukeDNaseSeqPeaksHsmmt HSMMt DNase Pk HSMMtube DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 585 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksHsmmt Peaks skeletal muscle myotubes differentiated from the HSMM cell line DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HSMMtube cells) Regulation wgEncodeDukeDNaseSeqPeaksHsmm HSMM DNase Pk HSMM DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 584 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksHsmm Peaks skeletal muscle myoblasts DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HSMM cells) Regulation wgEncodeUtaChIPseqPeaksHepg2Pol2V2 HepG2 Pol2 Pk Pol2 HepG2 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-29 2010-06-29 554 Crawford UT-A p-value cutoff: 0.05, 
ver3 exp wgEncodeUtaChIPseqPeaksHepg2Pol2V2 Peaks RNA Polymerase II hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in HepG2 cells) Regulation wgEncodeUtaChIPseqPeaksHepg2CmycV3 HepG2 c-Myc Pk c-Myc HepG2 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-22 2009-12-22 545 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHepg2CmycV3 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in HepG2 cells) Regulation wgEncodeUtaChIPseqPeaksHepg2CtcfV3 HepG2 CTCF Pk CTCF HepG2 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-21 2009-12-21 543 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHepg2CtcfV3 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in HepG2 cells) Regulation wgEncodeUncFAIREseqPeaksHepg2V3 HepG2 FAIRE Pk HepG2 FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-04-17 2010-01-17 546 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksHepg2V3 Peaks hepatocellular carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in HepG2 cells) Regulation wgEncodeDukeDNaseSeqPeaksHepg2V3 HepG2 DNase Pk HepG2 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 537 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksHepg2V3 Peaks hepatocellular carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HepG2 cells) Regulation wgEncodeUncFAIREseqPeaksHelas3Ifng4h HeLa FAIRE Pk HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 588 Crawford UNC p-value cutoff: 0.05 wgEncodeUncFAIREseqPeaksHelas3Ifng4h IFNg4h Peaks cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in HeLa-S3/IFNg4h cells) Regulation wgEncodeUncFAIREseqPeaksHelas3Ifna4h HeLa FAIRE Pk HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 587 Crawford UNC p-value cutoff: 0.05 wgEncodeUncFAIREseqPeaksHelas3Ifna4h IFNa4h Peaks cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina 4 hours of 500 U/ml Interferon alpha (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in HeLa-S3/IFNa4h cells) Regulation wgEncodeDukeDNaseSeqPeaksHelas3Ifna4h HeLa DNase Pk 
HeLa-S3 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 577 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksHelas3Ifna4h IFNa4h Peaks cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HeLa-S3/IFNa4h cells) Regulation wgEncodeUtaChIPseqPeaksHelas3Pol2 HeLa Pol2 Pk Pol2 HeLa-S3 ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-28 597 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksHelas3Pol2 Peaks RNA Polymerase II cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqPeaksHelas3CmycV3 HeLa-S3 c-Myc Pk c-Myc HeLa-S3 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-21 2009-12-21 542 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHelas3CmycV3 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in HeLa-S3 cells) Regulation wgEncodeUtaChIPseqPeaksHelas3CtcfV3 HeLa-S3 CTCF Pk CTCF HeLa-S3 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-21 2009-12-21 541 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksHelas3CtcfV3 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
cervical carcinoma Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in HeLa-S3 cells) Regulation wgEncodeUncFAIREseqPeaksHelas3V3 HeLa-S3 FAIRE Pk HeLa-S3 FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-22 2009-12-22 544 Crawford UNC p-value cutoff: 0.05, ver3 wgEncodeUncFAIREseqPeaksHelas3V3 Peaks cervical carcinoma FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqPeaksHelas3V3 HeLa-S3 DNase Pk HeLa-S3 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-21 2009-12-21 540 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksHelas3V3 Peaks cervical carcinoma DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in HeLa-S3 cells) Regulation wgEncodeDukeDNaseSeqPeaksH9es H9ES DNase Pk H9ES DnaseSeq ENCODE Jan 2010 Freeze 2009-12-24 2010-09-23 594 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksH9es Peaks embryonic stem cell (hESC) H9 DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in H9ES cells) Regulation wgEncodeUtaChIPseqPeaksH1hescPol2V2 H1-hESC Pol2 Pk Pol2 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-02 2010-07-02 563 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksH1hescPol2V2 Peaks RNA Polymerase II embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in H1-hESC cells) Regulation wgEncodeUtaChIPseqPeaksH1hescCmyc H1-hESC cMyc Pk c-Myc H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 596 Crawford UT-A p-value cutoff: 0.01 exp 
wgEncodeUtaChIPseqPeaksH1hescCmyc Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in H1-hESC cells) Regulation wgEncodeUtaChIPseqPeaksH1hescCtcfV2 H1-hESC CTCF Pk CTCF H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-01 2010-07-01 560 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksH1hescCtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic stem cells Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in H1-hESC cells) Regulation wgEncodeUncFAIREseqPeaksH1hescV2 H1-hESC FAIRE Pk H1-hESC FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-30 2010-06-30 557 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksH1hescV2 Peaks embryonic stem cells FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in H1-hESC cells) Regulation wgEncodeDukeDNaseSeqPeaksH1hescV2 H1-hESC DNase Pk H1-hESC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-30 2010-06-30 556 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksH1hescV2 Peaks embryonic stem cells DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in H1-hESC cells) Regulation wgEncodeUtaChIPseqPeaksGm19240CtcfV2 GM19240 CTCF Pk CTCF GM19240 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 572 Crawford UT-A p-value 
cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm19240CtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM19240 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm19240V2 GM19240 DNase Pk GM19240 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 568 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm19240V2 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM19240 cells) Regulation wgEncodeUtaChIPseqPeaksGm19239CtcfV2 GM19239 CTCF Pk CTCF GM19239 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 571 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm19239CtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM19239 cells) Regulation wgEncodeUncFAIREseqPeaksGm19239 GM19239 FAIRE Pk GM19239 FaireSeq ENCODE Jan 2010 Freeze 2009-12-18 2010-09-18 580 Crawford UNC p-value cutoff: 0.05 wgEncodeUncFAIREseqPeaksGm19239 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in GM19239 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm19239V2 GM19239 DNase Pk GM19239 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 567 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm19239V2 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM19239 cells) Regulation wgEncodeUtaChIPseqPeaksGm19238CtcfV2 GM19238 CTCF Pk CTCF GM19238 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 570 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm19238CtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM19238 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm19238V2 GM19238 DNase Pk GM19238 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 566 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm19238V2 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM19238 cells) Regulation wgEncodeUncFAIREseqPeaksGm18507 GM18507 FAIRE Pk GM18507 FaireSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 586 Crawford UNC p-value cutoff: 0.1 wgEncodeUncFAIREseqPeaksGm18507 Peaks lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in GM18507 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm18507 GM18507 DNase Pk GM18507 DnaseSeq ENCODE Jan 2010 Freeze 2009-12-19 2010-09-18 581 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksGm18507 Peaks lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM18507 cells) Regulation wgEncodeUtaChIPseqPeaksGm12892CtcfV2 GM12892 CTCF Pk CTCF GM12892 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-02 2010-07-02 562 Crawford UT-A p-value cutoff: 0.05, ver3 exp 
wgEncodeUtaChIPseqPeaksGm12892CtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM12892 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm12892V2 GM12892 DNase Pk GM12892 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 565 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm12892V2 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM12892 cells) Regulation wgEncodeUtaChIPseqPeaksGm12891CtcfV2 GM12891 CTCF Pk CTCF GM12891 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-06 2010-07-06 569 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm12891CtcfV2 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM12891 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm12891V2 GM12891 DNase Pk GM12891 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-10-03 2010-07-03 564 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm12891V2 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM12891 cells) Regulation wgEncodeUtaChIPseqPeaksGm12878Pol2 GM12878 Pol2 Pk Pol2 GM12878 ChipSeq ENCODE Jan 2010 Freeze 2009-12-23 2010-09-22 592 Crawford UT-A p-value cutoff: 0.05 exp wgEncodeUtaChIPseqPeaksGm12878Pol2 Peaks RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in GM12878 cells) Regulation wgEncodeUtaChIPseqPeaksGm12878CmycV2 GM12878 c-Myc Pk c-Myc GM12878 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-09-08 2010-06-08 547 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm12878CmycV2 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in GM12878 cells) Regulation 
wgEncodeUtaChIPseqPeaksGm12878CtcfV3 GM12878 CTCF Pk CTCF GM12878 ChipSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 532 Crawford UT-A p-value cutoff: 0.05, ver3 exp wgEncodeUtaChIPseqPeaksGm12878CtcfV3 Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Crawford Iyer - University of Texas at Austin Regions of enriched signal in experiment ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM12878 cells) Regulation wgEncodeUncFAIREseqPeaksGm12878V3 GM12878 FAIRE Pk GM12878 FaireSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-04-20 2010-01-20 533 Crawford UNC p-value cutoff: 0.1, ver3 wgEncodeUncFAIREseqPeaksGm12878V3 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus FAIRE-seq Open Chromatin Crawford Lieb - University of North Carolina Regions of enriched signal in experiment ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqPeaksGm12878V3 GM12878 DNase Pk GM12878 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-29 2009-03-20 2009-12-20 534 Crawford Duke p-value cutoff: 0.05, ver3 wgEncodeDukeDNaseSeqPeaksGm12878V3 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM12878 cells) Regulation wgEncodeDukeDNaseSeqPeaksFibrop FibroP DNase Pk FibroP DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 605 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksFibrop Peaks fibroblasts taken from individuals with Parkinson's disease, 
AG20443, AG08395 and AG08396 were pooled for this sample DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in FibroP cells) Regulation wgEncodeDukeDNaseSeqPeaksFibrobl Fibrobl DNase Pk Fibrobl DnaseSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-20 583 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksFibrobl Peaks child fibroblast DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in Fibrobl cells) Regulation wgEncodeDukeDNaseSeqPeaksChorion Chorion DNase Pk Chorion DnaseSeq ENCODE Jan 2010 Freeze 2009-12-28 2010-09-27 595 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksChorion Peaks chorion cells (outermost of two fetal membranes), fetal membranes were collected from women who underwent planned cesarean delivery at term, before labor and without rupture of membranes. DNaseI HS Sequencing Crawford Crawford - Duke University Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in Chorion cells) Regulation wgEncodeDukeDNaseSeqPeaksAosmcSerumfree AoSMC DNase Pk AoSMC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 601 Crawford Duke p-value cutoff: 0.05 wgEncodeDukeDNaseSeqPeaksAosmcSerumfree serum_free_media Peaks aortic smooth muscle cells DNaseI HS Sequencing Crawford Crawford - Duke University Grown with growth factors, then switched to media that contains no FBS for 36 hours (Crawford) Regions of enriched signal in experiment ENCODE Open Chromatin, Duke DNase-seq Peaks (in AoSMC cells) Regulation oreganno ORegAnno Regulatory elements from ORegAnno Regulation Description This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation). 
For more detailed information on a particular regulatory element, follow the link to ORegAnno from the details page. Display Conventions and Configuration The display may be filtered to show only selected region types, such as: regulatory regions (shown in light blue) regulatory polymorphisms (shown in dark blue) transcription factor binding sites (shown in orange) regulatory haplotypes (shown in red) miRNA binding sites (shown in blue-green) To exclude a region type, uncheck the appropriate box in the list at the top of the Track Settings page. Methods An ORegAnno record describes an experimentally proven and published regulatory region (promoter, enhancer, etc.), transcription factor binding site, or regulatory polymorphism. Each annotation must have the following attributes: A stable ORegAnno identifier. A valid taxonomy ID from the NCBI taxonomy database. A valid PubMed reference. A target gene that is either user-defined, in Entrez Gene or in EnsEMBL. A sequence with at least 40 flanking bases (preferably more) to allow the site to be mapped to any release of an associated genome. At least one piece of specific experimental evidence, including the biological technique used to discover the regulatory sequence. (Currently only the evidence subtypes are supplied with the UCSC track.) A positive, neutral or negative outcome based on the experimental results from the primary reference. (Only records with a positive outcome are currently included in the UCSC track.) The following attributes are optionally included: A transcription factor that is either user-defined, in Entrez Gene or in EnsEMBL. A specific cell type for each piece of experimental evidence, using the eVOC cell type ontology. A specific dataset identifier (e.g. the REDfly dataset) that allows external curators to manage particular annotation sets using ORegAnno's curation tools. 
A "search space" sequence that specifies the region that was assayed, not just the regulatory sequence. A dbSNP identifier and type of variant (germline, somatic or artificial) for regulatory polymorphisms. Mapping to genome coordinates is performed periodically to current genome builds by BLAST sequence alignment. The information provided in this track represents an abbreviated summary of the details for each ORegAnno record. Please visit the official ORegAnno entry (by clicking on the ORegAnno link on the details page of a specific regulatory element) for complete details such as evidence descriptions, comments, validation score history, etc. Credits ORegAnno core team and principal contacts: Stephen Montgomery, Obi Griffith, and Steven Jones from Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada. The ORegAnno community (please see individual citations for various features): ORegAnno Citation. References Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, Montgomery SB, Griffith OL, Open Regulatory Annotation Consortium.. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res. 2016 Jan 4;44(D1):D126-32. PMID: 26578589; PMC: PMC4702855 Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M, Haeussler M et al. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 2008 Jan;36(Database issue):D107-13. PMID: 18006570; PMC: PMC2239002 Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics. 2006 Mar 1;22(5):637-40. 
PMID: 16397004 orfeomeMrna ORFeome Clones ORFeome Collaboration Gene Clones Genes and Gene Predictions Description This track shows alignments of human clones from the ORFeome Collaboration. The goal of the project is to be an "unrestricted source of fully sequence-validated full-ORF human cDNA clones in a format allowing easy transfer of the ORF sequences into virtually any type of expression vector. A major goal is to provide at least one fully-sequenced full-ORF clone for each human gene." This track is updated automatically as new clones become available. Display Conventions and Configuration The track follows the display conventions for gene prediction tracks. Methods ORFeome human clones were obtained from GenBank and aligned against the genome using the blat program. When a single clone aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits and References Visit the ORFeome Collaboration members page for a list of credits and references. xenoEst Other ESTs Non-Human ESTs from GenBank mRNA and EST Description This track displays translated blat alignments of expressed sequence tags (ESTs) in GenBank from organisms other than human. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) for this track is in two parts. The first + or - indicates the orientation of the query sequence whose translated protein produced the match. The second + or - indicates the orientation of the matching translated genomic sequence. 
Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --, nor is +- the same as -+. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. 
Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To generate this track, the ESTs were aligned against the genome using blat. When a single EST aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 xenoMrna Other mRNAs Non-Human mRNAs from GenBank mRNA and EST Description This track displays translated blat alignments of vertebrate and invertebrate mRNA in GenBank from organisms other than human. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) for this track is in two parts. The first + indicates the orientation of the query sequence whose translated protein produced the match (here always 5' to 3', hence +). The second + or - indicates the orientation of the matching translated genomic sequence. Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --, nor is +- the same as -+. 
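As a rough illustration of the two-part strand described above, the following Python sketch splits a two-character strand string into the query orientation and the genomic orientation. The function name split_strand and the dictionary keys are hypothetical, chosen only for this example; they are not part of the Genome Browser. For this track the first character is always +, but the sketch accepts all four combinations.

```python
def split_strand(strand):
    """Split a two-character strand string such as '+-' into its two parts.

    The first character is the orientation of the query sequence, the
    second the orientation of the matching translated genomic sequence.
    """
    if len(strand) != 2 or any(c not in "+-" for c in strand):
        raise ValueError("expected two characters, each '+' or '-'")
    return {"query": strand[0], "genome": strand[1]}


# Enumerate the four combinations; each one is a distinct case.
combos = [q + g for q in "+-" for g in "+-"]
parsed = [split_strand(s) for s in combos]
```

Keeping the two characters as separate fields makes it harder to treat ++ and -- (or +- and -+) as interchangeable by accident.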
The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods The mRNAs were aligned against the human genome using translated blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. 
Only those alignments having a base identity level within 1% of the best and at least 25% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 xenoRefGene Other RefSeq Non-Human RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than human, taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. 
For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the human genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 ucscGenePfam Pfam in UCSC Gene Pfam Domains in UCSC Genes Genes and Gene Predictions Description Most proteins are composed of one or more conserved functional regions called domains. This track shows the high-quality, manually-curated Pfam-A domains found in transcripts located in the UCSC Genes track by the software HMMER3. Display Conventions and Configuration This track follows the display conventions for gene tracks. 
Methods The sequences from the knownGenePep table (see UCSC Genes description page) are submitted to the set of Pfam-A HMMs which annotate regions within the predicted peptide that are recognizable as Pfam protein domains. These regions are then mapped to the transcripts themselves using the pslMap utility. A complete shell script log for every version of UCSC genes can be found in our GitHub repository under hg/makeDb/doc/ucscGenes, e.g. mm10.knownGenes17.csh is for the database mm10 and version 17 of UCSC known genes. Of the several options for filtering out false positives, the "Trusted cutoff (TC)" threshold method is used in this track to determine significance. For more information regarding thresholds and scores, see the HMMER documentation and results interpretation pages. Note: There is currently an undocumented but known HMMER problem which results in reduced sensitivity and possible missed hits for some zinc finger domains. Until a fix is released for HMMER/PFAM thresholds, please also consult the "UniProt Domains" subtrack of the UniProt track for more comprehensive zinc finger annotations. Credits pslMap was written by Mark Diekhans at UCSC. References Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K et al. The Pfam protein families database. Nucleic Acids Res. 2010 Jan;38(Database issue):D211-22. PMID: 19920124; PMC: PMC2808889 polyA Poly(A) Poly(A) Sites, Both Reported and Predicted mRNA and EST Description The polyA_DB database is a set of human mRNA polyadenylation sites based on EST/cDNA evidence. A site is a single base denoting the beginning of a poly(A) tail in a nascent mRNA transcript and is typically 10-30 nucleotides downstream of a polyadenylation signal (most commonly AAUAAA). The polyA_DB web server is found at http://exon.umdnj.edu/polya_db/. 
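To make the 10-30 nucleotide relationship concrete, here is a minimal Python sketch that scans a sequence for the canonical signal and reports the window in which a poly(A) site would typically be expected. It is not the method used by polyA_DB. The function name candidate_site_windows, its parameters, and the choice to measure the distance from the last base of the hexamer are assumptions made for this example.

```python
def candidate_site_windows(seq, signal="AATAAA", min_gap=10, max_gap=30):
    """Return (signal_start, window_start, window_end) tuples.

    Coordinates are 0-based and inclusive. The window covers positions
    min_gap..max_gap bases downstream of the last base of the signal
    (an assumption; the source text does not say where to measure from).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    windows = []
    start = seq.find(signal)
    while start != -1:
        last_base = start + len(signal) - 1
        lo = last_base + min_gap
        hi = min(last_base + max_gap, len(seq) - 1)
        if lo < len(seq):  # skip signals too close to the sequence end
            windows.append((start, lo, hi))
        start = seq.find(signal, start + 1)  # allow overlapping matches
    return windows


example = "GG" + "AATAAA" + "C" * 40
windows = candidate_site_windows(example)
```

A signal alone is weak evidence, which is why the database relies on EST/cDNA support; the window only narrows where to look.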
The Poly(A) composite track consists of two subtracks: a polyA_DB subtrack that displays reported poly(A) sites, and a poly(A) prediction subtrack that displays poly(A) sites predicted using a support vector machine (SVM). The poly(A) predictions are made using 1500-base DNA sequences centered at the end of each RefSeq gene. The sequences serve as input into the SVM described in Cheng et al., 2006. The SVM scores each base using a model derived from 15 different cis-elements and reports an E-value for a region of DNA between 0 (excellent) and 0.5 (worst). This E-value is then normalized to an integer value between 0 (worst) and 1000 (excellent). High-scoring regions are highlighted, with the highest-scoring base indicated by a thicker line. The median length of these regions is 48 bases. References Cheng Y, Miura RM, Tian B. Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics. 2006 Oct 1;22(19):2320-5. PMID: 16870936 Zhang H, Hu J, Recce M, Tian B. PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D116-20. PMID: 15608159; PMC: PMC540009 polyaPredict Poly(A) SVM Predicted Poly(A) Sites Using an SVM mRNA and EST polyaDb PolyA_DB Reported Poly(A) Sites from PolyA_DB mRNA and EST mammalPsg Pos Sel Genes Positively Selected Genes (6 species) Genes and Gene Predictions Description This track shows the results of a genome-wide scan for positively selected genes (PSGs) based on multiple alignments of the human (hg18), chimp (panTro2), macaque (rheMac2), mouse (mm8), rat (rn4), and dog (canFam2) genome assemblies (Kosiol et al., 2008). The track displays the 16,529 high-confidence orthologs that were tested, and highlights those genes showing evidence of positive selection. It summarizes the results of nine different likelihood ratio tests (LRTs) for positive selection, as described below. Four classes of genes are distinguished by score and color: Score = 1000; shown in red. 
Genes with strong evidence of positive selection across species. These are the 400 genes whose P-values under test A (see below) meet the threshold required for a false discovery rate (FDR) of 0.05. Score = 700; shown in purple. Genes with strong evidence of positive selection on one or more branches. These are the 144 additional genes that meet the threshold for FDR < 0.05 under any of the branch- and clade-specific tests B-I. Score = 400; shown in blue. Genes with weak evidence of positive selection on one or more branches. These are the 3705 additional genes whose nominal (unadjusted) P-values are < 0.05 under any of tests A-I. Score = 0; shown in black. Genes with no significant evidence of positive selection. These are the remaining genes, having nominal P ≥ 0.05 under all of tests A-I. In some cases, genes were truncated before testing to eliminate regions with frame-shift indels or nonconserved exon boundaries. The track shows just the portions of the genes that were tested, rather than the full gene structures. The 544 genes in groups 1 and 2, above, were also subjected to a novel Bayesian analysis to determine their most likely "selection histories", or patterns of positive selection and non-selection on the branches of the six-species phylogeny. These selection histories are described graphically on the details page, with red indicating branches under positive selection, and black indicating branches free from positive selection. Schema and Identifiers The P-values for all tests are stored in the browser database and can be incorporated into filters using the table browser. A P-value of 2 indicates that a test was not performed, due to insufficient data. (For example, test G could not be performed if the chimp sequence was missing or did not pass the quality filters.) The *isFdr columns indicate which genes are significant with FDR < 0.05 for each test. 
Thus, filtering for lrtPrimateClPValue < 0.05 will retrieve all genes showing weak evidence of positive selection in the primate clade, and filtering for lrtRodentBrIsFdr = 1 will retrieve all genes showing strong evidence of positive selection on the branch to the rodents. The original source of each gene structure (RefSeq, Vega, or UCSC) is given in its identifier. A suffix of ".inc" indicates that a gene was truncated before testing. Methods Human genes from the RefSeq, Vega, and UCSC Genes sets were mapped onto multiz-based multiple alignments, then subjected to a strict set of filters to identify high-confidence one-to-one orthologs. Each gene was required to be present in at least three species, and recently duplicated genes were removed (see Kosiol et al., 2008 for details). These orthologous genes were then examined for evidence of positive selection using a series of nine likelihood ratio tests (LRTs) based on Yang and Nielsen's (2002) branch-site framework. These LRTs essentially measure evidence for positive selection in terms of how much better the data is fit by a model that allows for positive selection (on particular branches of the tree) than by a model that allows only for purifying selection and neutral evolution. LRTs were performed for positive selection on: A: all branches B: branch to primates C: all branches in primate clade D: branch to rodents E: all branches in rodent clade F: branch to human G: branch to chimp H: branch to hominids I: branch to macaque The Bayesian analysis estimates a posterior distribution over all possible selection histories under a simple model that allows positive selection to switch on and off along the branches of the phylogeny. The inference was performed by Gibbs sampling for just the 544 genes showing strong evidence of positive selection according to the LRTs. See the supplementary website for additional raw data. References Kosiol C, Vinar T, da Fonseca R, Hubisz M, Bustamante C, Nielsen R, and Siepel A. 
Patterns of Positive Selection in Six Mammalian Genomes. PLoS Genetics. 2008 June;4(8): e1000144. Yang Z, Nielsen R. Codon substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution. 2002 June;19(6):908-17. encodePseudogene Pseudogenes ENCODE Pseudogene Predictions - All ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows the pseudogenes located in ENCODE regions generated by five different methods: Yale Pipeline, GenCode manual annotation, two different UCSC methods, and Gene Identification Signature (GIS). It also includes a consensus pseudogenes subtrack based on the pseudogenes from all five methods. Datasets are displayed in separate subtracks within the annotation and are individually described below. The annotations are colored as follows:
Type | Color | Description
Processed_pseudogene | pink | Pseudogenes arising via retrotransposition (exon structure of parent gene lost)
Unprocessed_pseudogene | blue | Pseudogenes arising via gene duplication (exon structure of parent gene retained)
Pseudogene_fragment | light blue | Pseudogene sequences that are single-exon and cannot be confidently assigned to either the processed or the duplicated category
Undefined | gray |
Consensus Pseudogenes Description This subtrack shows pseudogenes derived from a consensus of the five methods listed above. In the pseudogene.org data freeze dated 6 Jan. 2006, 201 consensus pseudogenes were found. Here, pseudogenes are defined as genomic sequences that are similar to known genes but exhibit various inactivating disablements (e.g. premature stop codons or frameshifts) in their putative protein-coding regions and are flagged as either recently-processed or non-processed. Methods The pseudogene sets were processed as follows: Step I: The four data sets were filtered to remove pseudogenes that overlap with current Gencode coding exons/loci. Pseudogenes overlapping with introns or noncoding genes were kept. 
Subsequent filtering of pseudogene sets, excluding the Havana set, removed pseudogenes overlapping with exons of UCSC Known Genes. Step II: A union of the pseudogenes from each filtered set was created. If a pseudogenic region was annotated by more than one group, the lowest starting coordinate and highest ending coordinate were used as the boundaries. Step III: A parent protein for each pseudogene in the union was assigned using a protein set from UniProt. Pseudogenes without a matching protein were excluded. Step IV: Each pseudogene was realigned to its parent protein. Step V: The consensus list of pseudogenes was updated with boundaries derived from the alignment in Step IV. Step VI: The consensus list of pseudogenes was updated with the assigned parent proteins and new classifications (processed or non-processed). Verification of the Consensus Pseudogenes All pseudogenes in the list have been extensively curated by Adam Frankish and Jennifer Harrow at the Wellcome Trust Sanger Institute. References More information about this data set is available from pseudogene.org/ENCODE. Havana-Gencode Annotated Pseudogenes and Immunoglobulin Segments Description This track shows pseudogenes annotated by the HAVANA group at the Wellcome Trust Sanger Institute. Pseudogenes have homology to protein sequences but generally have a disrupted CDS. For all annotated pseudogenes, an active homologous gene (the parent) can be identified elsewhere in the genome. Pseudogenes are classified as processed or unprocessed. Methods Prior to manual annotation, finished sequence is submitted to an automated analysis pipeline for similarity searches and ab initio gene predictions. The searches are run on a computer farm and stored in an Ensembl MySQL database using the Ensembl analysis pipeline system (Searle et al., 2004, Harrow et al., 2006). 
A pseudogene is annotated where the total length of the protein homology to the genomic sequence is >20% of the length of the parent protein or >100 aa in length, whichever is shorter. If a gene structure has an ORF but has lost the structure of the parent gene, a pseudogene is annotated provided there is no evidence of transcription from the pseudogene locus. When an open but truncated reading frame is present, other evidence is used (for example, 3' genomic polyA tract) to allow classification as a pseudogene. When a parent gene has only a single coding exon (e.g. olfactory receptors), a small 5' or 3' truncation to the CDS at the pseudogene locus (compared to other family members) is sufficient to confirm pseudogene status where the truncation is predicted to significantly affect secondary structure by the literature and/or expert community. Processed and unprocessed pseudogenes are distinguished on the basis of structure and genomic context. Processed pseudogenes, which arise via retrotransposition, lose the intron-exon structure of the parent gene, often have an A-rich tract indicative of the insertion site at their 3' end, and are flanked by genomic sequence different from that of the parent gene. Unprocessed pseudogenes, which arise via gene duplication, share both the intron-exon structure and flanking genomic sequence with the parent gene. Transcribed pseudogenes are indicated by the annotation of a pseudogene and transcript variant alongside each other. References Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, et al. GENCODE: Producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. Searle SM, Gilbert J, Iyer V, Clamp M. The otter annotation system. Genome Res. 2004 May;14(5):963-70. Yale Pseudogenes Description This subtrack shows pseudogenes in the ENCODE regions identified by the Yale Pseudogene Pipeline. 
In this analysis, pseudogenes are defined as genomic sequences that are similar to known genes with various inactivating disablements (e.g. premature stop codons or frameshifts) in their putative protein-coding regions. Pseudogenes are flagged as recently processed, recently duplicated, or of uncertain origin (either ancient fragments or resulting from a single-exon parent). Methods Step I: Repeat-masked human genome sequence was used as the target for a six-frame TBLASTN where the query was the nonredundant human proteome set (European Bioinformatics Institute). Only high-quality human protein sequences from SWISS-PROT and TrEMBL were used, because this set included processed or duplicated pseudogenes. Step II: BLAST hits that had a significant overlap with annotated multiple-exon Ensembl genes were removed from consideration. Step III: The set of BLAST hits was reduced by selecting hits in decreasing significance level and removing matches that overlapped by more than 10 amino acids or 30 bp with a picked match. Step IV: Adjacent matches on a chromosome were merged together if they were thought to belong to the same pseudogene locus. Merged matches were extended on both sides to include the length of the query protein to which they matched along with an extra 30 bp buffer on either side. Step V: The FASTA program was used to re-align these extended hits to the genome. Redundant hits were removed and hits with gaps greater than 60 bp were split into two alignments. Step VI: Alignments with possible artifactual frameshifts or stop codons introduced by the alignment process were closely inspected. Step VII: False positives (E-value less than 10^-10 or amino acid sequence identity of less than 40%) and sequences matching protein queries containing repeats or low-complexity regions were removed. Potential functional genes were also removed. 
These were defined as having no frameshift disruptions, greater than 95% sequence identity to the query protein, and translatable to a protein sequence longer than 95% of the length of the query protein. Step VIII: The remaining putative pseudogene sequences were classified based on several criteria. The intron-exon structure of the functional gene was further used to infer whether a pseudogene was recently duplicated or processed. A duplicated pseudogene retains the intron-exon structure of its parent functional gene, whereas a processed pseudogene shows evidence that this structure has been spliced out. Those sequences where the insertions were 50% or more repeats (as detected by RepeatMasker) are "Disrupted" processed pseudogenes. Small pseudogene sequences that cannot be confidently assigned to either the processed or duplicated category may be ancient fragments. Further details can be found in the references below. Verification of Yale Pseudogenes All pseudogenes in the list have been manually checked. References Zhang Z, Harrison PM, Liu Y, Gerstein M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003 Dec;13(12):2541-58. Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N, Gerstein M. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol. 2005 May 27;349(1):27-45. UCSC Retrogene Predictions Description The Retrogene subtrack shows processed mRNAs that have been inserted back into the genome since the mouse/human split. Retrogenes can be functional genes that have acquired a promoter from a neighboring gene, non-functional pseudogenes, or transcribed pseudogenes. Methods Step I: All GenBank mRNAs for a particular species were aligned to the genome using blastz. Step II: mRNAs that aligned twice in the genome (once with introns and once without introns) were initially screened. 
Step III: A series of features was scored to determine candidates for retrotransposition events. These features included position and length of the polyA tail, degree of synteny with mouse, coverage of repetitive elements, number of exons that can still be aligned to the retrogene, and degree of divergence from the parent gene. Retrogenes are classified using a threshold score function that is a linear combination of this set of features. Retrogenes in the final set have a score greater than 425, a threshold chosen from a ROC plot against the Vega annotated pseudogenes.
The "type" field has four possible values:
singleExon: the parent gene is a single exon gene
mrna: the parent gene is a spliced mRNA that has no annotation in NCBI refSeq, UCSC knownGene or Mammalian Gene Collection (MGC)
annotated: the parent gene has been annotated by one of refSeq, knownGene or MGC
expressed: an mRNA overlaps the retrogene, indicating probable transcription
These features can be downloaded from the table pseudoGeneLink in many formats using the Table Browser option on the menubar.
References
Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003 Sep 30;100(20):11484-9.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.
UCSC Pseudogene Predictions
Methods
Step I: A set of pre-aligned human known genes was mapped across the human genome through the human Blastz Self Alignment using HomoMap (homologous mapping method). The fragments identified by HomoMap are homologs of genes from the Known Genes set.
Step II: Each homologous fragment was compared with its known reference gene and a set of features was then collected. The features included sequence identity, Ka/Ks ratio (nonsynonymous substitution per codon vs.
synonymous substitution per codon), splicing sites, and the number of premature stop codons. These homologous fragments are either genes or pseudogenes. Step III: Homologous fragments that overlapped known reference genes were labeled as positive samples; those overlapping known pseudogenes were labeled as negative samples. Step IV: These positive and negative sets were used to train support vector machines (SVMs) to separate coding fragments from pseudo fragments. The trained SVMs were used to classify all homologous fragments into potential coding elements or potential pseudo elements. Step V: Finally, a heuristic filter was used to correct some misclassified fragments and to generate the final potential pseudogene set. GIS-PET Pseudogene Predictions Description This subtrack shows retrotransposed pseudogenes predicted by multiple mapped GIS-PETs (gene identification signature-pair end ditags) collected from two different cancer cell lines HCT116 and MCF7. A total of 49 non-redundant processed pseudogenes predicted in the ENCODE regions are presented in this dataset. Each pseudogene is labeled with an ID of the format AAA-GISPgene-XX, where "AAA" indicates the parental gene name, "GISPgene" is the GIS pseudogene, and "XX" is the unique ID for each pseudogene. Methods PETs were generated from full-length transcripts and computationally mapped onto the human genome to demarcate the transcript start and end positions. The PETs that mapped to multiple genome locations were grouped into PET-based gene families that include parent gene and pseudogenes. A representative member—the shortest PET as defined by genomic coordinates—was selected from each family. This representative PET was aligned to the hg17 genome using in order to identify all the putative pseudogenes at the whole genome level. All hits with an identity >=70% and coverage >=50% within ENCODE regions were reported. In this context, "coverage" refers to alignment coverage of the query sequence, i.e. 
a measure of how complete the predicted pseudogene is relative to the query sequence. Verification of GIS-PET Pseudogene Predictions Pseudogenes were verified by manual examination. Credits These data were generated by the ENCODE Pseudogene Annotation group: Jennifer Harrow, Wei Chia-Lin, Siew Woh Choo, Adam Frankish, Robert Baertsch, France Denoeud, Deyou Zheng, Yontao Lu, Alexandre Reymond, Roderic Guigo Serra, Tom Gingeras, Suganthi Balasubramanian and Mark Gerstein. encodePseudogeneGIS GIS Pseudogenes Genome Institute of Singapore (GIS) Pseudogenes Pilot ENCODE Regions and Genes encodePseudogeneUcsc2 UCSC Pseudogenes UCSC Pseudogene Predictions Pilot ENCODE Regions and Genes encodePseudogeneUcsc UCSC Retrogenes UCSC Retrogene Predictions Pilot ENCODE Regions and Genes encodePseudogeneYale Yale Pseudogenes Yale Pseudogene Predictions Pilot ENCODE Regions and Genes encodePseudogeneHavana Havana-Gencode Pseudogenes Havana-Gencode Annotated Pseudogenes and Immunoglobulin Segments Pilot ENCODE Regions and Genes encodePseudogeneConsensus Consensus Pseudogenes Consensus of Yale, Havana-Gencode, UCSC and GIS ENCODE Pseudogenes Pilot ENCODE Regions and Genes rdmr R-DMR Reprogrammed Differentially Methylated Regions Phenotype and Disease Associations Description This track provides the location of genomic regions that show differential DNA methylation (DNAm) between induced pluripotent stem (iPS) cells and their parental fibroblasts. M values represent averaged methylation values across all samples in a given cell type. R-DMRs were identified using Comprehensive High-Throughput Arrays for Relative Methylation (CHARM) analysis. For a detailed description of CHARM analysis, please see Irizarry RA et al., 2008. Credits Thanks to Andrew P. Feinberg and Christine Ladd-Acosta for providing this annotation data. References Doi A, Park IH, Wen B, Murakami P, Aryee MJ, Irizarry R, Herb B, Ladd-Acosta C, Rho J, Loewer S, Miller J, Schlaeger T, Daley GQ, Feinberg AP.
Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nature Genetics 2009 Dec;41(12):1350-1353. Irizarry RA, Ladd-Acosta C, Carvalho B, Wu H, Brandenburg SA, Jeddeloh JA, Wen B, Feinberg AP. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Research 2008 May;18(5):780-90. recombRate Recomb Rate Recombination Rate from deCODE, Marshfield, or Genethon Maps (deCODE default) Mapping and Sequencing Description The recombination rate track represents calculated sex-averaged rates of recombination based on either the deCODE, Marshfield, or Genethon genetic maps. By default, the deCODE map rates are displayed. Female- and male-specific recombination rates, as well as rates from the Marshfield and Genethon maps, can also be displayed by choosing the appropriate filter option on the track description page. Methods The deCODE genetic map was created at deCODE Genetics and is based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. For more information on this map, see Kong, et al., 2002. The Marshfield genetic map was created at the Center for Medical Genetics and is based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see Broman et al., 1998. The Genethon genetic map was created at Genethon and is based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see Dib et al., 1996. Each base is assigned the recombination rate calculated by assuming a linear genetic distance across the immediately flanking genetic markers. The recombination rate assigned to each 1 Mb window is the average recombination rate of the bases contained within the window. 
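The rate calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the track: the marker representation (physical position in bp, genetic position in cM), the half-open window coordinates, the function names, and the choice to average only over bases that lie between two markers are all hypothetical.

```python
from typing import List, Optional, Tuple

Marker = Tuple[int, float]            # (physical position in bp, genetic position in cM)
Interval = Tuple[int, int, float]     # (start bp, end bp, rate in cM/Mb)


def interval_rates(markers: List[Marker]) -> List[Interval]:
    """Assign one rate (cM/Mb) to every stretch between adjacent markers,
    assuming genetic distance grows linearly between them."""
    ordered = sorted(markers)
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        if bp2 == bp1:
            continue  # skip markers at the same physical position
        rates.append((bp1, bp2, (cm2 - cm1) / ((bp2 - bp1) / 1e6)))
    return rates


def window_rate(rates: List[Interval], start: int, end: int) -> Optional[float]:
    """Average the per-base rates over the half-open window [start, end).
    Bases outside any marker interval are ignored (an assumption)."""
    covered = 0
    total = 0.0
    for bp1, bp2, rate in rates:
        overlap = min(end, bp2) - max(start, bp1)
        if overlap > 0:
            covered += overlap
            total += rate * overlap
    return total / covered if covered else None


# Example with three made-up markers on one chromosome.
example = interval_rates([(0, 0.0), (500_000, 1.0), (1_000_000, 1.5)])
print(window_rate(example, 0, 1_000_000))
```

Weighting each interval by its overlap with the window is equivalent to averaging the rate base by base, which is how the text describes the 1 Mb window value.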
Using the Filter This track has a filter that can be used to change the map or gender-specific rate displayed. The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. To view a particular map or gender-specific rate, select the corresponding option from the "Map Distances" pulldown list. By default, the browser displays the deCODE sex-averaged distances. When you have finished configuring the filter, click the Submit button. Credits This track was produced at UCSC using data that are freely available for the Genethon, Marshfield, and deCODE genetic maps (see above links). Thanks to all who played a part in the creation of these maps. References Broman KW, Murray JC, Sheffield VC, White RL, Weber JL. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet. 1998 Sep;63(3):861-9. PMID: 9718341; PMC: PMC1377399 Dib C, Fauré S, Fizames C, Samson D, Drouot N, Vignal A, Millasseau P, Marc S, Hazan J, Seboun E et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature. 1996 Mar 14;380(6570):152-4. PMID: 8600387 Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G et al. A high-resolution recombination map of the human genome. Nat Genet. 2002 Jul;31(3):241-7. PMID: 12053178 rmskRM327 RepMask 3.2.7 Repeating Elements by RepeatMasker version 3.2.7 Variation and Repeats Description This track was created by using a more recent version (3.2.7, Jan. 2009) of Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked. 
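As a rough illustration of what a masked sequence looks like, the sketch below applies a list of repeat intervals to a sequence. It is not RepeatMasker itself: the function name, the 0-based half-open interval convention, and the `soft` switch are assumptions made for the example. Soft-masking writes repeat bases in lowercase, while hard-masking replaces them with N.

```python
from typing import Iterable, Tuple


def mask_repeats(sequence: str,
                 repeats: Iterable[Tuple[int, int]],
                 soft: bool = True) -> str:
    """Return a copy of `sequence` with every annotated repeat masked.

    `repeats` holds hypothetical 0-based, half-open (start, end) intervals.
    """
    masked = list(sequence)
    for start, end in repeats:
        # Clamp the interval to the sequence so bad coordinates cannot raise.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower() if soft else "N"
    return "".join(masked)


print(mask_repeats("ACGTACGT", [(2, 5)]))              # soft-masked
print(mask_repeats("ACGTACGT", [(2, 5)], soft=False))  # hard-masked
```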
RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka, J. (2000) in the References section below. Results from the original RepeatMasker run have been kept in the RepeatMasker track in order to avoid disrupting any analyses performed on the original run's results.
Display Conventions and Configuration
In full display mode, this track displays up to ten different classes of repeats:
Short interspersed nuclear elements (SINE), which include ALUs
Long interspersed nuclear elements (LINE)
Long terminal repeat elements (LTR), which include retroposons
DNA repeat elements (DNA)
Simple repeats (micro-satellites)
Low complexity repeats
Satellite repeats
RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
Other repeats, which includes class RC (Rolling Circle)
Unknown
The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading.
Methods
UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information.
Credits
Thanks to Arian Smit and GIRI for providing the tools and repeat libraries used to generate this track.
References
Smit, AFA, Hubley, R and Green, P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2007.
Repbase Update is described in Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420.
For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6): 657-63. Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. rgdQtl RGD Human QTL Human Quantitative Trait Locus from RGD Phenotype and Disease Associations Description A quantitative trait locus (QTL) is a polymorphic locus that contains alleles which differentially affect the expression of a continuously distributed phenotypic trait. Usually a QTL is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci. Credits Thanks to the RGD for providing this annotation. RGD is funded by grant HL64541 entitled "Rat Genome Database", awarded to Dr. Howard J Jacob, Medical College of Wisconsin, from the National Heart Lung and Blood Institute (NHLBI) of the National Institutes of Health (NIH). References Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev. 2000 Jan;80(1):135-72. PMID: 10617767 rgdRatQtl RGD Rat QTL Rat Quantitative Trait Locus from RGD Coarsely Mapped to Human Phenotype and Disease Associations Description This track shows Rat quantitative trait loci (QTLs) from the Rat Genome Database (RGD) that have been coarsely mapped by UCSC to the Human genome using stringently filtered cross-species alignments. A quantitative trait locus (QTL) is a polymorphic locus that contains alleles which differentially affect the expression of a continuously distributed phenotypic trait. Usually a QTL is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci. For a comprehensive review of QTL mapping techniques in the rat, see Rapp, 2000. 
To map the Rat QTLs to Human, UCSC's chained and netted blastz alignments of Rat to Human were filtered to retain only those with high chain scores (>=500,000). This removed many valid-but-short alignments and in general retained only very long chains (>10,000, usually >100,000 bp), so that only large regions could be mapped. This choice was made because QTLs in general are extremely large and approximate regions. After the alignment filtering, UCSC's liftOver program was used to map Rat regions to Human via the filtered alignments. To get a sense of how many genomic rearrangements between Rat and Human are in the region of a particular Rat QTL, you may want to view the Human Nets track in the Rat Nov. 2004 (Baylor 3.4/rn4) genome browser. In the position/search box, enter the name of the Rat QTL of interest. Credits Thanks to the RGD for providing the Rat QTLs. RGD is funded by grant HL64541 entitled "Rat Genome Database", awarded to Dr. Howard J Jacob, Medical College of Wisconsin, from the National Heart Lung and Blood Institute (NHLBI) of the National Institutes of Health (NIH). References Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev. 2000 Jan;80(1):135-72. PMID: 10617767 encodeRikenCage Riken CAGE Riken CAGE - Predicted Gene Start Sites Pilot ENCODE Transcription Description This track shows the number of 5' cap analysis gene expression (CAGE) tags that map to the genome on the "plus" and "minus" strands at a specific location. For clarity, only the first 5' nucleotide in the tag (relative to the transcript direction) is considered. Areas in which many tags map to the same region may indicate a significant transcription start site. Display Conventions and Configuration The position of the first 5' nucleotide in the tag is represented by a solid block. The height of the block indicates the number of 5' cDNA starts that map at that location.
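The block heights described above amount to a count of tag 5' ends per strand and position. The sketch below shows one way such counts could be derived from mapped tags; it is not the RIKEN pipeline. The tuple layout (chromosome, start, end, strand), the 0-based half-open coordinates, and the function names are assumptions chosen for the example.

```python
from collections import Counter
from typing import Iterable, Tuple

Tag = Tuple[str, int, int, str]  # (chromosome, start, end, strand), 0-based half-open


def five_prime_position(start: int, end: int, strand: str) -> int:
    """Return the first 5' nucleotide of a tag relative to transcript direction."""
    if strand == "+":
        return start        # plus-strand tags begin at their leftmost base
    if strand == "-":
        return end - 1      # minus-strand tags begin at their rightmost base
    raise ValueError(f"unknown strand: {strand!r}")


def count_cage_starts(tags: Iterable[Tag]) -> Counter:
    """Count how many tags start at each (chromosome, strand, position)."""
    counts: Counter = Counter()
    for chrom, start, end, strand in tags:
        counts[(chrom, strand, five_prime_position(start, end, strand))] += 1
    return counts


# Made-up tags: two plus-strand tags share a start, one minus-strand tag.
tags = [("chr1", 100, 120, "+"), ("chr1", 100, 121, "+"), ("chr1", 80, 101, "-")]
for key, height in sorted(count_cage_starts(tags).items()):
    print(key, height)
```

Keeping the strand in the key mirrors the separate plus and minus subtracks, so a position can carry independent counts on each strand.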
This composite annotation track contains multiple subtracks that may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods The CAGE tags are sequenced from the 5' ends of full-length cDNAs produced using RIKEN full-length cDNA technology. To create the tag, a linker was attached to the 5' end of full-length cDNAs which were selected by cap trapping. The first 20 bp of the cDNA were cleaved using class II restriction enzymes, followed by PCR amplification and then concatamers of the resulting 32 bp tags were formed for more efficient sequencing. For more information on CAGE analysis, see Shiraki et al. (2003) below. Refer to the RIKEN website for information about RIKEN full-length cDNA technologies. The mapping methodology employed in this annotation will be described in upcoming publications. Verification The techniques used to verify these data will be described in upcoming publications. Credits These data were contributed by the Functional Annotation of Mouse (FANTOM) Consortium, RIKEN Genome Science Laboratory and RIKEN Genome Exploration Research Group (Genome Network Project Core Group). FANTOM Consortium: P. Carninci, T. Kasukawa, S. Katayama, Gough, M. Frith, N. Maeda, R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa, V. B. Bajic, S. E. Brenner, S. Batalov, A. R. R. Forrest, M. Zavolan, M. J. Davis, L. G. Wilming, V. Aidinis, J. Allen, A. Ambesi-Impiombato, R. Apweiler, R. N. Aturaliya, T. L. Bailey, M. Bansal, K. W. Beisel, T. Bersano, H. Bono, A. M. Chalk, K. P. Chiu, V. Choudhary, A. Christoffels, D. R. Clutterbuck, M. L. Crowe, E. Dalla, B. P. Dalrymple, B. de Bono, G. 
Della Gatta, D. di Bernardo, T. Down, P. Engstrom, M. Fagiolini, G. Faulkner, C. F. Fletcher, T. Fukushima, M. Furuno, S. Futaki, M. Gariboldi, P. Georgii-Hemming, T. R. Gingeras, T. Gojobori, R. E. Green, S. Gustincich, M. Harbers, V. Harokopos, Y. Hayashi, S. Henning, T. K. Hensch, N. Hirokawa, D. Hill, L. Huminiecki, M. Iacono, K. Ikeo, A. Iwama, T. Ishikawa, M. Jakt, A. Kanapin, M. Katoh, Y. Kawasawa, J. Kelso, H. Kitamura, H. Kitano, G. Kollias, S. P. T. Krishnan, A.F. Kruger, K. Kummerfeld, I. V. Kurochkin, L. F. Lareau, L. Lipovich, J. Liu, S. Liuni, S. McWilliam, M. Madan Babu, M. Madera, L. Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. Mignone, S. Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, H. Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, F. Nori, O. Ohara, Y. Okazaki, V. Orlando, K. C. Pang, W. J. Pavan, G. Pavesi, G. Pesole, N. Petrovsky, S. Piazza, W. Qu, J. Reed, J. F. Reid, B. Z. Ring, M. Ringwald, B. Rost, Y. Ruan, S. Salzberg, A. Sandelin, C. Schneider, C. Schoenbach, K. Sekiguchi, C. A. M. Semple, S. Seno, L. Sessa, Y. Sheng, Y. Shibata, H. Shimada, K. Shimada, B. Sinclair, S. Sperling, E. Stupka, K. Sugiura, R. Sultana, Y. Takenaka, K. Taki, K. Tammoja, S. L. Tan, S. Tang, M. S. Taylor, J. Tegner, S. A. Teichmann, H. R. Ueda, E. van Nimwegene, R. Verardo, C. L. Wei, K. Yagi, H. Yamanishi, E. Zabarovsky, S. Zhu, A. Zimmer, W. Hide, C. Bult, S. M. Grimmond, R. D. Teasdale, E. T. Liu, V. Brusic, J. Quackenbush, C. Wahlestedt, J. Mattick, D. Hume. RIKEN Genome Exploration Research Group: C. Kai, D. Sasaki, Y. Tomaru, S. Fukuda, M. Kanamori-Katayama, M. Suzuki, J. Aoki, T. Arakawa, J. Iida, K. Imamura, M. Itoh, T. Kato, H. Kawaji, N. Kawagashira, T. Kawashima, M. Kojima, S. Kondo, H. Konno, K. Nakano, N. Ninomiya, T. Nishio, M. Okada, C. Plessy, K. Shibata, T. Shiraki, S. Suzuki, M. Tagami, K Waki, A. Watahiki, Y. Okamura-Oho, H. Suzuki, J. Kawai. General Organizer: Y. 
Hayashizaki
References
Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A. 100(26), 15776-81 (2003).
encodeRikenCageMinus Riken CAGE - Riken CAGE Minus Strand - Predicted Gene Start Sites Pilot ENCODE Transcription encodeRikenCagePlus Riken CAGE + Riken CAGE Plus Strand - Predicted Gene Start Sites Pilot ENCODE Transcription wgEncodeRikenCage RIKEN CAGE Loc ENCODE RIKEN RNA Subcellular Localization by CAGE Tags Expression
Description
This track shows 5' cap analysis gene expression (CAGE) tags and clusters in RNA extracts from different sub-cellular localizations in multiple cell lines. A CAGE cluster is a region of overlapping tags with an assigned value that represents the expression level. The data in this track were produced as part of the ENCODE Transcriptome Project.
Display Conventions and Configuration
This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. This track contains the following views:
Plus and Minus Clusters: These views display clusters of overlapping read mappings on the forward and reverse genomic strands.
Alignments: The Alignments view shows the individual tags (read mappings), with mismatches from the genomic reference highlighted.
Color differences in subtracks are used as a visual cue to distinguish between the different cell types, and between annotations on the plus and minus strand.
Methods
Cells were grown according to the approved ENCODE cell culture protocols.
RNA molecules longer than 200 nt and present in the RNA population isolated from each subcellular compartment were fractionated into polyA+ and polyA- fractions as described in these protocols. The CAGE tags were sequenced from the 5' ends of cap-trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009). To create the tag, a linker was attached to the 5' end of polyA+ or polyA- reverse-transcribed cDNAs which were selected by cap trapping (Carninci et al. 1996). The first 27 bp of the cDNA were cleaved using class II restriction enzymes. A linker was then attached to the 3' end of the cDNA. After PCR amplification, the tags were sequenced (36 bp single reads) using ABI SOLiD technology (polyA- RNA from the cytosol and nucleus of K562 cell lines, and from whole cell in prostate cells) or Illumina/Solexa GA (all other data). Tags were mapped to the human genome (NCBI Build36, hg18) using the program nexalign (T. Lassmann manuscript in preparation). SOLiD CAGE sequences were mapped with up to 3 mismatches; 2 mismatches were allowed for Solexa CAGE. Alignments of sequences mapping 10 times or fewer were retained. The expression level was computed as the number of reads making up the cluster, divided by the total number of reads sequenced, times 1 million. Release Notes This is Release 2 of this track. This release adds data for eight new cell-type/compartment combinations (GM12878 Nucleus, H1-hESC whole cell, HepG2 cytosol/nucleus/nucleolus, HUVEC cytosol, and NHEK cytosol/nucleus). Credits These data were generated and analyzed by Timo Lassmann, Phil Kapranov, Hazuki Takahashi, Yoshihide Hayashizaki, Carrie Davis, Tom Gingeras, and Piero Carninci. Contact: Piero Carninci at RIKEN Omics Science Center References Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M, et al. CAGE: cap analysis of gene expression. Nat Methods. 2006 March 1; 3(3):211-222.
Valen E, Pascarella G, Chalk A, Maeda N, Kojima M, Kawazu C, Murata M, Nishiyori H, Lazarevic D, Motti D, et al. Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Res. 2009 February; 19(2):255-265. Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, et al. High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics. 1996 November 1; 37(3):327-336. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeRikenCageView1PlusClusters Plus Clusters ENCODE RIKEN RNA Subcellular Localization by CAGE Tags Expression wgEncodeRikenCagePlusClustersProstateCellLongnonpolya Pros cell pA- + prostate Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cell longNonPolyA wgEncodeRikenCagePlusClustersProstateCellLongnonpolya PlusClusters prostate tissue purchased for CSHL project CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in Prostate whole cell) Expression wgEncodeRikenCagePlusClustersNhekNucleusLongnonpolya NHEK nucl pA- + NHEK Cage ENCODE Jan 2010 Freeze 2010-01-27 2010-10-27 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCagePlusClustersNhekNucleusLongnonpolya PlusClusters epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in NHEK nucleus) Expression wgEncodeRikenCagePlusClustersNhekCytosolLongnonpolya NHEK cyto pA- + NHEK Cage ENCODE Jan 2010 Freeze 
2010-01-14 2010-10-13 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCagePlusClustersNhekCytosolLongnonpolya PlusClusters epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in NHEK cytosol) Expression wgEncodeRikenCagePlusClustersHuvecCytosolLongnonpolya HUVEC cyto pA- + HUVEC Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCagePlusClustersHuvecCytosolLongnonpolya PlusClusters umbilical vein endothelial cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in HUVEC cytosol) Expression wgEncodeRikenCagePlusClustersHepg2NucleolusTotal HepG2 nlos tot + HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleolus total wgEncodeRikenCagePlusClustersHepg2NucleolusTotal PlusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Plus Strand Start Sites (Total RNA in HepG2 nucleolus) Expression wgEncodeRikenCagePlusClustersHepg2NucleusLongnonpolya HepG2 nucl pA- + HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCagePlusClustersHepg2NucleusLongnonpolya PlusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in HepG2 nucleus) Expression wgEncodeRikenCagePlusClustersHepg2CytosolLongnonpolya HepG2 cyto pA- + HepG2 Cage ENCODE Jan 
2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCagePlusClustersHepg2CytosolLongnonpolya PlusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in HepG2 cytosol) Expression wgEncodeRikenCagePlusClustersK562NucleolusTotal K562 nlos tot + K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleolus total wgEncodeRikenCagePlusClustersK562NucleolusTotal PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Plus Strand Clusters (Total RNA in K562 nucleolus) Expression wgEncodeRikenCagePlusClustersK562ChromatinTotal K562 chrm tot + K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN chromatin total wgEncodeRikenCagePlusClustersK562ChromatinTotal PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Plus Strand Clusters (Total RNA in K562 chromatin) Expression wgEncodeRikenCagePlusClustersK562NucleoplasmTotal K562 nplm tot + K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleoplasm total wgEncodeRikenCagePlusClustersK562NucleoplasmTotal PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Plus Strand Clusters (Total RNA in K562 nucleoplasm) Expression wgEncodeRikenCagePlusClustersK562NucleusLongpolya K562 nucl pA+ + K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN nucleus longPolyA wgEncodeRikenCagePlusClustersK562NucleusLongpolya PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA+ RNA in K562 nucleus) Expression wgEncodeRikenCagePlusClustersK562NucleusLongnonpolya K562 nucl pA- + K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 nucleus longNonPolyA wgEncodeRikenCagePlusClustersK562NucleusLongnonpolya PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in K562 nucleus) Expression wgEncodeRikenCagePlusClustersK562CytosolLongpolya K562 cyto pA+ + K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN cytosol longPolyA wgEncodeRikenCagePlusClustersK562CytosolLongpolya PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA+ RNA in K562 cytosol) Expression wgEncodeRikenCagePlusClustersK562CytosolLongnonpolya K562 cyto pA- + K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cytosol longNonPolyA wgEncodeRikenCagePlusClustersK562CytosolLongnonpolya PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in K562 cytosol) Expression wgEncodeRikenCagePlusClustersK562PolysomeLongnonpolya K562 psom pA- + K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN polysome longNonPolyA wgEncodeRikenCagePlusClustersK562PolysomeLongnonpolya PlusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Strand of mRNA with ribosomes attached Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in K562 polysome) Expression wgEncodeRikenCagePlusClustersH1hescCellLongnonpolya H1 cell pA- + H1-hESC Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cell longNonPolyA wgEncodeRikenCagePlusClustersH1hescCellLongnonpolya PlusClusters embryonic stem cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Start Sites (PolyA- RNA in H1-hESC cell) Expression wgEncodeRikenCagePlusClustersGm12878NucleolusTotal GM128 nlos tot + GM12878 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleolus total wgEncodeRikenCagePlusClustersGm12878NucleolusTotal PlusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Plus Strand Start Sites (Total RNA in GM12878 nucleolus) Expression wgEncodeRikenCagePlusClustersGm12878NucleusLongnonpolya GM128 nucl pA- + GM12878 Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCagePlusClustersGm12878NucleusLongnonpolya PlusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in GM12878 nucleus) Expression wgEncodeRikenCagePlusClustersGm12878CytosolLongnonpolya GM128 cyto pA- + GM12878 Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras 
RIKEN cytosol longNonPolyA wgEncodeRikenCagePlusClustersGm12878CytosolLongnonpolya PlusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Plus Strand Clusters (PolyA- RNA in GM12878 cytosol) Expression wgEncodeRikenCageView3MinusClusters Minus Clusters ENCODE RIKEN RNA Subcellular Localization by CAGE Tags Expression wgEncodeRikenCageMinusClustersProstateCellLongnonpolya Pros cell pA- - prostate Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cell longNonPolyA wgEncodeRikenCageMinusClustersProstateCellLongnonpolya MinusClusters prostate tissue purchased for CSHL project CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in Prostate whole cell) Expression wgEncodeRikenCageMinusClustersNhekNucleusLongnonpolya NHEK nucl pA- - NHEK Cage ENCODE Jan 2010 Freeze 2010-01-27 2010-10-27 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCageMinusClustersNhekNucleusLongnonpolya MinusClusters epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Start Sites (PolyA- RNA in NHEK nucleus) Expression wgEncodeRikenCageMinusClustersNhekCytosolLongnonpolya NHEK cyto pA- - NHEK Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCageMinusClustersNhekCytosolLongnonpolya MinusClusters epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus 
Strand Start Sites (PolyA- RNA in NHEK cytosol) Expression wgEncodeRikenCageMinusClustersHuvecCytosolLongnonpolya HUVEC cyto pA- - HUVEC Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCageMinusClustersHuvecCytosolLongnonpolya MinusClusters umbilical vein endothelial cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Start Sites (PolyA- RNA in HUVEC cytosol) Expression wgEncodeRikenCageMinusClustersHepg2NucleolusTotal HepG2 nlos tot - HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleolus total wgEncodeRikenCageMinusClustersHepg2NucleolusTotal MinusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Minus Strand Start Sites (Total RNA in HepG2 nucleolus) Expression wgEncodeRikenCageMinusClustersHepg2NucleusLongnonpolya HepG2 nucl pA- - HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCageMinusClustersHepg2NucleusLongnonpolya MinusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Start Sites (PolyA- RNA in HepG2 nucleus) Expression wgEncodeRikenCageMinusClustersHepg2CytosolLongnonpolya HepG2 cyto pA- - HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCageMinusClustersHepg2CytosolLongnonpolya MinusClusters hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 
nt ENCODE RIKEN CAGE Minus Strand Start Sites (PolyA- RNA in HepG2 cytosol) Expression wgEncodeRikenCageMinusClustersK562NucleolusTotal K562 nlos tot - K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleolus total wgEncodeRikenCageMinusClustersK562NucleolusTotal MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Minus Strand Clusters (Total RNA in K562 nucleolus) Expression wgEncodeRikenCageMinusClustersK562ChromatinTotal K562 chrm tot - K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN chromatin total wgEncodeRikenCageMinusClustersK562ChromatinTotal MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Minus Strand Clusters (Total RNA in K562 chromatin) Expression wgEncodeRikenCageMinusClustersK562NucleoplasmTotal K562 nplm tot - K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleoplasm total wgEncodeRikenCageMinusClustersK562NucleoplasmTotal MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Minus Strand Clusters (Total RNA in K562 nucleoplasm) Expression wgEncodeRikenCageMinusClustersK562NucleusLongpolya K562 nucl pA+ - K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN nucleus longPolyA wgEncodeRikenCageMinusClustersK562NucleusLongpolya MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA+ RNA in K562 nucleus) Expression wgEncodeRikenCageMinusClustersK562NucleusLongnonpolya K562 nucl pA- - K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 nucleus longNonPolyA wgEncodeRikenCageMinusClustersK562NucleusLongnonpolya MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in K562 nucleus) Expression wgEncodeRikenCageMinusClustersK562CytosolLongpolya K562 cyto pA+ - K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN cytosol longPolyA wgEncodeRikenCageMinusClustersK562CytosolLongpolya MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA+ RNA in K562 cytosol) Expression wgEncodeRikenCageMinusClustersK562CytosolLongnonpolya K562 cyto pA- - K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cytosol longNonPolyA wgEncodeRikenCageMinusClustersK562CytosolLongnonpolya MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in K562 cytosol) Expression wgEncodeRikenCageMinusClustersK562PolysomeLongnonpolya K562 psom pA- - K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN polysome longNonPolyA wgEncodeRikenCageMinusClustersK562PolysomeLongnonpolya MinusClusters leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Strand of mRNA with ribosomes attached Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in K562 polysome) Expression wgEncodeRikenCageMinusClustersH1hescCellLongnonpolya H1 cell pA- - H1-hESC Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cell longNonPolyA wgEncodeRikenCageMinusClustersH1hescCellLongnonpolya MinusClusters embryonic stem cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Start Sites (PolyA- RNA in H1-hESC cell) Expression wgEncodeRikenCageMinusClustersGm12878NucleolusTotal GM128 nlos tot - GM12878 Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN nucleolus total wgEncodeRikenCageMinusClustersGm12878NucleolusTotal MinusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) ENCODE RIKEN CAGE Minus Strand Start Sites (Total RNA in GM12878 nucleolus) Expression wgEncodeRikenCageMinusClustersGm12878NucleusLongnonpolya GM128 nucl pA- - GM12878 
Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCageMinusClustersGm12878NucleusLongnonpolya MinusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in GM12878 nucleus) Expression wgEncodeRikenCageMinusClustersGm12878CytosolLongnonpolya GM128 cyto pA- - GM12878 Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCageMinusClustersGm12878CytosolLongnonpolya MinusClusters B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt ENCODE RIKEN CAGE Minus Strand Clusters (PolyA- RNA in GM12878 cytosol) Expression wgEncodeRikenCageView5Alignments Alignments ENCODE RIKEN RNA Subcellular Localization by CAGE Tags Expression wgEncodeRikenCageAlignmentsProstateCellLongnonpolya Pros cell pA- A prostate Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cell longNonPolyA wgEncodeRikenCageAlignmentsProstateCellLongnonpolya Alignments prostate tissue purchased for CSHL project CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in Prostate whole cell) Expression wgEncodeRikenCageAlignmentsNhekNucleusLongnonpolya NHEK nucl pA- A NHEK Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN nucleus longNonPolyA Nexalign1.3.5 wgEncodeRikenCageAlignmentsNhekNucleusLongnonpolya 
Alignments epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in NHEK nucleus) Expression wgEncodeRikenCageAlignmentsNhekCytosolLongnonpolya NHEK cyto pA- A NHEK Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN cytosol longNonPolyA Nexalign1.3.5 wgEncodeRikenCageAlignmentsNhekCytosolLongnonpolya Alignments epidermal keratinocytes CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in NHEK cytosol) Expression wgEncodeRikenCageAlignmentsHuvecCytosolLongnonpolya HUVEC cyto pA- A HUVEC Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cytosol longNonPolyA Nexalign1.3.5 wgEncodeRikenCageAlignmentsHuvecCytosolLongnonpolya Alignments umbilical vein endothelial cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in HUVEC cytosol) Expression wgEncodeRikenCageAlignmentsHepg2NucleolusTotal HepG2 nlos tot A HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN nucleolus total Nexalign1.3.5 wgEncodeRikenCageAlignmentsHepg2NucleolusTotal Alignments hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch 
ENCODE RIKEN CAGE Tags (Total RNA in HepG2 nucleolus) Expression wgEncodeRikenCageAlignmentsHepg2NucleusLongnonpolya HepG2 nucl pA- A HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN nucleus longNonPolyA Nexalign1.3.5 wgEncodeRikenCageAlignmentsHepg2NucleusLongnonpolya Alignments hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in HepG2 nucleus) Expression wgEncodeRikenCageAlignmentsHepg2CytosolLongnonpolya HepG2 cyto pA- A HepG2 Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN cytosol longNonPolyA Nexalign1.3.5 wgEncodeRikenCageAlignmentsHepg2CytosolLongnonpolya Alignments hepatocellular carcinoma CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in HepG2 cytosol) Expression wgEncodeRikenCageAlignmentsK562NucleolusTotal K562 nlos tot A K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleolus total wgEncodeRikenCageAlignmentsK562NucleolusTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (Total RNA in K562 nucleolus) Expression wgEncodeRikenCageAlignmentsK562ChromatinTotal K562 chrm tot A K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN chromatin total wgEncodeRikenCageAlignmentsK562ChromatinTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Nuclear DNA and associated proteins Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (Total RNA in K562 chromatin) Expression wgEncodeRikenCageAlignmentsK562NucleoplasmTotal K562 nplm tot A K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN nucleoplasm total wgEncodeRikenCageAlignmentsK562NucleoplasmTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center That part of the nuclear content other than the chromosomes or the nucleolus Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (Total RNA in K562 nucleoplasm) Expression wgEncodeRikenCageAlignmentsK562NucleusLongpolya K562 nucl pA+ A K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN nucleus longPolyA wgEncodeRikenCageAlignmentsK562NucleusLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA+ RNA in K562 nucleus) Expression wgEncodeRikenCageAlignmentsK562NucleusLongnonpolya K562 nucl pA- A K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 nucleus longNonPolyA wgEncodeRikenCageAlignmentsK562NucleusLongnonpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in K562 nucleus) Expression wgEncodeRikenCageAlignmentsK562CytosolLongpolya K562 cyto pA+ A K562 Cage ENCODE Feb 2009 Freeze 2009-02-04 2009-11-04 Gingeras RIKEN cytosol longPolyA wgEncodeRikenCageAlignmentsK562CytosolLongpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA+ RNA in K562 cytosol) Expression wgEncodeRikenCageAlignmentsK562CytosolLongnonpolya K562 cyto pA- A K562 Cage ENCODE Nov 2008 Freeze 2008-12-09 2009-09-09 Gingeras RIKEN Nexalign 1.3.3 cytosol longNonPolyA wgEncodeRikenCageAlignmentsK562CytosolLongnonpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in K562 cytosol) Expression wgEncodeRikenCageAlignmentsK562PolysomeLongnonpolya K562 psom pA- A K562 Cage ENCODE Feb 2009 Freeze 2009-02-19 2009-11-19 Gingeras RIKEN polysome longNonPolyA wgEncodeRikenCageAlignmentsK562PolysomeLongnonpolya Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Strand of mRNA with ribosomes attached Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in K562 polysome) Expression wgEncodeRikenCageAlignmentsH1hescCellLongnonpolya H1 cell pA- A H1-hESC Cage ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 Gingeras RIKEN cell longNonPolyA wgEncodeRikenCageAlignmentsH1hescCellLongnonpolya Alignments embryonic stem cells CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Whole cell Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in H1-hESC cell) Expression wgEncodeRikenCageAlignmentsGm12878NucleolusTotal GM128 nucl tot A GM12878 Cage ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 Gingeras RIKEN nucleolus total Nexalign 1.3.5 wgEncodeRikenCageAlignmentsGm12878NucleolusTotal Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The part of the nucleus where ribosomal RNA is actively transcribed Total RNA extract (longer 
than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (Total RNA in GM12878 nucleolus) Expression wgEncodeRikenCageAlignmentsGm12878NucleusLongnonpolya GM128 nucl pA- A GM12878 Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras RIKEN nucleus longNonPolyA wgEncodeRikenCageAlignmentsGm12878NucleusLongnonpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in GM12878 nucleus) Expression wgEncodeRikenCageAlignmentsGm12878CytosolLongnonpolya GM128 cyto pA- A GM12878 Cage ENCODE Feb 2009 Freeze 2009-03-09 2009-12-09 Gingeras RIKEN cytosol longNonPolyA wgEncodeRikenCageAlignmentsGm12878CytosolLongnonpolya Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus CAGE 5' RNA Tags Gingeras Carninci - RIKEN Omics Science Center The fluid between the cells outer membrane and the nucleus Poly(A)- RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE RIKEN CAGE Tags (PolyA- RNA in GM12878 cytosol) Expression rnaGene RNA Genes Non-coding RNA Genes (dark) and Pseudogenes (light) Genes and Gene Predictions Description This track shows the location of non-protein coding RNA genes and pseudogenes. 
Feature types include: tRNA: Transfer RNA (or pseudogene) rRNA: Ribosomal RNA (or pseudogene) scRNA: Small cytoplasmic RNA (or pseudogene) snRNA: Small nuclear RNA (or pseudogene) snoRNA: Small nucleolar RNA (or pseudogene) miRNA: MicroRNA (or pseudogene) misc_RNA: Miscellaneous other RNA, such as Xist (or pseudogene) mt-tRNA: Mitochondrial tRNA-derived pseudogene Methods Eddy-tRNAscanSE (tRNA genes, Sean Eddy): tRNAscan-SE 1.23 with default parameters. Score field contains tRNAscan-SE bit score; >20 is good, >50 is great. Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy): Wublast 2.0, with options "-kap wordmask=seg B=50000 W=8 cpus=1". Score field contains % identity in blast-aligned region. Used each of 602 tRNAs and pseudogenes predicted by tRNAscan-SE in the human oo27 assembly as queries. Kept all nonoverlapping regions that hit one or more of these with P Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson): Wublastn 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg B=5000 W=8 cpus=1". Score field contains blast score. Used each of 104 unique snoRNAs in snorna.lib as a query. Any hit >=95% full length and >=90% identity is annotated as a "true gene". Any other hit with P Eddy-BLAST-otherrnalib (non-tRNA, non-snoRNA noncoding RNAs with GenBank entries for the human gene.): Wublastn 2.0 [15 Apr 2002] with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0 -B=50000 -Z=3000000000". Exceptions to this are: Large ncRNAs (LSU & SSU rRNA, H19, Xist): change "-W=11"; addition "-maskextra=50". Xist contains repetitive elements and was masked with RepeatMasker, Library version 6.8. microRNAs: "-kap -cpus=1 -S=70 -hspmax=0 -B=100" replaces all above parameters. The score field contains the blastn score. 41 unique miRNAs and 29 other ncRNAs were used as queries. Any hit >=95% full length and >=95% identity is annotated as a "true gene". Any other hit with P = 65% identity is annotated as a "related sequence". 
There is an exception to this: all miRNAs consist of 16-26 bp sequences in GenBank and are annotated only if they are 100% full length and have 100% identity. The set of miRNAs used consists of Let-7 from Pasquinelli et al. (2000) and 40 miRNAs from Mourelatos et al. (2002), as mentioned in the references section below. Credits These data were kindly provided by Sean Eddy at Washington University. References Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B, Hayward DC, Ball EE, Degnan B, Müller P, et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature. 2000 Nov 2;408(6808):86-9. Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, Abel L, Rappsilber J, Mann M, Dreyfuss G. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 2002 Mar 15;16(6):720-8. ntSssSnps S SNPs SNPS Used for Selective Sweep Scan (S) Neandertal Assembly and Analysis Description This track shows single nucleotide polymorphisms (SNPs) used in a genome-wide scan for signals of positive selection in the human lineage since divergence from the Neandertal lineage. SNP labels represent the ancestral (A) or derived (D) status, determined by comparison with the chimpanzee reference genome, of alleles in the human reference assembly, five modern human genomes of diverse ancestry (see the Modern Human Seq track), and Neandertals. The first six characters of an item name show the status of the allele (A, D or _ if not known) in six genomes: human reference, San, Yoruba, Han, Papuan, and French, in that order. These characters are followed by a colon, the number of derived alleles found in Neandertals, a comma and the number of ancestral alleles found in Neandertals. For example, a SNP labeled AAADAA:0D,2A has the ancestral allele in the reference human genome and in all of the modern human genomes except Han.
Among Neandertals, two instances of the ancestral allele were found, but no instances of the derived allele. SNPs are colored red when at least four of the six modern human genomes are derived while all observed Neandertal alleles are ancestral. An overrepresentation of such SNPs in a region would imply that the region had undergone positive selection in the modern human lineage since divergence from Neandertals; the Sel Swp Scan (S) track displays a signal calculated from these SNPs, and the 5% Lowest S track contains the regions in which the signal most strongly indicates selective pressure on the modern human lineage. Display Conventions and Configuration Red SNPs are those where at least four of the six modern human genomes are derived while all observed Neandertal alleles are ancestral. All other SNPs are black. Methods For the purposes of this analysis, SNPs were defined as single-base sites that are polymorphic among 5 modern human genomes of diverse ancestry (see the Modern Human Seq track) plus the human reference genome. SNPs at CpG sites were excluded because of the higher mutation rate at CpG sites. Ancestral or derived state was determined by comparison with the chimpanzee genome. Credits This track was produced at UCSC using data generated by Ed Green. Reference Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178 genomicSuperDups Segmental Dups Duplications of >1000 Bases of Non-RepeatMasked Sequence Variation and Repeats Description This track shows regions detected as putative genomic duplications within the golden path. 
The following display conventions are used to distinguish levels of similarity: Light to dark gray: 90 - 98% similarity Light to dark yellow: 98 - 99% similarity Light to dark orange: greater than 99% similarity Red: duplications of greater than 98% similarity that lack sufficient Segmental Duplication Database evidence (most likely missed overlaps) For a region to be included in the track, at least 1 Kb of the total sequence (containing at least 500 bp of non-RepeatMasked sequence) had to align and a sequence identity of at least 90% was required. Methods Segmental duplications play an important role in both genomic disease and gene evolution. This track displays an analysis of the global organization of these long-range segments of identity in genomic sequence. Large recent duplications (>= 1 kb and >= 90% identity) were detected by identifying high-copy repeats, removing these repeats from the genomic sequence ("fuguization") and searching all sequence for similarity. The repeats were then reinserted into the pairwise alignments, the ends of alignments trimmed, and global alignments were generated. For a full description of the "fuguization" detection method, see Bailey et al., 2001. This method has become known as WGAC (whole-genome assembly comparison); for example, see Bailey et al., 2002. Credits These data were provided by Ginger Cheng, Xinwei She, Archana Raja, Tin Louie and Evan Eichler at the University of Washington. References Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002 Aug 9;297(5583):1003-7. PMID: 12169732 Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001 Jun;11(6):1005-17. PMID: 11381028; PMC: PMC311093 ntSssZScorePMVar Sel Swp Scan (S) Selective Sweep Scan (S) on Neandertal vs. 
Human Polymorphisms (Z-Score +- Variance) Neandertal Assembly and Analysis Description This track shows the S score (Z-score +- variance) for positive selection in humans within a 100 kb window surrounding each polymorphic position in the five modern human sequences and the human reference genome as described in Green et al., Supplemental Online Material Text 13, Burbano et al. A positive score indicates more derived alleles in Neandertal than expected, given the frequency of derived alleles in human. A negative score indicates fewer derived alleles in Neandertal, and may indicate an episode of positive selection in early humans. To view the polymorphic sites on which the S score was computed, open the S SNPs track. Methods Green et al. identified single-base sites that are polymorphic among five modern human genomes of diverse ancestry (in the Modern Human Seq track) plus the human reference genome. CpG sites were excluded because of the higher mutation rate at CpG sites. The ancestral or derived state of each single nucleotide polymorphism (SNP) was determined by comparison with the chimpanzee genome. The SNPs are displayed in the S SNPs track. The fact that SNPs with higher frequencies of the derived allele in modern humans were more likely to show the derived allele in Neandertals was used to calculate the expected number of derived alleles in Neandertal within a given region of the human genome. The observed numbers of derived alleles were compared to the expected numbers to identify regions where the Neandertals carry fewer derived alleles than expected given the human allelic states. The score assigned to each SNP is the z-score of the observed and expected counts relative to the variance in the number of the expected counts of derived alleles within the 100,000-base window around the SNP.
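The note that follows explains how the score and its variance share a single track: score plus variance at the SNP position, score minus variance at the next position. As a minimal sketch of that encoding and how to undo it, the Python below uses hypothetical function names and a hypothetical (position, score, variance) input layout; it is not the code used to build the track.

```python
from typing import Iterable, Iterator, Tuple


def encode_for_display(
    snps: Iterable[Tuple[int, float, float]]
) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) pairs for a graph track.

    Each input item is (snp_position, score, variance), a layout assumed
    here for illustration. The SNP position carries score + variance and
    the following position carries score - variance.
    """
    for pos, score, variance in snps:
        yield pos, score + variance
        yield pos + 1, score - variance


def decode_pair(value_at_snp: float, value_after: float) -> Tuple[float, float]:
    """Recover (score, variance) from the two displayed values."""
    score = (value_at_snp + value_after) / 2
    variance = (value_at_snp - value_after) / 2
    return score, variance


if __name__ == "__main__":
    displayed = list(encode_for_display([(100, -2.0, 0.5)]))
    print(displayed)
    print(decode_pair(displayed[0][1], displayed[1][1]))
```

Because the two displayed values are the sum and the difference of the same pair, their average gives back the score and half their difference gives back the variance.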
Note: In order to display both the score and the variance within the same track in the UCSC Genome Browser, the scores were modified as follows: at the SNP position, the value displayed is the score plus the variance. At the position following the SNP position, the score minus the variance is displayed. When viewing large regions (at least 100,000 bases), the default mean+whiskers condensation of the scores provides an indication of the range covered by the variance. Reference Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178 chainSelf Self Chain Human Chained Self Alignments Variation and Repeats Description This track shows alignments of the human genome with itself, using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. The system can also tolerate gaps in both sets of sequence simultaneously. After filtering out the "trivial" alignments produced when identical locations of the genome map to one another (e.g. chrN mapping to chrN), the remaining alignments point out areas of duplication within the human genome. The pseudoautosomal regions of chrX and chrY are an exception: in this assembly, these regions have been copied from chrX into chrY, resulting in a large amount of self chains aligning in these positions on both chromosomes. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the query assembly or an insertion in the target assembly. Double lines represent more complex gaps that involve substantial sequence in both the query and target assemblies. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one of the assemblies. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Display Conventions and Configuration By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Methods The genome was aligned to itself using blastz. Trivial alignments were filtered out, and the remaining alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single target chromosome and a single query chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

        A     C     G     T
  A    91  -114   -31  -123
  C  -114   100  -125   -31
  G   -31  -125   100  -114
  T  -123   -31  -114    91

Chains scoring below a minimum score of 10,000 were discarded; the remaining chains are displayed in this track. Credits Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.
The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent. References Chiaromonte, F., Yap, V.B., Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002). Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003). Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003). sestanBrainAtlas Sestan Brain Sestan Lab Human Brain Atlas Microarrays Expression Description This track displays exon microarray expression data from the late mid-fetal human brain, generated by the Sestan Lab at Yale University. The data represent 13 brain regions, including nine areas of neocortex, and both hemispheres. By default, arrays are grouped by the median for each brain region, including each neocortical area. Alternatively, neocortex areas can be grouped together; arrays can be grouped by mean; or all 95 arrays can be shown individually. Methods RNA was isolated from 13 brain regions, from both hemispheres, of four late mid-fetal human brains, with a total PMI of less than one hour, and hybridized to Affymetrix Human Exon 1.0 ST arrays. Affymetrix CEL files were imported into Partek GS using Robust Multichip Average (RMA) background correction, quantile normalization, and GC content correction. The normalized data were then converted to log-ratios, relative to arrays hybridized with RNA pooled from all regions of the same brain. Signal log-ratios are displayed here as green for negative (underexpression) and red for positive (overexpression). The probe set for this microarray track can be displayed by turning on the Affy HuEx 1.0 track. Core, extended, and full probe sets are shown. 
"Bounded" probe sets - exons that lie within the intron of more than one gene - and potentially cross-hybridizing probe sets were filtered from this dataset, leaving ~875K probe sets. Credits The data for this track were generated and analyzed by Matthew B. Johnson, Yuka Imamura Kawasawa, Christopher Mason, and the Yale Neuroscience Microarray Center. Links The raw microarray data are available via the NCBI Gene Expression Omnibus. More information is available at https://hbatlas.org/. sgpGene SGP Genes SGP Gene Predictions Using Mouse/Human Homology Genes and Gene Predictions Description This track shows gene predictions from the SGP2 homology-based gene prediction program developed by Roderic Guigó's "Computational Biology of RNA Processing" group, which is part of the Centre de Regulació Genòmica (CRG) in Barcelona, Catalunya, Spain. To predict genes in a genomic query, SGP2 combines geneid predictions with tblastx comparisons of the genome of the target species against genomic sequences of other species (reference genomes) deemed to be at an appropriate evolutionary distance from the target. Credits Thanks to the "Computational Biology of RNA Processing" group for providing these data. sibTxGraph SIB Alt-Splicing Alternative Splicing Graph from Swiss Institute of Bioinformatics mRNA and EST Description This track shows the graphs constructed by analyzing experimental RNA transcripts and serves as basis for the predicted alternative splicing transcripts shown in the SIB Genes track. The blocks represent exons; lines indicate introns. The graphical display is drawn such that no exons overlap, making alternative events easier to view when the track is in full display mode and the resolution is set to approximately gene-level. Further information on the graphs can be found on the Transcriptome Web interface. 
Methods The splicing graphs were generated using a multi-step pipeline: RefSeq and GenBank RNAs and ESTs are aligned to the genome with SIBsim4, keeping only the best alignments for each RNA. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge. Graphs consisting solely of unspliced ESTs are discarded. Credits The SIB Alternative Splicing Graphs track was produced on the Vital-IT high-performance computing platform using a computational pipeline developed by Christian Iseli with help from colleagues at the Ludwig Institute for Cancer Research and the Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and GenBank/EMBL. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them. sibGene SIB Genes Swiss Institute of Bioinformatics Gene Predictions from mRNA and ESTs Genes and Gene Predictions Description The SIB Genes track is a transcript-based set of gene predictions based on data from RefSeq and EMBL/GenBank. Genes all have the support of at least one GenBank full length RNA sequence, one RefSeq RNA, or one spliced EST. The track includes both protein-coding and non-coding transcripts. The coding regions are predicted using ESTScan. Display Conventions and Configuration This track in general follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks while those for coding open reading frames are thicker. This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. 
To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Further information on the predicted transcripts can be found on the Transcriptome Web interface. Methods The SIB Genes are built using a multi-step pipeline: RefSeq and GenBank RNAs and ESTs are aligned to the genome with SIBsim4, keeping only the best alignments for each RNA. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge. The graph is traversed to generate all unique transcripts. The traversal is guided by the initial RNAs to avoid a combinatorial explosion in alternative splicing. Protein predictions are generated. Credits The SIB Genes track was produced on the Vital-IT high-performance computing platform using a computational pipeline developed by Christian Iseli with help from colleagues at the Ludwig Institute for Cancer Research and the Swiss Institute of Bioinformatics. It is based on data from NCBI RefSeq and GenBank/EMBL. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF) which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. 
Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 wgRna sno/miRNA C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase Genes and Gene Predictions Description This track displays positions of four different types of RNA in the human genome: precursor forms of microRNAs (pre-miRNAs), C/D box small nucleolar RNAs (C/D box snoRNAs), H/ACA box snoRNAs, and small Cajal body-specific RNAs (scaRNAs). C/D box and H/ACA box snoRNAs are guides for the 2'-O-ribose methylation and the pseudouridylation, respectively, of rRNAs and snRNAs, although many of them have no documented target RNA. The scaRNAs guide modifications of the spliceosomal snRNAs transcribed by RNA polymerase II, and often contain both C/D and H/ACA domains. The pre-miRNA data are from the miRBase Sequence Database at the Wellcome Trust Sanger Institute. The snoRNA and scaRNA data are from snoRNABase, which is maintained at the Laboratoire de Biologie Moléculaire Eucaryote. Display Conventions and Configuration This track follows the general display conventions for gene prediction tracks. At a zoomed-in resolution, arrows superimposed on the blocks indicate the sense orientation of the RNAs. The RNA types are represented by blocks of the following colors: red = pre-miRNA, blue = C/D box snoRNA, green = H/ACA box snoRNA, magenta = scaRNA. Methods Pre-miRNA genomic locations from miRBase were calculated using wublastn for sequence alignment with the requirement of 100% identity. The extents of the precursor sequences were not generally known and were predicted based on base-paired hairpin structure. The snoRNA and scaRNA sequences from snoRNABase were aligned against the human genome using BLAT.
In a few cases, no exact match was found for the published sequences; these likely correspond to sequencing errors. In these cases, the best BLAT hit (which differed from the published sno/scaRNA sequence by 1-3 nucleotides) was adopted. Credits The genome coordinates for the pre-miRNAs were obtained from the miRBase Sequence Database FTP site, and the genome coordinates for the snoRNA and scaRNA were obtained from the snoRNABase coordinates download page. References When making use of these data, please cite the following articles and, if applicable, the primary sources of the RNA sequences: Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008 Jan;36(Database issue):D154-8. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D158-62. Weber MJ. New human and mouse microRNA genes found by homology search. Febs J. 2005 Jan;272(1):59-73. For more information on BLAT, see Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002;12(4):656-664. The following publication provides guidelines on miRNA annotation: Ambros V. et al., A uniform system for microRNA annotation. RNA. 2003;9(3):277-9. snp126 SNPs (126) Simple Nucleotide Polymorphisms (dbSNP build 126) Variation and Repeats Description This track contains dbSNP build 126, available from ftp.ncbi.nih.gov/snp. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence.
Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The configuration categories reflect the following definitions (not all categories apply to this assembly): Location Type: Describes the alignment of the flanking sequence Range - the flank alignments leave a gap of 2 or more bases in the reference assembly Exact - the flank alignments leave exactly one base between them Between - the flank alignments are contiguous; the variation is an insertion RangeInsertion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is shorter RangeSubstitution - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence and the reference assembly sequence are of equal length RangeDeletion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is longer Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion) Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name No Variation - no variation asserted for sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G} Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a 
deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap - validated by HapMap project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP term: locus; Sequence Ontology term: feature_variant) Coding - Synonymous - no change in peptide for allele with respect to the reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant) Coding - Non-Synonymous - change in peptide for allele with respect to the reference assembly (dbSNP term: coding-nonsynon; Sequence Ontology term: protein_altering_variant) Untranslated - variation is in a transcript, but not in a coding region interval (dbSNP term: untranslated; Sequence Ontology term: UTR_variant) Intron - variation is in an intron, but not in the first two or last two bases of the intron (dbSNP term: intron; Sequence Ontology term: intron_variant) Splice Site - variation is in the first two or last two bases of an intron (dbSNP term: splice-site; Sequence Ontology term: splice_site_variant) Reference (coding) - one of the observed alleles of a SNP in a coding 
region matches the reference assembly (dbSNP term: cds-reference; Sequence Ontology term: coding_sequence_variant) Unknown - no known functional classification Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Average heterozygosity: Calculated by dbSNP as described here Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 3. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene). Insertions/Deletions dbSNP uses a class called 'in-del'. This has been split into the 'insertion' and 'deletion' categories, based on location type. The location types 'range' and 'exact' are deletions relative to the reference assembly. The location type 'between' indicates insertions relative to the reference assembly. For the new location types, the class 'in-del' is preserved.
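The split of 'in-del' by location type described above can be illustrated with a short sketch. The function name and the lower-cased string comparison are assumptions made for illustration; this is not the code used to build the track.

```python
def split_indel_class(snp_class: str, loc_type: str) -> str:
    """Map dbSNP class 'in-del' to 'insertion' or 'deletion' by location type.

    Follows the rule stated in the text: 'range' and 'exact' are deletions
    relative to the reference assembly, 'between' is an insertion, and the
    newer location types (RangeInsertion, RangeSubstitution, RangeDeletion)
    keep the class 'in-del'. Any other class is returned unchanged.
    """
    if snp_class != "in-del":
        return snp_class
    loc = loc_type.lower()
    if loc in ("range", "exact"):
        return "deletion"
    if loc == "between":
        return "insertion"
    return "in-del"


if __name__ == "__main__":
    for loc in ("range", "exact", "between", "rangeInsertion"):
        print(loc, "->", split_indel_class("in-del", loc))
```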
UCSC Annotations In addition to presenting the dbSNP data, the following annotations are provided: The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the dbSNP reference allele is the reverse complement of the UCSC reference allele. Single-base substitutions where the alignments of the flanking sequences are adjacent or have a gap of more than one base are noted. Observed alleles with an unexpected format are noted. The length of observed alleles is checked for consistency with location types; exceptions are noted. Single-base substitutions are checked to see that one of the observed alleles matches the reference allele; exceptions are noted. Simple deletions are checked to see that the observed allele matches the reference allele; exceptions are noted. Tri-allelic and quad-allelic single-base substitutions are noted. Variants that have multiple mappings are noted. Data Sources Coordinates, orientation, location type and dbSNP reference allele data were obtained from b126_SNPContigLoc_36_1.bcp.gz. b126_SNPMapInfo_36_1.bcp.gz provided the alignment weights; alignments with weight = 0 or weight = 10 were filtered out. Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz, using the univar_id from SNP.bcp.gz as an index. Functional classification was obtained from b126_SNPContigLocusId_36_1.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. The header lines in the rs_fasta files were used for molecule type. Orthologous Alleles (human only) Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. We use our liftOver utility to identify the orthologous alleles. 
The candidate human SNPs are a filtered list that meet the criteria: class = 'single'; locType = 'exact'; chromEnd = chromStart + 1; align to just one location; are not aligned to a chrN_random chrom; are biallelic (not tri or quad allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. snp128 SNPs (128) Simple Nucleotide Polymorphisms (dbSNP build 128) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 128, available from ftp.ncbi.nih.gov/snp. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
The configuration categories reflect the following definitions (not all categories apply to this assembly): Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name No Variation - no variation asserted for sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G} Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap - validated by HapMap project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
As of dbSNP build 128, several functional terms have been replaced by more detailed functional terms; for filtering and coloring, the new function terms are grouped into more general categories: Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP terms: near-gene-3 and near-gene-5 replace older term locus; Sequence Ontology terms: downstream_gene_variant, upstream_gene_variant) Coding - Synonymous - no change in peptide for allele with respect to reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant) Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly (dbSNP terms: nonsense, missense, frameshift replace older term coding-nonsynon; Sequence Ontology terms: stop_gained, missense_variant, frameshift_variant) Untranslated - variation in transcript, but not in coding region interval (dbSNP terms: untranslated-3, untranslated-5 replace older term untranslated; Sequence Ontology terms: 3_prime_UTR_variant, 5_prime_UTR_variant) Intron - variation in intron, but not in first two or last two bases of intron (dbSNP term: intron; Sequence Ontology term: intron_variant) Splice Site - variation in first two or last two bases of intron (dbSNP terms: splice-3, splice-5 replace older term splice-site; Sequence Ontology terms: splice_acceptor_variant, splice_donor_variant) Reference (coding) - one of the observed alleles of a SNP in a coding region matches the reference assembly (dbSNP term: cds-reference; Sequence Ontology term: coding_sequence_variant) Unknown - no known functional classification Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Average heterozygosity: Calculated by dbSNP as described here Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. 
Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 3. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene). Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Annotations UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found: The dbSNP reference allele is not the same as the UCSC reference allele, i.e. the bases in the mapped position range. Class is single, in-del, mnp or mixed and the UCSC reference allele does not match any observed allele. In NCBI's alignment of flanking sequences to the genome, part of the flanking sequence around the SNP does not align to the genome. Class is single, but the size of the mapped SNP is not one base. Class is named and indicates an insertion or deletion, but the size of the mapped SNP implies otherwise. Class is single and the format of observed alleles is unexpected. 
The length of the observed allele(s) is not available because it is too long. Multiple distinct insertion SNPs have been mapped to this location. At least one observed allele contains an ambiguous IUPAC base (e.g. R, Y, N). Another condition, which does not necessarily imply any problem, is noted: Class is single and SNP is tri-allelic or quad-allelic. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Data Sources The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b128_SNPContigLoc_36_2.bcp.gz and b128_SNPContigInfo_36_2.bcp.gz. b128_SNPMapInfo_36_2.bcp.gz provided the alignment weights. Functional classification was obtained from b128_SNPContigLocusId_36_2.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. 
Data Access Note: Using LiftOver to convert SNPs between assemblies is not recommended; more information about how to convert SNPs between assemblies can be found in the following FAQ entry. The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for mm9 and hg18 (snp128*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human only) Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' chromEnd = chromStart + 1 align to just one location are not aligned to a chrN_random chrom are biallelic (not tri or quad allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. 
snp129 SNPs (129) Simple Nucleotide Polymorphisms (dbSNP build 129) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 129, available from ftp.ncbi.nih.gov/snp. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The configuration categories reflect the following definitions (not all categories apply to this assembly): Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name No Variation - no variation asserted for sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G} Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By 
Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap - validated by HapMap project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. For filtering and coloring, function terms are grouped into more general categories: Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP terms: near-gene-3, near-gene-5; Sequence Ontology terms: downstream_gene_variant, upstream_gene_variant) Coding - Synonymous - no change in peptide for allele with respect to reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant) Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly (dbSNP terms: nonsense, missense, frameshift; Sequence Ontology terms: stop_gained, missense_variant, frameshift_variant) Untranslated - variation in transcript, but not in coding region interval (dbSNP terms: untranslated-3, untranslated-5; Sequence Ontology terms: 3_prime_UTR_variant, 5_prime_UTR_variant) Intron - variation in intron, but not in first two or last two bases of intron (dbSNP term: intron; Sequence Ontology term: intron_variant) Splice Site - variation in first two or last two bases of intron (dbSNP terms: splice-3, splice-5; Sequence Ontology terms: splice_acceptor_variant, splice_donor_variant) Reference (coding) - one of the observed alleles of a SNP in a coding region matches the reference assembly (dbSNP term: cds-reference; 
Sequence Ontology term: coding_sequence_variant) Unknown - no known functional classification Molecule Type: Sample used to find this variant Note: the dbSNP release 129 fasta headers have swapped values: "genomic" for "cDNA" SNPs and vice versa. UCSC has swapped them back, so the displayed molecule type should be correct but might disagree with files downloaded from dbSNP. Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Average heterozygosity: Calculated by dbSNP as described here Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 3. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene). Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. 
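The length comparison described above for refining 'in-del' can be sketched as follows. This is a minimal illustration, not UCSC's actual code: the function name, the argument layout, and the use of "-" to denote an empty allele are assumptions made for the example.

```python
def refine_indel_class(snp_class, ref_allele, observed_alleles):
    """Refine dbSNP class 'in-del' into 'insertion' or 'deletion' when possible.

    Hypothetical helper: "-" is assumed to denote an empty (zero-length) allele.
    """
    if snp_class != "in-del":
        return snp_class

    def allele_len(allele):
        return 0 if allele == "-" else len(allele)

    ref_len = allele_len(ref_allele)
    # Lengths of all observed alleles other than the reference allele.
    other_lens = [allele_len(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        return "insertion"   # reference is shorter than every other allele
    if all(ref_len > n for n in other_lens):
        return "deletion"    # reference is longer than every other allele
    return snp_class         # mixed lengths: leave as 'in-del'
```

For example, a reference allele of "-" with observed alleles "-" and "AT" would be refined to 'insertion', while a reference of "A" with observed alleles "-", "A" and "AT" stays 'in-del' because the other alleles are neither all longer nor all shorter.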
UCSC Annotations UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found: The dbSNP reference allele is not the same as the UCSC reference allele, i.e. the bases in the mapped position range. Class is single, in-del, mnp or mixed and the UCSC reference allele does not match any observed allele. In NCBI's alignment of flanking sequences to the genome, part of the flanking sequence around the SNP does not align to the genome. Class is single, but the size of the mapped SNP is not one base. Class is named and indicates an insertion or deletion, but the size of the mapped SNP implies otherwise. Class is single and the format of observed alleles is unexpected. The length of the observed allele(s) is not available because it is too long. Multiple distinct insertion SNPs have been mapped to this location. At least one observed allele contains an ambiguous IUPAC base (e.g. R, Y, N). Another condition, which does not necessarily imply any problem, is noted: Class is single and SNP is tri-allelic or quad-allelic. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Data Sources The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). 
The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b129_SNPContigLoc_36_3.bcp.gz and b129_SNPContigInfo_36_3.bcp.gz. b129_SNPMapInfo_36_3.bcp.gz provided the alignment weights. Functional classification was obtained from b129_SNPContigLocusId_36_3.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Orthologous Alleles (human assemblies only) Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' chromEnd = chromStart + 1 align to just one location are not aligned to a chrN_random chrom are biallelic (not tri or quad allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 
2001 Jan 1;29(1):308-11. encodeStanfordPromoters Stanf Promoter Stanford Promoter Activity Pilot ENCODE Transcription Description This track displays activity levels of 643 putative promoter fragments in the ENCODE regions, based on high-throughput transient transfection luciferase reporter assays. The activity of each putative promoter is indicated by color, ranging from black (no activity) to red (strong activity). Each of the fragments was tested in a panel of 16 cell lines: Cell LineClassificationIsolated From AGSgastric adenocarcinomastomach BE(2)-Cneuroblastomabrain (metastatic, from bone marrow) T98G (CRL-1690)glioblastomabrain G-402renal leiomyoblastomakidney HCT 116colorectal carcinomacolon HMCBmelanomaskin HT-1080fibrosarcomaconnective tissue SK-N-SH (HTB-11)neuroblastomabrain (metastatic, from bone marrow) HeLaadenocarcinomacervix HepG2hepatocellular carcinomaliver JEG-3choriocarcinomaplacenta MG-63osteosarcomabone MRC-5fibroblastlung PANC-1epithelioid carcinomapancreas (duct) SNU-182hepatocellular carcinomaliver U-87 MGglioblastoma-astrocytomabrain Methods Promoters in the ENCODE region were predicted using a variation on methods previously described (Trinklein et al., 2003, Trinklein et al., 2004). Using BLAT alignments of human cDNAs in Genbank to the genome, those with at least one bp of exon overlap were merged, generating gene models. The transcription start sites were predicted by assigning the 5' end of each gene model as one transcription start site and alternative 5' ends that were at least 500 bp downstream and supported by full-length cDNAs as other start sites. Promoters were defined as the regions approximately 600 bp upstream and 100 bp downstream of each transcription start site. Primer3 was used to pick primers yielding approximately 500 bp amplicons containing the predicted transcription start site. 
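As a rough illustration of the promoter definition above (approximately 600 bp upstream and 100 bp downstream of each transcription start site), the window can be computed in a strand-aware way. The function name, the 0-based half-open coordinate convention, and the clamping at zero are assumptions for this sketch; the source only gives the approximate sizes.

```python
def promoter_window(tss, strand, upstream=600, downstream=100):
    """Return an approximate (start, end) promoter interval around a TSS.

    Hypothetical helper: coordinates are treated as 0-based, half-open.
    'Upstream' means lower coordinates on '+' and higher coordinates on '-'.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp so the window never extends before the start of the sequence.
    return max(start, 0), end
```

With these defaults, a plus-strand start site at position 1000 gives the interval (400, 1100), and a minus-strand start site at the same position gives (900, 1600).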
Each fragment of DNA represented in this track was cloned into a luciferase reporter vector (pGL3-Basic, Promega) using the BD Clontech Infusion Cloning System. The Dual Luciferase system (Promega) was used to co-transfect the experimental DNA along with a control plasmid expressing Renilla - to control for variation in transfection efficiency - in 96-well format into one of the sixteen cell types using FuGENE Transfection Reagent (Roche). Each transfection was done in duplicate. Data are reported as normalized and log2 transformed averages of the Luciferase/Renilla ratio. This normalization was based on the activity of 102 random genomic fragments (negative controls) derived from exons and intergenic regions. Such a normalization allows for a meaningful comparison between cell types. The average log transformed Luciferase/Renilla ratio was scaled linearly to create a score where the maximum value is 1000 and the minimum value is 0. This score is arbitrary and for visualization purposes only; the raw ratio values should be used for all analyses. Verification Data were verified by repeating the preparation and measurement of 48 random fragments. No significant variation between the two preparations was detected. A spreadsheet containing the negative control data can be downloaded here. Credits This work was done in collaboration at the Myers Lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). The following people contributed: Sara J. Cooper, Nathan D. Trinklein, Elizabeth D. Anton, Loan Nguyen, and Richard M. Myers. References Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM. Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res. 2006 Jan;16(1):1-10. Epub 2005 Dec 12. Trinklein ND, Aldred SJ, Saldanha AJ, Myers RM. Identification and functional analysis of human transcriptional promoters. Genome Res. 2003 Feb;13(2):308-12. 
Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. An abundance of bidirectional promoters in the human genome. Genome Res. 2004 Jan;14(1):62-6. encodeStanfordPromotersAverage Stan Pro Average Stanford Promoter Activity (Average) Pilot ENCODE Transcription encodeStanfordPromotersU87 Stan Pro U87 Stanford Promoter Activity (U87 cells) Pilot ENCODE Transcription encodeStanfordPromotersSnu182 Stan Pro Snu182 Stanford Promoter Activity (Snu182 cells) Pilot ENCODE Transcription encodeStanfordPromotersPanc1 Stan Pro Panc1 Stanford Promoter Activity (Panc1 cells) Pilot ENCODE Transcription encodeStanfordPromotersMRC5 Stan Pro MRC5 Stanford Promoter Activity (MRC5 cells) Pilot ENCODE Transcription encodeStanfordPromotersMG63 Stan Pro MG63 Stanford Promoter Activity (MG63 cells) Pilot ENCODE Transcription encodeStanfordPromotersJEG3 Stan Pro JEG3 Stanford Promoter Activity (JEG3 cells) Pilot ENCODE Transcription encodeStanfordPromotersHepG2 Stan Pro HepG2 Stanford Promoter Activity (HepG2 cells) Pilot ENCODE Transcription encodeStanfordPromotersHela Stan Pro Hela Stanford Promoter Activity (HeLa cells) Pilot ENCODE Transcription encodeStanfordPromotersHTB11 Stan Pro HTB11 Stanford Promoter Activity (HTB11 cells) Pilot ENCODE Transcription encodeStanfordPromotersHT1080 Stan Pro HT1080 Stanford Promoter Activity (HT1080 cells) Pilot ENCODE Transcription encodeStanfordPromotersHMCB Stan Pro HMCB Stanford Promoter Activity (HMCB cells) Pilot ENCODE Transcription encodeStanfordPromotersHCT116 Stan Pro HCT116 Stanford Promoter Activity (HCT116 cells) Pilot ENCODE Transcription encodeStanfordPromotersG402 Stan Pro G402 Stanford Promoter Activity (G402 cells) Pilot ENCODE Transcription encodeStanfordPromotersCRL1690 Stan Pro CRL1690 Stanford Promoter Activity (CRL1690 cells) Pilot ENCODE Transcription encodeStanfordPromotersBe2C Stan Pro Be2c Stanford Promoter Activity (Be2c cells) Pilot ENCODE Transcription encodeStanfordPromotersAGS Stan Pro AGS Stanford 
Promoter Activity (AGS cells) Pilot ENCODE Transcription encodeStanfordRtPcr Stanf RTPCR Stanford Endogenous Transcript Levels in HCT116 Cells Pilot ENCODE Transcription Description This track displays absolute transcript copy numbers for 136 genes and 12 negative control intergenic regions, determined by RTPCR in HCT116 cells. Display Conventions and Configuration The genomic regions are indicated by solid blocks. The shade of an item gives a rough indication of its count, ranging from light gray for zero to black for a count of 7000 or greater. To display only those items that exceed a specific unnormalized score, enter a minimum score between 0 and 1000 in the text box at the top of the track description page. Methods Total RNA was prepared in quadruplicate from HCT116 cells grown in culture. cDNA was prepared as described in Trinklein et al. (2004). Duplicate primer pairs were designed to each gene, and the absolute number of cDNA molecules containing each amplicon were determined by real-time PCR. The submitted data are the calculated number of molecules of each transcript containing the defined amplicon. Verification Four biological replicates were performed, and two primer pairs were used to measure the abundance of each transcript. Credits These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). References Trinklein, N.D., Chen, W.C., Kingston, R.E. and Myers, R.M. Transcriptional regulation and binding of HSF1 and HSF2 to 32 human heat shock genes during thermal stress and differentiation. Cell Stress Chaperones 9(1), 21-28 (2004). cnp Structural Var Structural Variation Variation and Repeats Description This annotation shows regions detected as putative copy number polymorphisms (CNP) and sites of detected intermediate-sized structural variation (ISV). 
The CNPs and ISVs were determined by various methods, displayed in individual subtracks within the annotation: Deletions from genotype analysis (Conrad): 935 deletions detected by analysis of SNP genotypes, using the HapMap Phase I data, release 16c.1, CEU and YRI samples. Deletions from haploid hybridization analysis (Hinds): 100 deletions from haploid hybridization analysis in 24 unrelated individuals from the Polymorphism Discovery Resource, selected for SNP LD study. BAC microarray analysis (Iafrate): 236 putative CNP regions detected by BAC microarray analysis in a population of 55 individuals, 16 of which had previously-characterized chromosomal abnormalities. The group consisted of 10 Caucasians, 4 Amerindians, 2 Chinese, 2 Indo-Pakistani, 2 Sub-Saharan African, and 35 of unknown ethnic origin. CNP in duplication-rich regions (Locke): 243 CNP regions were identified using array CGH in the HapMap populations (269 individuals). The study was specific to 130 putative rearrangement hotspot regions. Deletions from genotype analysis (McCarroll): 540 deletions detected by analysis of SNP genotypes, using the HapMap Phase I data, release 16a. SNP and BAC microarray analysis of HapMap data (Redon): 1,445 copy number variable regions found in the HapMap Phase II data. Representational oligonucleotide microarray analysis (ROMA) (Sebat): 80 putative CNP regions detected by ROMA in a population of 20 normal individuals comprised of 1 Biaka, 1 Mbuti, 1 Druze, 1 Melanesian, 4 French, 1 Venezuelan, 1 Cambodian, 1 Mayan and 9 of unknown ethnicity. BAC microarray analysis (Sharp): 140 putative CNP regions detected by BAC microarray analysis in a population of 47 individuals comprised of 8 Chinese, 4 Japanese, 10 Czech, 2 Druze, 7 Biaka, 9 Mbuti, and 7 Amerindians. Fosmid mapping (Tuzun): 297 ISV sites detected by mapping paired-end sequences from a human fosmid DNA library. 
Display Conventions and Configuration CNP and ISV regions are indicated by solid blocks that are color-coded to indicate the type of variation detected: Green: gain (duplications) Red: loss (deletions) Blue: gain and loss (both deletion and duplication) Black: inversion Gray: gain or loss (unknown direction) Note that display IDs are not preserved between assemblies. Conrad subtrack The method used to identify these deletions approximates the breakpoints of each event; therefore, a set of minimal and maximal endpoints is associated with each deletion. Thick lines delineate the minimally deleted region; thin lines delineate the maximally deleted region. Sharp subtrack On the details pages for elements in this subtrack, the table shows value/threshold data for each individual in the population. "Value" is defined as the log2 ratio of fluorescence intensity of test versus reference DNA. "Threshold" is defined as 2 standard deviations from the mean log2 ratio of all autosomal clones per hybridization. The "Disease Percent" value reflects the percent of the BAC that lies within a "rearrangement hotspot", as defined in Sharp et al. (2005). A rearrangement hotspot is defined by the presence of flanking intrachromosomal duplications >10 kb in length with >95% similarity and separated by 50 kb - 10 Mb of intervening sequence. Methods Conrad genotype analysis SNPs in regions that are hemizygous for a deletion are generally miscalled as homozygous for the allele that is present. Hence, when a deletion is transmitted from parent to child, the genotypes at SNPs within the deletion region will often appear to violate the rules of Mendelian transmission. The authors developed a simple algorithm for scanning trio data for unusual runs of consecutive SNPs that, in a single family, have genotype configurations consistent with the presence of a deletion. 
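The idea behind the Conrad genotype analysis can be illustrated with a simplified scan for runs of consecutive SNPs whose trio genotypes violate Mendelian transmission. This is only a sketch under stated assumptions, not the authors' algorithm: genotypes are assumed to be two-letter strings, the function names and the min_run parameter are hypothetical, and the published method evaluates deletion-compatible genotype configurations statistically rather than with a fixed run length.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def inconsistent_runs(trio_genotypes, min_run=2):
    """Find runs of consecutive Mendelian-inconsistent SNPs in one family.

    trio_genotypes: list of (father, mother, child) genotype strings,
    ordered by genomic position. Returns half-open (start, end) index pairs.
    """
    runs = []
    start = None
    for i, (father, mother, child) in enumerate(trio_genotypes):
        if not mendelian_consistent(father, mother, child):
            if start is None:
                start = i          # a new run of inconsistencies begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(trio_genotypes) - start >= min_run:
        runs.append((start, len(trio_genotypes)))
    return runs
```

A child called "AA" with parents called "AA" and "GG" is one such inconsistency: it is what would be expected if the child were hemizygous for A and the second parent carried and transmitted a deletion while being miscalled as homozygous.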
Hinds haploid hybridization analysis Approximately 600 Mb of genomic DNA from 24 unrelated individuals were obtained from the Polymorphism Discovery Resource. Haploid hybridization was used to identify genomic intervals showing a reduced hybridization signal in comparison to the reference assembly. PCR amplification was performed on 215 candidate deletions. 100 deletions were selected that were unambiguously confirmed. Iafrate BAC microarray analysis All hybridizations were performed in duplicate incorporating a dye-reversal using proprietary 1 Mb GenomeChip V1.2 Human BAC Arrays consisting of 2,632 BAC clones (Spectral Genomics, Houston, TX). The false positive rate was estimated at ~1 clone per 5,264 tested. Further information is available from the Database of Genomic Variants website. Locke analysis of duplication-rich regions DNA samples were obtained from Coriell Cell Repositories. The reference DNA used for all hybridizations was from a single male of Czechoslovakian descent, Coriell ID GM15724 (also used in the Sharp study). A locus was considered a CNV (copy number variation) if the log ratio of fluorescence measurements for the individuals assayed exceeded twice the standard deviation of the autosomal clones in replicate dye-swapped experiments. A CNV was classified as a CNP if altered copy number was observed in more than 1% of the 269 individuals. McCarroll genotype analysis A segregating deletion can leave "footprints" in SNP genotype data, including apparent deviations from Mendelian inheritance, apparent deviations from Hardy-Weinberg equilibrium and null genotypes. Using these clues to discover true variants is challenging, however, because the vast majority of such observations represent technical artifacts and genotyping errors. 
To determine whether a subset of "failed" SNP genotyping assays in the HapMap data might reflect structural variation, the authors examined whether such failures were physically clustered in a manner that is specific to individuals. Consistent with this hypothesis, the rate of Mendelian-inconsistent genotypes was elevated near other Mendelian-inconsistent genotypes in the same individual but was unrelated to Mendelian inconsistencies in other individuals. The authors systematically looked for regions of the genome in which the same failure profile appeared repeatedly at nearby markers in a manner that was statistically unexpected based on chance. A set of statistical thresholds was tailored to each mode of failure, genotyping center and genotyping platform used in the project. The same procedure could readily apply to dense SNP data from any platform or study. Redon analysis of HapMap data Experiments were performed with the International HapMap DNA and cell-line collection using two technologies: comparative analysis of hybridization intensities on Affymetrix GeneChip Human Mapping 500K early access arrays (500K EA) and comparative genomic hybridization with a Whole Genome TilePath (WGTP) array. Sebat ROMA Following digestion with BglII or HindIII, genomic DNA was hybridized to a custom array consisting of 85,000 oligonucleotide probes. The probes were selected to be free of common repeats and have unique homology within the human genome. The average resolution of the array was ~35 kb; however, only intervals in which three consecutive probes showed concordant signals were scored as CNPs. All hybridizations were performed in duplicate incorporating a dye-reversal, with the false positive rate estimated to be ~6%. Sharp BAC microarray analysis All hybridizations were performed in duplicate incorporating a dye-reversal using a custom array consisting of 2,194 end-sequence or FISH-confirmed BACs, targeted to regions of the genome flanked by segmental duplications. 
The false positive rate was estimated at ~3 clones per 4,000 tested. Tuzun fosmid mapping Paired-end sequences from a human fosmid DNA library were mapped to the assembly. The average resolution of this technique was ~8 kb, and included 56 sites of inversion not detectable by the array-based approaches. However, because of the physical constraints of fosmid insert size, this technique was unable to detect insertions greater than 40 kb in size. Validation Conrad genotype analysis The authors first tested 12 predicted deletions using quantitative PCR. For all 12 deletions, DNA concentrations consistent with transmission of a deletion from parent to child were observed. To provide more extensive validation by comparative genome hybridization (CGH), the authors designed a custom oligonucleotide microarray comprised of 380,000 probes that tile across all 134 candidate deletions identified in 9 HapMap offspring (8 YRI and 1 CEU). The results of this CGH analysis indicate that the majority (about 85%) of candidate deletions detected by the method are real. Locke duplication-rich regions The authors performed validation using a custom oligonucleotide array, hybridized to 9 of the HapMap individuals. Their analysis of the validation experiments indicated a false-negative rate of 5% and a false-positive rate of less than 0.2%. McCarroll genotype analysis Four methods of validation were used: fluorescent in situ hybridization (FISH), two-color fluorescence intensity measurements, PCR amplification and quantitative PCR. The authors performed fluorescent in situ hybridization for five candidate deletions large enough to span available FISH probes. In all five cases, FISH assays confirmed the deletions in the predicted individuals. The authors examined two-color allele-specific fluorescence data from SNP genotyping assays from a data subset available at the Broad Institute, looking for a reduction in fluorescence intensity in individuals predicted to carry a deletion. 
At most SNPs in the genome, fluorescence intensity measurements clustered into two or three discrete groups corresponding to homozygous and heterozygous genotypes. At 15 of 17 candidate deletion loci, fluorescence intensity data for one or more SNPs clustered into additional groups that corresponded to the predicted deletion genotypes. The authors used PCR amplification to query 60 loci for which the pattern of genotypes suggested multiple individuals with homozygous deletions. Variants were considered confirmed if the pattern of amplification success and failure matched prediction across a set of 12-24 individuals. The authors confirmed 51 of 60 candidate variants by this criterion. The authors performed quantitative PCR in all 269 HapMap DNA samples for 11 candidate deletions that overlapped the coding exons of genes and that were discovered in many individuals. At 10/11 loci, the authors observed three discrete clusters, identifying individuals with zero, one and two gene copies. All 60 trios displayed Mendelian inheritance for the ten deletions, as well as Hardy-Weinberg equilibrium in all four populations surveyed, and transmission rates close to 50%. This suggests that the deletions behave as a stable, heritable genetic polymorphism. Redon analysis of HapMap data The authors utilized numerous quality measures, including repeated experiments on the WGTP array for 82 individuals and on the 500K EA array for 15 individuals. The average false-positive rate per experiment was held beneath 5%. Aberrant chromosomes were removed from the analysis. Credits Thanks to Lars Feuk at The Hospital for Sick Children in Toronto for providing these data in hg18 coordinates. References Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet. 2006 Feb;7(2):85-97. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet. 2006 Jan;38(1):75-81. 
Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 2006 Jan;38(1):82-5. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51. Locke DP, Sharp AJ, McCarroll SA, McGrath SD, Newman TL, Cheng Z, Schwartz S, Albertson DG, Pinkel D, Altshuler DM et al. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet. 2006 Aug;79(2):275-90. McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ et al. Common deletion polymorphisms in the human genome. Nat Genet. 2006 Jan;38(1):86-92. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al. Global variation in copy number in the human genome. Nature. 2006 Nov 23;444(7118):444-454. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M et al. Large-scale copy number polymorphism in the human genome. Science. 2004 July 23;305(5683):525-8. Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R et al. Segmental duplications and copy number variation in the human genome. Am J Hum Genet. 2005 Jul;77(1):78-88. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, Hamilton G, Hindle AK, Huey B, Kimura K et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001 Nov;29(3):263-4. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D et al. Fine-scale structural variation of the human genome. Nat Genet. 2005 Jul;37(7):727-32. Nguyen DQ, Webber C, Ponting CP. Bias of selection on human copy-number variants. PLoS Genet. 2006 Feb;2(2):e20. 
cnpTuzun Tuzun Fosmids Structural Variation identified by Fosmids (Tuzun) Variation and Repeats cnpSharp2 Sharp CNPs Copy Number Polymorphisms from BAC Microarray Analysis (Sharp) Variation and Repeats cnpSebat2 Sebat CNPs Copy Number Polymorphisms from ROMA (Sebat) Variation and Repeats cnpRedon Redon CNPs Copy Number Polymorphisms from SNP and BAC microarrays (Redon) Variation and Repeats Description 1447 copy number variable regions found in the HapMap Phase II data using SNP and BAC microarray analysis. Display Conventions and Configuration CNP and ISV regions are indicated by solid blocks that are color-coded to indicate the type of variation detected: Green: gain (duplications) Red: loss (deletions) Blue: gain and loss (both deletion and duplication) Black: inversion Gray: gain or loss (unknown direction) Methods Experiments were performed with the International HapMap DNA and cell-line collection using two technologies: comparative analysis of hybridization intensities on Affymetrix GeneChip Human Mapping 500K early access arrays (500K EA) and comparative genomic hybridization with a Whole Genome TilePath (WGTP) array. Validation The authors used numerous quality measures, including repeated experiments on the WGTP for 82 individuals and on the 500K EA for 15 individuals. The average false-positive rate per experiment was kept below 5%. Aberrant chromosomes were removed from the analysis. Further details are available in the Nature paper cited below. References Redon, R., Ishikawa, S., Fitch, K., Feuk, L., Perry, G., Andrews, T., Fiegler, H., Lee, C., Jones, K., Scherer, S., Hurles, M. et al. Global variation in copy number in the human genome. Nature 444(7118), 444-454 (2006). 
delMccarroll McCarroll Dels Deletions from Genotype Analysis (McCarroll) Variation and Repeats cnpLocke Locke CNPs Copy Number Polymorphisms from BAC Microarray Analysis (Locke) Variation and Repeats cnpIafrate2 Iafrate CNPs Copy Number Polymorphisms from BAC Microarray Analysis (Iafrate) Variation and Repeats delHinds2 Hinds Dels Deletions from Haploid Hybridization Analysis (Hinds) Variation and Repeats delConrad2 Conrad Dels Deletions from Genotype Analysis (Conrad) Variation and Repeats stsMap STS Markers STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps Mapping and Sequencing Description This track shows locations of Sequence Tagged Site (STS) markers along the draft assembly. These markers have been mapped using either genetic mapping (Genethon, Marshfield, and deCODE maps), radiation hybridization mapping (Stanford, Whitehead RH, and GeneMap99 maps) or YAC mapping (the Whitehead YAC map) techniques. Since August 2001, this track no longer displays fluorescent in situ hybridization (FISH) clones, which are now displayed in a separate track. Genetic map markers are shown in blue; radiation hybrid map markers are shown in black. When a marker maps to multiple positions in the genome, it is shown in a lighter color. Methods Positions of STS markers are determined using both full sequences and primer information. Full sequences are aligned using blat, while isPCR (Jim Kent) and ePCR are used to find locations using primer information. Both sets of placements are combined to give final positions. In nearly all cases, full sequence and primer-based locations are in agreement, but in cases of disagreement, full sequence positions are used. Sequence and primer information for the markers were obtained from the primary sites for each of the maps, and from UniSTS. Using the Filter The track filter can be used to change the color or include/exclude a set of map data within the track. 
This is helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: In the pulldown menu, select the map whose data you would like to highlight or exclude in the display. By default, the "All Genetic" option is selected. Choose the color or display characteristic that will be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display data from the map selected in the pulldown list. If "include" is selected, the browser will display only data from the selected map. When you have finished configuring the filter, click the Submit button. Credits This track was designed and implemented by Terry Furey. Many thanks to the researchers who worked on these maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang, and Sanja Rogic for helping process the data. Additional data on the individual maps can be found at the following links: Genethon map Marshfield map deCODE map GeneMap99 GB4 and G3 maps Stanford TNG Whitehead YAC and RH maps wgEncodeSunyalbanyRnaGeneChip SUNY RBP ENCODE SUNY Albany RNA Binding Proteins by RIP-chip Regulation Description This track shows expression of target RNA binding proteins (RBPs) as measured by RNA-binding protein immunoprecipitation-microarray profiling (RIP-chip) using different RIP antibodies in multiple cell lines. The RBP Assoc RNA view shows the genomic location of transcripts associated with the array probes. Data for this track were produced as part of the Encyclopedia of DNA Elements (ENCODE) Project. In eukaryotic organisms, gene regulatory networks require an additional level of coordination that links transcriptional and post-transcriptional processes. Messenger RNAs have traditionally been viewed as passive molecules in the pathway from transcription to translation. However, it is now clear that RNA-binding proteins play a major role in regulating multiple mRNAs in order to facilitate gene expression patterns. 
These tracks show the associated mRNAs that co-precipitate with the targeted RNA-binding proteins using RIP-Chip profiling. Display Conventions and Configuration This track is a multi-view composite track. For each view there are multiple subtracks that display individually in the browser. The subtracks within this track correspond to different antibodies/target proteins tested in different cell lines. This track is initially released with a single view: RBP Assoc RNA The RBP Assoc RNA view shows the genomic extent of the transcripts associated with the Affymetrix Exon Array probes, shaded according to score. Instructions for configuring multi-view tracks are here. Methods RBP-mRNA complexes were purified from cells grown according to the approved ENCODE cell culture protocols . The associated messages were identified using Affymetrix Human Exon 1.0 ST Arrays. Measurements of expression at gene-level were extracted using Affymetrix tools, and were further processed to generate average fold-change and p-values for immunoprecipitation. Enriched regions were scored in the range ~100 to ~1000, and interrogated regions without significant signal were scored at 1. The signal value contains the minimum log2 fold-change, the p-value contains -log10 of the maximum p-value, and the q-value was left at the default of -1. For additional methods detail, see Tenenbaum et al. 2002; Baroni et al. 2008; Penalva et al. 2004, below. Details of the RIP-chip analysis methods are available here. Credits These data were produced and analyzed by a collaboration between the Tenenbaum lab at the University at Albany-SUNY, Gen*NY*Sis Center For Excellence in Cancer Genomics and the Luiz Penalva group at the Greehey Children's Cancer Research Institute, University of Texas Health Science Center. Contact: Scott Tenenbaum References Tenenbaum SA, Lager PJ, Carson CC, Keene JD. Ribonomics: identifying mRNA subsets in mRNP complexes using antibodies to RNA-binding proteins and genomic arrays. 
Methods. 2002 Feb;26(2):191-8. Baroni TE, Chittur SV, George AD, Tenenbaum SA. Advances in RIP-chip analysis : RNA-binding protein immunoprecipitation-microarray profiling. Methods Mol Biol. 2008;419:93-108. Penalva LO, Tenenbaum SA, Keene JD. Gene expression analysis of messenger RNP complexes. Methods Mol Biol. 2004;257:125-34. Keene JD, Tenenbaum SA. Eukaryotic mRNPs may represent posttranscriptional operons. Mol Cell. 2002;9(6):1161-7. George AD, Tenenbaum SA. MicroRNA modulation of RNA-binding protein regulatory elements. RNA Biol. 2006;3(2):57-9. Epub 2006 Apr 1. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeSunyalbanyRnaGeneChipViewRbpAssocRna RBP Assoc RNA ENCODE SUNY Albany RNA Binding Proteins by RIP-chip Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Nov69522T7tag K562 Control T7Tag K562 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 354 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Nov69522T7tag RbpAssocRna T7 (MASMTGGQQMG) leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (T7Tag Control in K562 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sigp6246Pabpc1 K562 PABPC1 PABPC1 K562 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 353 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sigp6246Pabpc1 RbpAssocRna Poly(A) binding protein, cytoplasmic 1 (Homo sapiens). leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (PABPC1 in K562 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sc21027Igf2bp1 K562 IGF2BP1 IGF2BP1 K562 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 352 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sc21027Igf2bp1 RbpAssocRna Insulin-like growth factor 2 mRNA binding protein 1 (Homo sapiens) leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (IGF2BP1 in K562 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sc5261Elavl1 K562 ELAVL1 ELAVL1 K562 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 351 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaK562Sc5261Elavl1 RbpAssocRna (Embryonic lethal, abnormal vision, Drosophila)-like 1 (Huantigen R) (Homo sapiens) leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (ELAVL1 in K562 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Nov69522T7tag GM12878 Control T7Tag GM12878 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 350 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Nov69522T7tag RbpAssocRna T7 (MASMTGGQQMG) B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (T7Tag Control in GM12878 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Sigp6246Pabpc1 GM12878 PABPC1 PABPC1 GM12878 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 349 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Sigp6246Pabpc1 RbpAssocRna Poly(A) binding protein, cytoplasmic 1 (Homo sapiens). 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (PABPC1 in GM12878 cells) Regulation wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Sc5261Elavl1 GM12878 ELAVL1 ELAVL1 GM12878 RipChip ENCODE Nov 2008 Freeze 2008-12-02 2009-09-02 348 Tenenbaum SunyAlbany gene.core.presAllIP[DABG_10e-256].enrichedIPvTotal_minFC_maxPVal[RMA_PLIER] wgEncodeSunyalbanyRnaGeneChipRbpAssocRnaGm12878Sc5261Elavl1 RbpAssocRna (Embryonic lethal, abnormal vision, Drosophila)-like 1 (Huantigen R) (Homo sapiens) B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus RNA IP Microarray Tenenbaum Tenenbaum - SUNY at Albany Ribosome binding protein associated RNA ENCODE SUNY Albany RBP Associated RNA (ELAVL1 in GM12878 cells) Regulation switchDbTss SwitchGear TSS SwitchGear Genomics Transcription Start Sites Regulation Description This track describes the location of transcription start sites (TSS) throughout the human genome along with a confidence measure for each TSS based on experimental evidence. The TSSs of a gene are important landmarks that help define the promoter regions of a gene. These TSSs were determined by SwitchGear Genomics by integrating experimental data using an empirically derived scoring function. Each TSS has a unique identifier that associates it with a gene model (see details below), and each TSS is color-coded to reflect its confidence score. These TSSs are also available in a searchable format at SwitchDB, an open-access online database of human TSSs. Experimental tools are available through SwitchGear to study the function of the promoter regions associated with these TSSs. Methods The predicted TSSs are associated with a genome-wide set of gene models. 
SwitchGear gene models are defined as clusters of cDNA alignments that have overlapping exons on the same strand. These gene models were created from over 250,000 human cDNA alignments to construct a genome-wide set of ~37,000 gene models. Each gene model is identified by its chromosome number, strand, and unique identifier. For example, ID CHR7_P0362 indicates a cDNA cluster (0362) aligning to the plus strand (P) of chromosome 7 (CHR7). Existing gene annotation is mapped to the gene models through the NCBI annotation associated with Refseq accession numbers. The SwitchGear TSS prediction algorithm identifies the most likely sites of transcription initiation for each gene model. The algorithm employs a scoring metric to assign a confidence level to each TSS prediction based on existing experimental evidence. In addition to the ~250,000 human cDNAs listed in Genbank, more than 5 million additional 5' human cDNA sequence tags have been generated using a combination of approaches. While these short sequence reads do not reveal gene structure, they provide a significant amount of experimental evidence for identifying transcript start sites. For each gene model, the algorithm counts the number of TSSs (defined as the 5' end of a cDNA) within 200 bp of one another. The TSS score is based on the total number of TSSs identified within this window, with each TSS weighted according to several discriminating features: cDNA library source, relative location within the gene model, and exon structure of the transcript. Furthermore, the TSSs for each gene model are ranked to identify the TSS representing the most likely transcription initiation site for a gene model. Rankings are indicated in the TSS unique identifier by the addition of a suffix (i.e. CHR7_P0362_R1 or CHR7_P0362_R2). Using the Filter This track has a filter that can be used to change the TSS elements displayed by the browser. This filter is based on the score of the TSS element. 
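The identifier scheme described in the Methods above (for example CHR7_P0362_R1) can be unpacked mechanically. The following is a minimal Python sketch, not SwitchGear's own code: the function name parse_tss_id, the TssId tuple, and the regular expression are illustrative assumptions. Only the letter P (plus strand) is documented in this description, so any other strand letter is passed through unchanged rather than interpreted.

```python
import re
from typing import NamedTuple, Optional


class TssId(NamedTuple):
    """Fields recovered from a SwitchGear-style TSS identifier (hypothetical type)."""
    chromosome: str      # e.g. "7" for CHR7
    strand: str          # "+" when the strand letter is P; otherwise the raw letter
    cluster: str         # cDNA cluster number, kept as text to preserve leading zeros
    rank: Optional[int]  # value after the _R suffix, or None when no suffix is present


# Assumed layout: CHR<chromosome>_<strand letter><cluster digits>[_R<rank>]
_ID_PATTERN = re.compile(
    r"^CHR(?P<chrom>[0-9A-Z]+)_(?P<strand>[A-Z])(?P<cluster>\d+)(?:_R(?P<rank>\d+))?$"
)


def parse_tss_id(identifier: str) -> TssId:
    """Split an identifier such as CHR7_P0362_R1 into its documented parts."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized TSS identifier: {identifier!r}")
    letter = match.group("strand")
    # Only "P" (plus strand) is described in the track documentation.
    strand = "+" if letter == "P" else letter
    rank_text = match.group("rank")
    rank = int(rank_text) if rank_text is not None else None
    return TssId(match.group("chrom"), strand, match.group("cluster"), rank)


if __name__ == "__main__":
    print(parse_tss_id("CHR7_P0362_R1"))
    print(parse_tss_id("CHR7_P0362"))
```

Keeping the cluster number as text avoids losing the leading zero in 0362, which is part of the identifier as written.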
The filter is located at the top of the track description page, which is accessed via the small button to the left of the track's graphical display or through the link on the track's control menu. By default the track displays only those TSSs with a score of 10 or above. By default, the TSSs for predicted pseudogenes are not displayed. If you would like to display them, check the box next to the Include TSSs for predicted pseudogenes label. When you have finished configuring the filter, click the Submit button. Credits This track was created by Nathan Trinklein and Shelley Force Aldred of SwitchGear Genomics. tajD Tajima's D Tajima's D (from Human May 2004 assembly) Variation and Repeats Description This track shows Tajima's D (Tajima, 1989), a measure of nucleotide diversity, estimated from the Perlegen data set (Hinds et al., 2005). Tajima's D is a statistic used to compare an observed nucleotide diversity against the diversity expected under the assumptions that all polymorphisms are selectively neutral and that population size is constant. The track data were originally computed on the Human May 2004 assembly; their coordinates were transformed to this assembly using UCSC's liftOver program. Methods Tajima's D was estimated in 100 kbp sliding windows across the autosomal genome, reporting the Tajima's D measure at the central 10 kbp of the window and stepping by 10 kbp. Thus, the Tajima's D for the window chr1:100,001-200,000 is reported at coordinates chr1:145,001-155,000, the Tajima's D for the window chr1:110,001-210,000 is reported at coordinates chr1:155,001-165,000, and so forth. The theoretical distribution of Tajima's D (95% c.i. between -2 and +2) assumes that polymorphism ascertainment is independent of allele frequency. High values of Tajima's D suggest an excess of common variation in a region, which can be consistent with balancing selection or population contraction. 
Negative values of Tajima's D, on the other hand, indicate an excess of rare variation, consistent with population growth, or positive selection. Population admixture can lead to either high or low Tajima's D values in theory. Demographic parameters would be expected to affect the genome more evenly than selective pressures, so previous analyses have suggested that using the empiric distribution of Tajima's D from a collection of regions across the genome provides advantages in assessing whether selection or demography might explain an observed deviation from expectation. Because of the ascertainment bias toward common polymorphism in the Perlegen data set, positive Tajima's D values are difficult to interpret, and modeling ascertainment is difficult. However, given that the ascertainment bias raises the mean of the distribution, extreme negative values in extended regions can be useful in qualitatively identifying interesting regions for full resequencing and more rigorous theoretical analysis of nucleotide diversity. For further discussion, see Carlson et al. (2005). In full display mode, this track shows the nucleotide diversity across three human populations: 23 individuals of African American Descent (AD), 24 individuals of European Descent (ED) and 24 individuals of Chinese Descent (XD), as well as the polymorphic sites within each population used to estimate nucleotide diversity. Only SNPs observed to be polymorphic within each subpopulation were used in the Tajima's D calculation. Nucleotide diversity is shown in dense display mode using a grayscale density gradient, with light colors indicating low diversity. Credits This track was created at the University of Washington using gfetch from the Nickerson Laboratory and the R statistical software package. References Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989 Nov;123(3):585-95. 
Carlson CS, Thomas DJ, Eberle M, Livingston R, Rieder M, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 2005 Nov;15(11):1553-65. tajdXd Tajima's D XD Tajima's D from Chinese Descent (from Human May 2004 assembly) Variation and Repeats tajdEd Tajima's D ED Tajima's D from European Descent (from Human May 2004 assembly) Variation and Repeats tajdAd Tajima's D AD Tajima's D from African Descent (from Human May 2004 assembly) Variation and Repeats tajdSnp Tajima's D SNPs Tajima's D SNPs (from Human May 2004 assembly) Variation and Repeats Description This track shows the SNPs that were used in the calculation of Tajima's D (Tajima, 1989), a measure of nucleotide diversity, estimated from the Perlegen data set (Hinds et al., 2005). Tajima's D is a statistic used to compare an observed nucleotide diversity against the diversity expected under the assumptions that all polymorphisms are selectively neutral and that population size is constant. The original SNP coordinates were from the Human May 2004 assembly; they were transformed to this assembly using UCSC's liftOver program. Methods See the Tajima's D track or Carlson et al. for more details on the use of this track. Credits This track was created at the University of Washington using gfetch from the Nickerson Laboratory and the R statistical software package. References Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989 Nov;123(3):585-95. Carlson CS, Thomas DJ, Eberle M, Livingston R, Rieder M, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 2005 Nov;15(11):1553-65. 
tajdSnpXd SNPs XD SNPs from Chinese Descent used for Tajima's D (from Human May 2004 assembly) Variation and Repeats tajdSnpEd SNPs ED SNPs from European Descent used for Tajima's D (from Human May 2004 assembly) Variation and Repeats tajdSnpAd SNPs AD SNPs from African Descent used for Tajima's D (from Human May 2004 assembly) Variation and Repeats tfbsConsSites TFBS Conserved HMR Conserved Transcription Factor Binding Sites Regulation Description This track contains the location and score of transcription factor binding sites conserved in the human/mouse/rat alignment. A binding site is considered to be conserved across the alignment if its score meets the threshold score for its binding matrix in all 3 species. The score and threshold are computed with the Transfac Matrix Database (v7.0) created by Biobase. The data are purely computational, and as such not all binding sites listed here are biologically functional binding sites. In the graphical display, each box represents one conserved putative tfbs. Clicking on a box brings up detailed information on the binding site, namely its Transfac I.D., a link to its Transfac Matrix (free registration with Transfac required), its location in the human genome (chromosome, start, end, and strand), its length in bases, its raw score, and its Z score. All binding factors that are known to bind to the particular binding matrix of the binding site are listed along with their species, SwissProt ID, and a link to that factor's page on the UCSC Protein Browser if such an entry exists. Methods The Transfac Matrix Database (v.7.0) contains position-weight matrices for 398 transcription factor binding sites, as characterized through experimental results in the scientific literature. Only binding matrices for known transcription factors in human, mouse, or rat were used for this track (258 of the 398). 
A typical (in this case fictitious) matrix (call it mat) will look something like:

      A   C   G   T
01   15  15  15  15   N
02   20  10  15  15   N
03    0   0  60   0   G
04   60   0   0   0   A
05    0   0   0  60   T

The above matrix specifies the results of 60 (the sum of each row) experiments. In the experiments, the first position of the binding site was A 15 times, C 15 times, G 15 times, and T 15 times (and so on for each position). The consensus sequence of the above binding site as characterized by the matrix is NNGAT. The format of the consensus sequence is the deduced consensus in the IUPAC 15-letter code. In the general case, the goal is to find all matches to a matrix of length n that are conserved across ns sequences. For this example, n=5 and ns=3 (human, mouse, and rat). Denote the multispecies alignment s, such that sji is the nucleotide at position j of species i. Also, define an ns x 4 background matrix (call it back) giving the background frequencies of each nucleotide in each species. A sliding window (of length n) calculates the "species score" for each species at each position: From this, a log-odds score is calculated for each species (normalizing by the length of the matrix and the number of species in the alignment): These scores are then summed for all species, yielding a final log-odds score for the current position: Note that the log-odds score of each species must exceed the threshold for that species. The threshold is calculated for each species such that the only hits that will be reported will have a Z score (to be discussed later) of 1.64 or higher in each species (corresponding to a p-value of 0.05). 
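The Z score cutoffs quoted in this description (1.64 for a p-value of 0.05, and the display default of 2.33 for a p-value of 0.01) match the upper tail of a standard normal distribution. The short Python sketch below checks that correspondence. It assumes raw scores are approximately normally distributed, which is an assumption made here for illustration and is not stated by the track authors; the function name upper_tail_p is hypothetical.

```python
import math


def upper_tail_p(z: float) -> float:
    """One-sided p-value P(Z >= z) for a standard normal variable.

    Uses the identity P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for cutoff in (1.64, 2.33):
        # Prints each cutoff with its one-sided p-value, rounded to 4 decimals.
        print(cutoff, round(upper_tail_p(cutoff), 4))
```

Under this reading, raising the Z score cutoff in the track filter simply tightens the one-sided significance level applied to each hit.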
Next, the maximum and minimum possible log-odds scores are computed and summed across all species for the given binding matrix: These are then used to normalize the final, raw log-odds score so that its range is between 0 and 1: Next, the best raw score for each binding matrix is calculated for the 5,000 base upstream region of each human RefSeq gene (taken from the RefGene table for hg18). The mean and standard deviation for each binding matrix are then calculated across all RefSeq genes. These are then used to create the threshold for each binding matrix, namely, 1.64 standard deviations above the mean. Tfloc is then run with this threshold on each chromosome for the 3-way multiz alignments. Finally, a Z score is calculated for each binding site hit h to matrix m according to the following formula: This final Z score can be interpreted as the number of standard deviations above the mean raw score for that binding matrix across the upstream regions of all RefSeq genes. The default Z score cutoff for display in the browser is 2.33 (corresponding to a p-value of 0.01). This cutoff can be adjusted at the top of this page. After all hits have been recorded genome-wide, one final filtering step is performed. Due to the inherent redundancy of the Transfac database, several binding sites that all bind the same factor often appear together. For example, consider the following binding sites:

585  chr1  4021  4042  V$MEF2_02     875  -  2.83
585  chr1  4021  4042  V$MEF2_03     917  -  3.38
585  chr1  4021  4042  V$MEF2_04     844  -  3.45
585  chr1  4022  4037  V$HMEF2_Q6    810  -  2.34
585  chr1  4022  4037  V$MEF2_01     802  -  2.47
585  chr1  4022  4038  V$RSRFC4_Q2   875  -  2.65
585  chr1  4022  4039  V$AMEF2_Q6    823  -  2.44
585  chr1  4023  4038  V$RSRFC4_01   878  +  2.53
585  chr1  4024  4035  V$MEF2_Q6_01  913  +  2.41
585  chr1  4024  4039  V$MMEF2_Q6    861  -  2.39

These 10 overlapping binding sites bind a total of 19 factors. However, of these 19 factors, only 7 of them are unique. 
Many of the above binding sites are redundant (they add no additional factors). In fact, the first 3 binding sites all bind the same two factors (namely, aMEF-2 and MEF-2A). These ten binding sites can therefore be filtered down to the following four binding sites, without any loss of information (in terms of transcription factors). The final table entry then has the following four lines, since these four binding sites account for all 7 of the unique factors:

585  chr1  4021  4042  V$MEF2_04     844  -  3.45
585  chr1  4022  4038  V$RSRFC4_Q2   875  -  2.65
585  chr1  4024  4035  V$MEF2_Q6_01  913  +  2.41
585  chr1  4024  4039  V$MMEF2_Q6    861  -  2.39

In the event that multiple binding sites bind the same factors, the site with the highest Z score is chosen. Only binding sites that overlap each other and whose start positions are within 5 bases of each other are considered for merging. It should be noted that the positions of many of these conserved binding sites coincide with known exons and other highly conserved regions. Regions such as these are more likely to contain false positive matches, as the high sequence identity across the alignment increases the likelihood that a short motif resembling a binding site is conserved. Conversely, matches found in introns and intergenic regions are more likely to be real binding sites, since these regions are mostly poorly conserved. These data were obtained by running the program tfloc (Transcription Factor binding site LOCater) on multiz alignments of the February 2006 (mm8) mouse genome assembly and the November 2004 rat assembly (rn4) to the March 2006 human genome assembly (hg18). Transcription factor information was culled from the Transfac Factor database, version 7.0. Table Format The format of the tfbsConsSites sql table is shown above. The columns are (from left to right): bin, chromosome, from, to, binding matrix ID, raw score, strand, and Z score. 
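The eight-column layout of the tfbsConsSites rows described above can be read into typed records and filtered by Z score. The Python sketch below is illustrative only and assumes whitespace-separated fields as they appear in this description; the names TfbsSite, parse_site, and filter_by_z are hypothetical, and the "from" and "to" columns are renamed start and end because "from" is a reserved word in Python.

```python
from typing import List, NamedTuple


class TfbsSite(NamedTuple):
    """One row of the tfbsConsSites table, in the column order given above."""
    bin: int
    chrom: str
    start: int      # the "from" column
    end: int        # the "to" column
    matrix_id: str  # Transfac binding matrix ID, e.g. V$MEF2_04
    raw_score: int
    strand: str
    z_score: float


def parse_site(line: str) -> TfbsSite:
    """Parse one whitespace-separated row into a TfbsSite."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return TfbsSite(
        int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
        fields[4], int(fields[5]), fields[6], float(fields[7]),
    )


def filter_by_z(sites: List[TfbsSite], cutoff: float = 2.33) -> List[TfbsSite]:
    """Keep sites whose Z score meets the cutoff (2.33 is the display default)."""
    return [site for site in sites if site.z_score >= cutoff]


# The four rows retained after redundancy filtering in the example above.
ROWS = """
585 chr1 4021 4042 V$MEF2_04 844 - 3.45
585 chr1 4022 4038 V$RSRFC4_Q2 875 - 2.65
585 chr1 4024 4035 V$MEF2_Q6_01 913 + 2.41
585 chr1 4024 4039 V$MMEF2_Q6 861 - 2.39
"""

SITES = [parse_site(row) for row in ROWS.strip().splitlines()]

if __name__ == "__main__":
    # A stricter, arbitrary cutoff of 2.5 is used here purely as a demonstration.
    for site in filter_by_z(SITES, cutoff=2.5):
        print(site.matrix_id, site.z_score)
```

All four example rows pass the default cutoff of 2.33, so the demonstration uses a stricter value to show the filter removing rows.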
To get the corresponding transcription factor information for a given binding matrix, use the table tfbsConsFactors. The format of the tfbsConsFactors sql table is:

V$MYOD_01  M00001  mouse  MyoD    P10085
V$E47_01   M00002  human  E47     N
V$CMYB_01  M00004  mouse  c-Myb   P06876
V$AP4_01   M00005  human  AP-4    Q01664
V$MEF2_01  M00006  mouse  aMEF-2  Q60929
V$MEF2_01  M00006  rat    MEF-2   N
V$MEF2_01  M00006  human  MEF-2A  Q02078
V$ELK1_01  M00007  human  Elk-1   P19419
V$SP1_01   M00008  human  Sp1     P08047
V$EVI1_06  M00011  mouse  Evi-1   P14404

The columns are (from left to right): transfac binding matrix id, transfac binding matrix accession number, transcription factor species, transcription factor name, SwissProt accession number. When no factor species, name, or id information exists in the transfac factor database for a binding matrix, an 'N' appears in the corresponding column(s). Notice also that if more than one transcription factor is known for one binding matrix, each occurs on its own line, so multiple lines can exist for one binding matrix. Credits These data were generated using the Transfac Matrix and Factor databases created by Biobase. The tfloc program was developed at The Pennsylvania State University (with numerous updates done at UCSC) by Matt Weirauch. This track was created by Matt Weirauch and Brian Raney at The University of California at Santa Cruz. Track last updated July 17, 2007 with 624,398 additional entries. tRNAs tRNA Genes Transfer RNA Genes Identified with tRNAscan-SE Genes and Gene Predictions Description This track displays tRNA genes predicted by using tRNAscan-SE v.1.23. tRNAscan-SE is an integrated program that uses tRNAscan (Fichant) and an A/B box motif detection algorithm (Pavesi) as pre-filters to obtain an initial list of tRNA candidates. 
The program then filters these candidates with a covariance model-based search program COVE (Eddy) to obtain a highly specific set of primary sequence and secondary structure predictions that represent 99-100% of true tRNAs with a false positive rate of fewer than 1 per 15 gigabases. Detailed tRNA annotations for eukaryotes, bacteria, and archaea are available at Genomic tRNA Database (GtRNAdb). What does the tRNAscan-SE score mean? Anything with a score above 20 bits is likely to be derived from a tRNA, although this does not indicate whether the tRNA gene still encodes a functional tRNA molecule (e.g. tRNA-derived SINEs probably do not function in the ribosome in translation). Vertebrate tRNAs with scores of >60.0 (bits) are likely to encode functional tRNA genes, and those with scores below ~45 have sequence or structural features that indicate they probably are no longer involved in translation. tRNAs with scores between 45 and 60 bits are in the "grey" zone, and may or may not have all the required features to be functional. In these cases, tRNAs should be inspected carefully for loss of specific primary or secondary structure features (usually in alignments with other genes of the same isotype), to make a better-educated guess. These rough score range guides are not exact, nor are they based on specific biochemical studies of atypical tRNA features, so please treat them accordingly. Please note that tRNA genes marked as "Pseudo" are low scoring predictions that are mostly pseudogenes or tRNA-derived elements. These genes do not usually fold into a typical cloverleaf tRNA secondary structure and the provided images of the predicted secondary structures may appear rotated. Credits Both tRNAscan-SE and GtRNAdb are maintained by the Lowe Lab at UCSC. Cove-predicted tRNA secondary structures were rendered by NAVIEW (c) 1988 Robert E. Bruccoleri. References When making use of these data, please cite the following articles: Chan PP, Lowe TM.
GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009 Jan;37(Database issue):D93-7. PMID: 18984615; PMC: PMC2686519 Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994 Jun 11;22(11):2079-88. PMID: 8029015; PMC: PMC308124 Fichant GA, Burks C. Identifying potential tRNA genes in genomic DNA sequences. J Mol Biol. 1991 Aug 5;220(3):659-71. PMID: 1870126 Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997 Mar 1;25(5):955-64. PMID: 9023104; PMC: PMC146525 Pavesi A, Conterio F, Bolchi A, Dieci G, Ottonello S. Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions. Nucleic Acids Res. 1994 Apr 11;22(7):1247-56. PMID: 8165140; PMC: PMC523650 targetScanS TS miRNA sites TargetScan miRNA Regulatory Sites Regulation Description This track shows conserved mammalian microRNA regulatory target sites for conserved microRNA families in the 3' UTR regions of Refseq Genes, as predicted by TargetScanS. Method Putative miRNA binding sites in UTRs were identified using seven-nucleotide seed regions from all known miRNA families conserved among human, mouse, rat, dog and sometimes chicken. Using all human RefSeq transcripts and CDS annotation from NCBI, aligned vertebrate 3' UTRs were extracted from multiz alignments and masked for overlap with protein-coding sequences. These 3' UTRs were scanned to identify conserved matches to the miRNA seed region, as in Lewis et al. (2005). These sites were then assigned a percentile rank (0 to 100) based on their context score (Grimson et al., 2007). For further details of the methods used to generate this annotation, see the references and the TargetScan website. 
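The seed-scanning step of the method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seed is taken to be miRNA positions 2-8 (the text says only "seven-nucleotide seed regions"), the function names seed_site and find_seed_matches are hypothetical, the let-7a sequence is used purely as example input, and the cross-species conservation check and context-score ranking are not modeled.

```python
# Complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the 7-mer in the UTR (5'->3') that pairs with the miRNA seed.

    Assumption: the seed is miRNA positions 2-8, counted from the 5' end.
    """
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(utr: str, mirna: str) -> list[int]:
    """Return 0-based start offsets of every seed match in a 3' UTR sequence."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return hits


# Example input only: mature let-7a sequence and a made-up UTR fragment.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
UTR = "AAACUACCUCAAACTACCTCGG"

print(seed_site(LET7A))
print(find_seed_matches(UTR, LET7A))
```

A real pipeline would run this scan on each aligned species and keep only positions where the match is conserved, as the Method paragraph describes.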
Credit Thanks to George Bell of Bioinformatics and Research Computing at the Whitehead Institute for providing this annotation, which was generated in collaboration with the labs of David Bartel and Chris Burge. Additional information on microRNA target prediction is available on the TargetScan website. References Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Molecular Cell. 2007 Jul 6;27(1):91-105. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005 Jan 14;120(1):15-20. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction of mammalian microRNA targets. Cell. 2003 Dec 26;115(7):787-98. knownAlt UCSC Alt Events Alternative Splicing, Alternative Promoter and Similar Events in UCSC Genes Genes and Gene Predictions Description This track shows various types of alternative splicing and other events that result in more than a single transcript from the same gene. The label by an item describes the type of event. The events are: Alternate Promoter (altPromoter) - Transcription starts at multiple places. The altPromoter extends from 100 bases before to 50 bases after transcription start. Alternate Finish Site (altFinish) - Transcription ends at multiple places. Cassette Exon (cassetteExon) - Exon is present in some transcripts but not others. These are found by looking for exons that overlap an intron in the same transcript. Retained Intron (retainedIntron) - Introns are spliced out in some transcripts but not others. In some cases, particularly when the intron is near the 3' end, this can reflect an incompletely processed transcript rather than a true alt-splicing event. Overlapping Exon (bleedingExon) - Initial or terminal exons overlap in an intron in another transcript. These often are associated with incompletely processed transcripts. 
Alternate 3' End (altThreePrime) - Variations on the 3' end of an intron. Alternate 5' End (altFivePrime) - Variations on the 5' end of an intron. Intron Ends have AT/AC (atacIntron) - An intron with AT/AC ends rather than the usual GT/AG. These are associated with the minor spliceosome. Strange Intron Ends (strangeSplice) - An intron with ends that are not GT/AG, GC/AG, or AT/AC. These are usually artifacts of some sort due to sequencing error or polymorphism. Credits This track is based on an analysis by the txgAnalyse program of splicing graphs produced by the txGraph program. Both of these programs were written by Jim Kent at UCSC. ucsfBrainMethyl UCSF Brain Methyl UCSF Brain DNA Methylation Regulation Description Genome-wide methylation (MeDIP-seq and MRE-seq), histone H3 lysine 4 trimethylation (H3K4me3) and gene expression (RNA-seq and RNA-seq (SMART)) data were generated from postmortem human frontal cortex gray matter of a 57-year-old male. This was done to investigate the role that intragenic, tissue-specific CpG island methylation plays in controlling gene expression (Maunakea et al., 2010). Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: Raw Signal Density graph (wiggle) of signal enrichment. CpG score DNA methylation score on CpG sites. Methods DNA, RNA and native chromatin were extracted using standard methods; assay-specific methods are described below. MRE-seq, MeDIP-seq, H3K4me3 ChIP-seq, RNA-seq and RNA-seq (SMART) libraries were sequenced using an Illumina Genome Analyzer II. Sequencing reads are available through the NCBI SRA (study accession number SRP002318).
MeDIP-seq (Methylated DNA immunoprecipitation and sequencing) MeDIP-seq uses immunoprecipitation to extract the methylated fraction of the genome. Purified DNA was first sheared and processed following the Illumina Genomic DNA Library Kit protocol. These DNA fragments were then immunoprecipitated using an antibody raised against 5-methylcytosine, the methylated form of cytosine, before constructing a library, which was sequenced and mapped to the genome. MRE-seq (Methyl-sensitive restriction enzyme digest and sequencing) MRE-seq identifies unmethylated CpG sites by sequencing size-selected fragments from parallel DNA digestions with the MREs HpaII, Hin6I, and AciI. Since these enzymes require unmethylated CpG sites within their recognition sequences to cut DNA, identifying the end of each fragment generated allows inference of a single unmethylated cytosine. The 3 digests were combined and size-selected by gel electrophoresis to enrich for unmethylated CpG sites in close proximity. A library was constructed and sequenced; the sequence reads were then mapped to the genome with the additional requirement that they map to a known MRE recognition site. H3K4me3 ChIP-seq (Histone H3 lysine 4 trimethylation chromatin immunoprecipitation and sequencing) Chromatin immunoprecipitation was performed to enrich for histone H3 modified at lysine position 4 with trimethylation (H3K4me3), as this histone modification is associated with promoters. A ChIP-seq library was constructed as described in Robertson et al. (2007), sequenced and mapped to the genome. RNA-seq and RNA-seq SMART (RNA sequencing and SMART-tagged RNA sequencing) Sheared RNA was used to synthesize full-length single-stranded cDNAs as described by Morin et al. (2008). A library was constructed and sequenced, and sequence reads were mapped to the genome. The 5' ends of transcripts were tagged with a sequence tag, called a "SMART tag", while making the cDNA library for sequencing.
SMART-tagged reads were used to infer transcription initiation, while all reads together were used to infer gene expression level. Verification MeDIP-seq Each post-amplification library was QC'd for quantity, quality and size distribution by spectrophotometry and Agilent DNA Bioanalyzer analysis. Four independent PCR reactions were performed to confirm enrichment for methylated and de-enrichment for unmethylated sequences, compared to input sonicated DNA. Visual inspection of extended coverage browser tracks confirmed expectations: lack of MeDIP signal in most 5' CpG island promoters and in regions devoid of CpG sites, as well as high MeDIP signal at known methylated sites (e.g. some imprinted regions). MRE-seq Each post-amplification library was QC'd for quantity, quality and size distribution by Nanodrop spectrophotometry and Agilent DNA Bioanalyzer analysis. Prior to high-throughput sequencing, a portion of each library was cloned into a sequencing vector and ~24 individual clones were Sanger sequenced to confirm the presence of MRE sites at the ends of each insert. Illumina sequencing reads were filtered to only include those that map to MRE sites in the reference. MRE reads occurred frequently in 5' CpG islands, which are often unmethylated and are enriched for the MRE recognition sequences relative to the rest of the genome. H3K4me3 ChIP-seq Each post-amplification library was examined for quantity, quality and size distribution by Nanodrop spectrophotometry, Qubit fluorometry and Agilent DNA Bioanalyzer. Fold H3K4me3 enrichment was confirmed by comparison to non-specific rabbit IgG enrichment. Visual inspection of the browser track of called peaks confirmed enrichment at a subset of annotated promoters. RNA-seq and RNA-seq (SMART) Each post-amplification library was examined for quantity, quality and size distribution by Nanodrop spectrophotometry, Qubit fluorometry and Agilent DNA Bioanalyzer.
Visual inspection of the browser track of extended reads confirmed enrichment at annotated exons and UTRs. SMART-tagged reads were enriched at known promoters, as expected. Credits UCSF: Joseph Costello, Raman Nagarajan, Shaun Fouse, Brett Johnson, Chibo Hong, Ksenya Shchors, Vivi M. Heine, David H. Rowitch Genome Sciences Centre, BC Cancer Agency: Mikhail Bilenky, Cletus D'Souza, Cydney Nielsen, Yongjun Zhao, Allen Delaney, Richard Varhol, Nina Thiessen, Steven S.J. Jones, Marco A. Marra, Martin Hirst Washington University, St. Louis, MO: Ting Wang, Xiaoyun Xing, Chris Fiore, Maximiliaan Schillebeeckx UCSC: Tracy J. Ballinger, David Haussler McGill: Gustavo Turecki References Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D'Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature. 2010 Jul 8;466(7303):253-7. PMID: 20613842 Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques. 2008 Jul;45(1):81-94. PMID: 18611170 Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007 Aug;4(8):651-7. 
PMID: 17558387 ucsfBrainMethylViewCOV Raw Signal UCSF Brain DNA Methylation Regulation ucsfRnaSeqBrainSmartCoverage Smart RawSignal RNA-seq Smart-Tagged Raw Signal Regulation ucsfRnaSeqBrainAllCoverage RNA-seq RawSignal RNA-seq Raw Signal Regulation ucsfMedipSeqBrainCoverage MeDIP RawSignal MeDIP-seq Raw Signal Regulation ucsfChipSeqH3K4me3BrainCoverage H3K4me3 RawSignal H3K4me3 ChIP-seq Raw Signal Regulation ucsfBrainMethylViewCG CpG score UCSF Brain DNA Methylation Regulation ucsfMedipSeqBrainCpG MeDIP CpG MeDIP-seq CpG Score Regulation ucsfMreSeqBrainCpG MRE CpG MRE-seq CpG Score Regulation uniGene_3 UniGene UniGene Alignments mRNA and EST Description This track shows the UniGene genes from NCBI. Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location. Coding exons are represented by blocks connected by horizontal lines representing introns. In full display mode, arrowheads on the connecting intron lines indicate the direction of transcription. Methods The UniGene sequence file, Hs.seq.uniq.gz, is downloaded from NCBI. Sequences are aligned to base genome using BLAT to create this track. When a single UniGene gene aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.2% of the best and at least 96.5% base identity with the genomic sequence were kept. Credits Thanks to UniGene for providing this annotation. uppsalaChipSignal UU ChIP Signal Uppsala University ChIP-chip Signal Regulation Description This track displays genome wide localization of two transcription factors (USF1 and USF2) and acetylated histone H3 (H3ac) in a liver cell line (HepG2). 
ChIP was performed on three biological replicates and the samples were hybridized to Affymetrix arrays covering the human genome at an average resolution of 35 base pairs. This track shows average probe level intensities for each factor. The companion Uppsala ChIP Sites track displays identified positive regions for each factor. The raw data for this track is available at EBI ArrayExpress, as experiment E-TABM-314. Methods Chromatin immunoprecipitation was performed as previously described (Rada-Iglesias et al. 2005), but sonication conditions were optimized to obtain smaller fragments (approximately 300 bp), to further improve the resolution of our experiments. The antibodies against USF1 (H-86, sc-8983) and USF2 (C-20, sc-862) were from Santa Cruz Biotechnology; anti-Histone H3 acetyl K9/14 (06-599) and normal rabbit IgG (12-370) were purchased from Upstate. Three completely independent biological replicates were performed for each antibody, obtaining the corresponding input as total genomic DNA reference. Hybridizations were performed using the Affymetrix GeneChip Human Tiling 2.0R Array set (7-array set). Array data files for USF1, USF2, H3ac and IgG were normalized against corresponding Input arrays using Affymetrix Tiling Array Software (TAS) two-group normalization. An empirical Bayes algorithm was used to ensure that identified positive regions for USF1, USF2 and H3ac show low enrichment of IgG. Verification 54 regions were selected to be analyzed by qPCR using new ChIP DNA obtained using USF1 (C20 and H86) and USF2 antibodies. Of those 54 regions, 6 were negative for both USF proteins and were used to set background/cut-off thresholds. Of the remaining 48 regions, all but one were positive for both USF1 and USF2 (with that unique negative region being different for USF1 and USF2). Furthermore, there was a high correlation (R = 0.79-0.85) between the enrichment levels determined by microarray hybridization and by qPCR.
Credits These experiments were performed in the Claes Wadelius lab Dept. of Genetics and Pathology, Uppsala University, Sweden. Microarray hybridizations were performed at Affymetrix Inc., Santa Clara, USA. Data processing and statistical analysis was done at The Linnaeus Centre for Bioinformatics, Uppsala University, Sweden. References Rada-Iglesias A, Ameur A, Kapranov P, Enroth S, Komorowski J, Gingeras TR, Wadelius C. Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res. 2008 Mar;18(3):380-92. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et. al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. uppsalaChipSuper Uppsala ChIP Uppsala University ChIP-chip Regulation Overview This super-track combines related tracks of genome-wide ChIP-chip data generated by the Wadelius lab at Uppsala University, Sweden. These tracks display localization of two transcription factors (USF1 and USF2) and acetylated histone H3 (H3ac) in a liver cell line (HepG2), as assayed by Affymetrix arrays tiled at 35 bp resolution. For each factor, the average probe level intensities as well as identified positive regions are presented. The raw data for these tracks is available at EBI ArrayExpress, as experiment E-TABM-314. Credits These experiments were performed in the Claes Wadelius lab Dept. of Genetics and Pathology, Uppsala University, Sweden. Microarray hybridizations were performed at Affymetrix Inc., Santa Clara, USA. Data processing and statistical analysis was done at The Linnaeus Centre for Bioinformatics, Uppsala University, Sweden. References Rada-Iglesias A, Ameur A, Kapranov P, Enroth S, Komorowski J, Gingeras TR, Wadelius C. 
Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res. 2008 Mar;18(3):380-92. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et. al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. uppsalaChipUsf2Signal UU Usf2 Signal Uppsala University ChIP-chip Signal (Usf2) Regulation uppsalaChipUsf1Signal UU Usf1 Signal Uppsala University ChIP-chip Signal (Usf1) Regulation uppsalaChipH3acSignal UU H3ac Signal Uppsala University ChIP-chip Signal (H3ac) Regulation uppsalaChipSites UU ChIP Sites Uppsala University ChIP-chip Sites Regulation Description This track displays genome wide localization of two transcription factors (USF1 and USF2) and acetylated histone H3 (H3ac) in a liver cell line (HepG2). ChIP was performed on three biological replicates and the samples were hybridized to Affymetrix arrays covering the human genome at an average resolution of 35 base pairs. In this track, identified positive regions are presented for each factor. The companion track, Uppsala ChIP Signal, shows average probe level intensities for each factor. The raw data for this track is available at EBI ArrayExpress , as experiment E-TABM-314. Methods Chromatin immunoprecipitation was performed as previously described (Rada-Iglesias et al. 2005), but sonication conditions were optimized to obtain smaller fragments (approximately 300 bp), to further improve the resolution of our experiments. The antibodies against USF1 (H-86, sc-8983) and USF2 (C-20, sc-862) were from Santa Cruz biotechnology; anti-Histone H3 acetyl K9/14 (06-599) and normal rabbit IgG (12-370) were purchased from Upstate. 
Three completely independent biological replicates were performed for each antibody, obtaining the corresponding input as total genomic DNA reference. Hybridizations were performed using the Affymetrix GeneChip Human Tiling 2.0R Array set (7-array set). Array data files for USF1, USF2, H3ac and IgG were normalized against corresponding Input arrays using Affymetrix Tiling Array Software (TAS) two-group normalization. An empirical Bayes algorithm was used to ensure that identified positive regions for USF1, USF2 and H3ac show low enrichment of IgG. Verification 54 regions were selected to be analyzed by qPCR using new ChIP DNA obtained using USF1 (C20 and H86) and USF2 antibodies. Of those 54 regions, 6 were negative for both USF proteins and were used to set background/cut-off thresholds. Of the remaining 48 regions, all but one were positive for both USF1 and USF2 (with that unique negative region being different for USF1 and USF2). Furthermore, there was a high correlation (R = 0.79-0.85) between the enrichment levels determined by microarray hybridization and by qPCR. Credits These experiments were performed in the Claes Wadelius lab, Dept. of Genetics and Pathology, Uppsala University, Sweden. Microarray hybridizations were performed at Affymetrix Inc., Santa Clara, USA. Data processing and statistical analysis was done at The Linnaeus Centre for Bioinformatics, Uppsala University, Sweden. References Rada-Iglesias A, Ameur A, Kapranov P, Enroth S, Komorowski J, Gingeras TR, Wadelius C. Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders. Genome Res. 2008 Mar;18(3):380-92. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet.
2005 Nov 15;14(22):3435-47. uppsalaChipUsf2Sites UU Usf2 Sites Uppsala University ChIP-chip Sites (Usf2) Regulation uppsalaChipUsf1Sites UU Usf1 Sites Uppsala University ChIP-chip Sites (Usf1) Regulation uppsalaChipH3acSites UU H3ac Sites Uppsala University ChIP-chip Sites (H3ac) Regulation wgEncodeUwAffyExonArray UW Affy Exon ENCODE UW Affy All-Exon Arrays Expression Description This track displays human tissue microarray data using Affymetrix Human Exon 1.0 GeneChip. The track was produced as part of the ENCODE Project. Release Notes This is release 2 of this track. This release includes 28 new cell types, and replaces the data for four existing tables (replicate 1 for K562, NB4, and SKMC; replicate 2 for HeLa-S3). Display Conventions and Configuration The display for this track shows probe location and signal value as grayscale-colored items where higher signal values correspond to darker-colored blocks. Items with score of 1000 are in the highest 10% quantile for signal value of that particular cell type. Similarly, items scoring 900 are the next 10% quantile and at the bottom of scale, items scoring 100 are in the lowest 10% quantile for signal value. The subtracks within this composite annotation track correspond to data from different cell types and tissues. The configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For information regarding specific microarray probes, turn on the Affy Exon Probes track, which can be found inside the Affy Exon supertrack in the "Expression" track group. Methods Cells were grown according to the approved ENCODE cell culture protocols. A subset of the cells was stored frozen in RNALater. Total RNA was labeled and hybridized to Affymetrix Human Exon 1.0 ST arrays. 
Exon and gene level expression analysis were carried out using Affymetrix ExACT 1.2.1 and Affymetrix Expression Console 1.1 software tools. Samples were quantile normalized for background correction and Probe Logarithmic Intensity Error summarized. More detailed methods are here. Verification Data were verified by sequencing biological replicates displaying correlation coefficient >0.9. Credits These data were generated by the University of Washington ENCODE group. Contact: Richard Sandstrom Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeUwAffyExonArraySimpleSignalRep1Th1 Th1 1 Th1 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 364 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Th1 primary Th1 T cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in Th1 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Sknshra SK-N-SH_RA 2 SK-N-SH_RA AffyExonArray ENCODE July 2009 Freeze 2009-06-29 2010-03-29 358 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Sknshra neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 2 (in SK-N-SH_RA cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Sknshra SK-N-SH_RA 1 SK-N-SH_RA AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 358 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Sknshra neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in SK-N-SH_RA cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Skmc SKMC 2 SKMC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 363 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Skmc SimpleSignal skeletal muscle cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in SKMC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1SkmcV2 SKMC 1 SKMC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2009-07-02 2010-04-02 363 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1SkmcV2 SimpleSignal skeletal muscle cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in SKMC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Saec SAEC 2 SAEC AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 374 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Saec SimpleSignal small airway epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in SAEC cells) Expression 
wgEncodeUwAffyExonArraySimpleSignalRep1Saec SAEC 1 SAEC AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 374 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Saec SimpleSignal small airway epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in SAEC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Panc1 PANC-1 2 PANC-1 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 373 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Panc1 SimpleSignal pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in PANC-1 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Panc1 PANC-1 1 PANC-1 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 373 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Panc1 SimpleSignal pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. 
Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in PANC-1 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Nhlf NHLF 2 NHLF AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 391 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Nhlf SimpleSignal lung fibroblasts Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in NHLF cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Nhlf NHLF 1 NHLF AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 391 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Nhlf SimpleSignal lung fibroblasts Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in NHLF cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Nhek NHEK 1 NHEK AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 372 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Nhek SimpleSignal epidermal keratinocytes Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in NHEK cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Nhdfneo NHDF-neo 2 NHDF-neo AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 390 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Nhdfneo SimpleSignal neonatal dermal fibroblasts Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in NHDF-neo cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Nhdfneo NHDF-neo 1 NHDF-neo AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 390 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Nhdfneo SimpleSignal neonatal dermal fibroblasts 
Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in NHDF-neo cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Nb4 NB4 2 NB4 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 371 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Nb4 SimpleSignal acute promyelocytic leukemia cell line. (PMID: 1995093) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in NB4 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Nb4V2 NB4 1 NB4 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2009-09-21 2010-06-21 371 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Nb4V2 SimpleSignal acute promyelocytic leukemia cell line. (PMID: 1995093) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in NB4 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Jurkat Jurkat 2 Jurkat AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 369 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Jurkat SimpleSignal T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. 
(PMID: 68013) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in Jurkat cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Jurkat Jurkat 1 Jurkat AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 369 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Jurkat SimpleSignal T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in Jurkat cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hrpe HRPEpiC 2 HRPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 389 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hrpe SimpleSignal retinal pigment epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HRPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hrpe HRPEpiC 1 HRPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 389 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hrpe SimpleSignal retinal pigment epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HRPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hre HRE 2 HRE AffyExonArray ENCODE July 2009 Freeze 2009-06-29 2010-03-29 356 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hre renal epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 2 (in HRE cells) Expression 
wgEncodeUwAffyExonArraySimpleSignalRep1Hre HRE 1 HRE AffyExonArray ENCODE July 2009 Freeze 2009-06-29 2010-03-29 356 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hre renal epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HRE cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hrcepic HRCEpiC 2 HRCEpiC AffyExonArray ENCODE July 2009 Freeze 2009-06-29 2010-03-29 355 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hrcepic renal cortical epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 2 (in HRCEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hrcepic HRCEpiC 1 HRCEpiC AffyExonArray ENCODE July 2009 Freeze 2009-06-29 2010-03-29 355 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hrcepic renal cortical epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HRCEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hnpce HNPCEpiC 2 HNPCEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 388 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hnpce SimpleSignal non-pigment ciliary epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HNPCEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hnpce HNPCEpiC 1 HNPCEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 388 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hnpce SimpleSignal non-pigment ciliary epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HNPCEpiC cells) Expression 
wgEncodeUwAffyExonArraySimpleSignalRep1Hmec HMEC 1 HMEC AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 367 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hmec SimpleSignal mammary epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HMEC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hipe HIPEpiC 2 HIPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 387 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hipe SimpleSignal iris pigment epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HIPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hipe HIPEpiC 1 HIPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 387 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hipe SimpleSignal iris pigment epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HIPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hee HEEpiC 2 HEEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 386 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hee SimpleSignal esophageal epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HEEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hee HEEpiC 1 HEEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 386 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hee SimpleSignal esophageal epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HEEpiC 
cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hcpe HCPEpiC 2 HCPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 385 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hcpe SimpleSignal choroid plexus epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HCPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hcpe HCPEpiC 1 HCPEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 385 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hcpe SimpleSignal choroid plexus epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HCPEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hcm HCM 2 HCM AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 384 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hcm SimpleSignal cardiac myocytes Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HCM cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hcm HCM 1 HCM AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 384 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hcm SimpleSignal cardiac myocytes Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HCM cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hcf HCF 2 HCF AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 383 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hcf SimpleSignal cardiac fibroblasts Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HCF cells) Expression 
wgEncodeUwAffyExonArraySimpleSignalRep1Hcf HCF 1 HCF AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 383 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hcf SimpleSignal cardiac fibroblasts Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HCF cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hae HAEpiC 2 HAEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 382 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hae SimpleSignal amniotic epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in HAEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hae HAEpiC 1 HAEpiC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 382 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hae SimpleSignal amniotic epithelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in HAEpiC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1H7es H7-hESC 1 H7-hESC AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 381 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1H7es SimpleSignal undifferentiated embryonic stem cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in H7-hESC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Gm06990 GM06990 2 GM06990 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 360 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Gm06990 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington 
ENCODE UW Affy All-Exon Array Signal Replicate 2 (in GM06990 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Gm06990 GM06990 1 GM06990 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 360 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Gm06990 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in GM06990 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Cmk CMK 1 CMK AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 380 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Cmk SimpleSignal acute megakaryocytic leukemia cells, "established from the peripheral blood of a 10-month-old boy with Down's syndrome and acute megakaryocytic leukemia (AML M7) at relapse in 1985" - DSMZ. (PMID: 3016165) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in CMK cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Caco2 Caco-2 2 Caco-2 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 359 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Caco2 SimpleSignal colorectal adenocarcinoma. (PMID: 1939345) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in Caco-2 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Caco2 Caco-2 1 Caco-2 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 359 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Caco2 colorectal adenocarcinoma. 
(PMID: 1939345) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in Caco-2 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Bjtert BJ 2 BJ AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 365 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Bjtert SimpleSignal skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in BJ cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Bjtert BJ 1 BJ AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 365 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Bjtert SimpleSignal skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in BJ cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Ag10803 AG10803 2 AG10803 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 379 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Ag10803 SimpleSignal abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in AG10803 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Ag10803 AG10803 1 AG10803 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 379 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Ag10803 SimpleSignal abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined 
showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in AG10803 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Ag09319 AG09319 2 AG09319 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 378 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Ag09319 SimpleSignal gum tissue fibroblasts from apparently healthy 24 year old Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in AG09319 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Ag09319 AG09319 1 AG09319 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 378 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Ag09319 SimpleSignal gum tissue fibroblasts from apparently healthy 24 year old Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in AG09319 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Ag09309 AG09309 2 AG09309 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 377 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Ag09309 SimpleSignal adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in AG09309 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Ag09309 AG09309 1 AG09309 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 377 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Ag09309 SimpleSignal adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome 
loss/gain" -Coriell Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in AG09309 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Ag04450 AG04450 2 AG04450 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 376 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Ag04450 SimpleSignal fetal lung fibroblast Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in AG04450 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Ag04450 AG04450 1 AG04450 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 376 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Ag04450 SimpleSignal fetal lung fibroblast Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in AG04450 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Ag04449 AG04449 2 AG04449 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 375 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Ag04449 SimpleSignal fetal buttock/thigh fibroblast Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in AG04449 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Ag04449 AG04449 1 AG04449 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 375 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Ag04449 SimpleSignal fetal buttock/thigh fibroblast Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 1 (in AG04449 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Mcf7 MCF-7 2 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2010-01-10 2010-10-10 370 Stam UW 2 
wgEncodeUwAffyExonArraySimpleSignalRep2Mcf7 SimpleSignal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All Exon Array Signal Replicate 2 (in MCF-7 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Mcf7 MCF-7 1 MCF-7 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 370 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Mcf7 SimpleSignal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Huvec HUVEC 1 HUVEC AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 368 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Huvec SimpleSignal umbilical vein endothelial cells Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HUVEC cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2Hepg2 HepG2 2 HepG2 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 361 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Hepg2 hepatocellular carcinoma Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 2 (in HepG2 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Hepg2 HepG2 1 HepG2 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 361 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Hepg2 hepatocellular carcinoma Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HepG2 cells) Expression 
wgEncodeUwAffyExonArraySimpleSignalRep2Helas3V2 HeLa-S3 2 HeLa-S3 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2009-06-29 2010-03-29 357 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2Helas3V2 SimpleSignal cervical carcinoma Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in HeLa-S3 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Helas3 HeLa-S3 1 HeLa-S3 AffyExonArray ENCODE July 2009 Freeze 2009-07-02 2010-04-02 357 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Helas3 cervical carcinoma Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington ENCODE UW Affy All-Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep2K562 K562 2 K562 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 362 Stam UW 2 wgEncodeUwAffyExonArraySimpleSignalRep2K562 SimpleSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 2 (in K562 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1K562V2 K562 1 K562 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2009-07-02 2010-04-02 362 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1K562V2 SimpleSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in K562 cells) Expression wgEncodeUwAffyExonArraySimpleSignalRep1Gm12878 GM12878 1 GM12878 AffyExonArray ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 366 Stam UW 1 wgEncodeUwAffyExonArraySimpleSignalRep1Gm12878 SimpleSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Affymetrix Exon Microarray Stamatoyannopoulous Stamatoyannopoulous - University of Washington Simple Signal ENCODE UW Affy All-Exon Array Signal Replicate 1 (in GM12878 cells) Expression wgEncodeUwDGF UW DNase DGF ENCODE Univ. Washington Digital DNase Genomic Footprinting Regulation Description This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind specific DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends upon the strength and tight nature of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA-degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique was adopted to define the DNase hypersensitive sites in the genome. Sequencing these DNase samples to significantly greater depths (300-fold or more) produces maps of DNase susceptibility in the native chromatin state at base-pair resolution. 
These base-pair resolution maps reflect, and depend upon, the nature and specificity of the interactions between the DNA and the regulatory/modulatory proteins bound at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA-binding proteins. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. For each cell type, this track contains the following views: HotSpots: DNaseI hypersensitive zones identified using the HotSpot algorithm. Peaks: DNaseI hypersensitive sites (DHSs) identified as signal peaks within FDR 0.5% hypersensitive zones. Footprint: While the HotSpot algorithm identifies genomic regions of elevated tag counts relative to random expectation, the FP-DETECTOR algorithm (developed in our center) searches within the identified HotSpot regions for smaller locations where proteins are most likely bound. Signal: The density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). Raw Signal: Density graph (wiggle) of signal enrichment based on aligned read density. DNaseI sensitivity is shown as the absolute density of in vivo cleavage sites across the genome mapped using the Digital DNaseI methodology (see below). Data have been normalized to 25 million reads per cell type. Methods Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolation of DNaseI 'double-hit' fragments as described in Sabo et al. 
(2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the genome; only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 0.5% (FDR 0.005) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 0.5% (FDR 0.005) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags. The FP-DETECTOR algorithm (developed by the UW ENCODE group) searches within the identified HotSpot regions for smaller locations where proteins are most likely bound. The regions reported as footprints are disjoint, 6 to 40 base pairs in length, and show a marked depletion of tag counts relative to their respective local backgrounds. Verification Results were validated by conventional DNaseI hypersensitivity assays using end-labeling/Southern blotting methods. Release Notes This is the initial release of this track. Credits These data were generated by the UW ENCODE group. Contact: Richard Sandstrom References Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods. 2006 Jul;3(7):511-8. 
Sabo PJ, Hawrylycz M, Wallace JC, Humbert R, Yu M, Shafer A, Kawamoto J, Hall R, Mack J, Dorschner M et al. Discovery of functional noncoding elements by digital analysis of chromatin structure. Proc Natl Acad Sci U S A. 2004 Nov 30;101(48):16837-42. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeUwDGFViewSignal Signal ENCODE Univ. Washington Digital DNase Genomic Footprinting Regulation wgEncodeUwDGFSignalTh1 Th1 Sig Th1 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-05 2010-07-05 478 Stam UW wgEncodeUwDGFSignalTh1 Signal primary Th1 T cells DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Signal ENCODE UW Digital Genomic Footprinting - Per-base Signal (in Th1 cells) Regulation wgEncodeUwDGFSignalSknshra SK-N-SH_RA Sig SK-N-SH_RA DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 477 Stam UW wgEncodeUwDGFSignalSknshra Signal neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Signal ENCODE UW Digital Genomic Footprinting - Per-base Signal (SK-N-SH_RA cells) Regulation wgEncodeUwDGFSignalK562 K562 Sig K562 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 480 Stam UW wgEncodeUwDGFSignalK562 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Signal ENCODE UW Digital Genomic Footprinting - Per-base Signal (in K562 cells) Regulation wgEncodeUwDGFSignalHepg2 HepG2 Sig HepG2 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 476 Stam UW wgEncodeUwDGFSignalHepg2 Signal hepatocellular carcinoma DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Signal ENCODE UW Digital Genomic Footprinting - Per-base Signal (in HepG2 cells) Regulation wgEncodeUwDGFSignalGm06990 GM06990 Sig GM06990 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 479 Stam UW wgEncodeUwDGFSignalGm06990 Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Signal ENCODE UW Digital Genomic Footprinting - Per-base Signal (in GM06990 cells) Regulation wgEncodeUwDGFViewRawSignal RawSignal ENCODE Univ. Washington Digital DNase Genomic Footprinting Regulation wgEncodeUwDGFRawSignalTh1 Th1 Raw Th1 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-05 2010-07-05 478 Stam UW WindowDensity-bin20-win+/-75 wgEncodeUwDGFRawSignalTh1 RawSignal primary Th1 T cells DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital Genomic Footprinting - Raw Signal (in Th1 cells) Regulation wgEncodeUwDGFRawSignalSknshra SK-N-SH_RA Raw SK-N-SH_RA DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 477 Stam UW WindowDensity-bin20-win+/-75 wgEncodeUwDGFRawSignalSknshra RawSignal neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital Genomic Footprinting - Raw Signal (in SK-N-SH_RA cells) Regulation wgEncodeUwDGFRawSignalK562 K562 Raw K562 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 480 Stam UW WindowDensity-bin20-win+/-75 wgEncodeUwDGFRawSignalK562 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital Genomic Footprinting - Raw Signal (in K562 cells) Regulation wgEncodeUwDGFRawSignalHepg2 HepG2 Raw HepG2 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 476 Stam UW WindowDensity-bin20-win+/-75 wgEncodeUwDGFRawSignalHepg2 RawSignal hepatocellular carcinoma DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital Genomic Footprinting - Raw Signal (in HepG2 cells) Regulation wgEncodeUwDGFRawSignalGm06990 GM06990 Raw GM06990 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 479 Stam UW WindowDensity-bin20-win+/-75 wgEncodeUwDGFRawSignalGm06990 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital Genomic Footprinting - Raw Signal (in GM06990 cells) Regulation wgEncodeUwDGFViewPeaks Peaks ENCODE Univ. 
Washington Digital DNase Genomic Footprinting Regulation wgEncodeUwDGFPeaksTh1 Th1 Pk Th1 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-05 2010-07-05 478 Stam UW lmax-v1.0, FDR 0.5% wgEncodeUwDGFPeaksTh1 Peaks primary Th1 T cells DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital Genomic Footprinting - Peaks (FDR 0.5%) (in Th1 cells) Regulation wgEncodeUwDGFPeaksSknshra SK-N-SH_RA Pk SK-N-SH_RA DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 477 Stam UW lmax-v1.0, FDR 0.5% wgEncodeUwDGFPeaksSknshra Peaks neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital Genomic Footprinting - Peaks (FDR 0.5%) (in SK-N-SH_RA cells) Regulation wgEncodeUwDGFPeaksK562 K562 Pk K562 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 480 Stam UW lmax-v1.0, FDR 0.5% wgEncodeUwDGFPeaksK562 Peaks leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital Genomic Footprinting - Peaks (FDR 0.5%) (in K562 cells) Regulation wgEncodeUwDGFPeaksHepg2 HepG2 Pk HepG2 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 476 Stam UW lmax-v1.0, FDR 0.5% wgEncodeUwDGFPeaksHepg2 Peaks hepatocellular carcinoma DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital Genomic Footprinting - Peaks (FDR 0.5%) (in HepG2 cells) Regulation wgEncodeUwDGFPeaksGm06990 GM06990 Pk GM06990 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 479 Stam UW lmax-v1.0, FDR 0.5% wgEncodeUwDGFPeaksGm06990 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital Genomic Footprinting - Peaks (FDR 0.5%) (in GM06990 cells) Regulation wgEncodeUwDGFViewHotspots Hotspots ENCODE Univ. Washington Digital DNase Genomic Footprinting Regulation wgEncodeUwDGFHotspotsTh1 Th1 Hspot Th1 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-05 2010-07-05 478 Stam UW Hotspot-v5.1 wgEncodeUwDGFHotspotsTh1 Hotspots primary Th1 T cells DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital Genomic Footprinting - Hotspots (in Th1 cells) Regulation wgEncodeUwDGFHotspotsSknshra SK-N-SH_RA Hspot SK-N-SH_RA DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 477 Stam UW Hotspot-v5.1 wgEncodeUwDGFHotspotsSknshra Hotspots neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. 
Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital Genomic Footprinting - Hotspots (in SK-N-SH_RA cells) Regulation wgEncodeUwDGFHotspotsK562 K562 Hspot K562 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 480 Stam UW Hotspot-v5.1 wgEncodeUwDGFHotspotsK562 Hotspots leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital Genomic Footprinting - Hotspots (in K562 cells) Regulation wgEncodeUwDGFHotspotsHepg2 HepG2 Hspot HepG2 DnaseDgf ENCODE Sep 2009 Freeze 2009-10-02 2010-07-02 476 Stam UW Hotspot-v5.1 wgEncodeUwDGFHotspotsHepg2 Hotspots hepatocellular carcinoma DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital Genomic Footprinting - Hotspots (in HepG2 cells) Regulation wgEncodeUwDGFHotspotsGm06990 GM06990 Hspot GM06990 DnaseDgf ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 479 Stam UW Hotspot-v5.1 wgEncodeUwDGFHotspotsGm06990 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNase Digital Footprinting Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital Genomic Footprinting - Hotspots (in GM06990 cells) Regulation wgEncodeUwDnaseSeq UW DNaseI HS ENCODE Univ. 
Washington DNaseI Hypersensitivity by Digital DNaseI Regulation Description This track is produced as part of the ENCODE Project. This track shows DNaseI sensitivity measured genome-wide in different cell lines using the Digital DNaseI methodology (see below), and DNaseI hypersensitive sites. DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. For each experiment (cell type) this track shows DNaseI hypersensitive zones (HotSpots) and hypersensitive sites (Peaks) based on the sequencing tag density (Signal). Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. For each cell type, this track contains the following views: HotSpots: DNaseI hypersensitive zones identified using the HotSpot algorithm. Peaks: DNaseI hypersensitive sites (DHSs) identified as signal peaks within FDR 0.5% hypersensitive zones. Raw Signal: Density graph (wiggle) of signal enrichment based on aligned read density. DNaseI sensitivity is shown as the absolute density of in vivo cleavage sites across the genome mapped using the Digital DNaseI methodology (see below). Data have been normalized to 25 million reads per cell type. Methods Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by digesting intact nuclei with DNaseI, isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and directly sequencing the fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads).
Only high-quality reads that mapped uniquely to the genome were used. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 0.5% (FDR 0.005) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 27mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within FDR 0.5% hypersensitive zones using a peak-finding algorithm. Verification Data were verified by sequencing biological replicates displaying a correlation coefficient > 0.9. Results are extensively validated by conventional DNaseI hypersensitivity assays using end-labeling/Southern blotting methods. Multiple cell type Southern blotting methods are available in supplemental materials. Release Notes This is release 5 (Oct 2011) of the UW DNaseI HS track. Southern blot validation images have been added. The files corresponding to tables that have been revoked or replaced in previous releases are still available for download from the FTP site. Credits These data were generated by the UW ENCODE group. Contact: Richard Sandstrom References Sabo PJ, Hawrylycz M, Wallace JC, Humbert R, Yu M, Shafer A, Kawamoto J, Hall R, Mack J, Dorschner M et al. Discovery of functional noncoding elements by digital analysis of chromatin structure. PNAS. 2004 Nov 30;101(48):16837-42. Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nature Methods. 2006 Jul;3(7):511-8.
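Illustrative sketch of the Raw Signal computation described in Methods (tag density within a 150 bp sliding window at a 20 bp step, matching the "WindowDensity-bin20-win+/-75" label in the subtrack metadata). This is not the UW pipeline's code. The function name window_density, its parameters, the choice of windows centered on each step position, and the half-open window boundaries are all assumptions made for illustration.

```python
from bisect import bisect_left


def window_density(cut_sites, region_length, half_window=75, step=20):
    """Count tags (cleavage positions) in a window around each step position.

    Hypothetical helper, not part of any UCSC or UW tool.
    Assumes each window is centered on a step position and is half-open:
    [center - half_window, center + half_window).
    """
    sites = sorted(cut_sites)
    densities = []
    for center in range(0, region_length, step):
        # Binary search for the first tag at or after each window edge.
        lo = bisect_left(sites, center - half_window)
        hi = bisect_left(sites, center + half_window)
        densities.append((center, hi - lo))
    return densities


# Toy input: four cleavage positions on a 300 bp region.
profile = dict(window_density([10, 12, 80, 200], 300))
```

With the default arguments the window spans 150 bp in total, so each reported value is a raw tag count per 150 bp rather than a normalized score.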
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeUwDnaseSeqViewzRawSig Raw Signal ENCODE Univ. Washington DNaseI Hypersensitivity by Digital DNaseI Regulation wgEncodeUwDnaseSeqRawSignalRep1Th2 Th2 Raw 1 Th2 DnaseSeq ENCODE July 2009 Freeze 2009-04-29 2010-01-29 491 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Th2 RawSignal primary Th2 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in Th2 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Th1 Th1 Raw 1 Th1 DnaseSeq ENCODE July 2009 Freeze 2008-11-20 2009-08-20 483 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Th1 RawSignal primary Th1 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in Th1 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2SknshraV2 SK-N-SH_RA Raw 2 SK-N-SH_RA DnaseSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-07-17 2010-04-17 485 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2SknshraV2 RawSignal neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Sknshra SK-N-SH_RA Raw 1 SK-N-SH_RA DnaseSeq ENCODE July 2009 Freeze 2009-04-17 2010-01-17 485 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Sknshra RawSignal neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Skmc SKMC Raw 2 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2010-07-26 490 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Skmc RawSignal skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in SKMC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1SkmcV2 SKMC Raw 1 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2009-04-29 2010-01-29 490 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1SkmcV2 RawSignal skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in SKMC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Saec SAEC Raw 2 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW 
WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Saec RawSignal small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in SAEC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Saec SAEC Raw 1 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Saec RawSignal small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in SAEC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Panc1 PANC-1 Raw 2 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 500 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Panc1 RawSignal pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Panc1 PANC-1 Raw 1 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 500 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Panc1 RawSignal pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Nhlf NHLF Raw 2 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Nhlf RawSignal lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in NHLF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Nhlf NHLF Raw 1 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Nhlf RawSignal lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in NHLF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Nhek NHEK Raw 1 NHEK DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 499 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Nhek RawSignal epidermal keratinocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in NHEK cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Nhdfneo NHDF-neo Raw 2 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 518 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Nhdfneo RawSignal neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) 
ENCODE UW Digital DNaseI Raw Signal - 2nd (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Nhdfneo NHDF-neo Raw 1 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 518 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Nhdfneo RawSignal neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Nb4 NB4 Raw 2 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 498 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Nb4 RawSignal acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in NB4 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Nb4V2 NB4 Raw 1 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2009-09-19 2010-06-19 498 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Nb4V2 RawSignal acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in NB4 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Jurkat Jurkat Raw 2 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Jurkat RawSignal T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. 
(PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in Jurkat cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Jurkat Jurkat Raw 1 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Jurkat RawSignal T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in Jurkat cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hrpe HRPEpiC Raw 2 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hrpe RawSignal retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hrpe HRPEpiC Raw 1 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hrpe RawSignal retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hre HRE Raw 2 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW 
WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hre RawSignal renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HRE cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hre HRE Raw 1 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hre RawSignal renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HRE cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hrcepic HRCEpiC Raw 2 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hrcepic RawSignal renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hrcepic HRCEpiC Raw 1 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hrcepic RawSignal renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hnpce HNPCEpiC Raw 2 HNPCEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hnpce RawSignal non-pigment ciliary epithelial cells DNaseI HS 
Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hnpce HNPCEpiC Raw 1 HNPCEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hnpce RawSignal non-pigment ciliary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hmec HMEC Raw 1 HMEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 503 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hmec RawSignal mammary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HMEC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hl60 HL-60 Raw 2 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 489 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hl60 RawSignal promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HL-60 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hl60V2 HL-60 Raw 1 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-04-29 2010-01-29 489 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hl60V2 RawSignal promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington 
Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HL-60 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hgf HGF Raw 2 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hgf RawSignal gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HGF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hgf HGF Raw 1 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hgf RawSignal gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HGF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hee HEEpiC Raw 2 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 515 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hee RawSignal esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HEEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hee HEEpiC Raw 1 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 515 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hee RawSignal esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HEEpiC cells) Regulation 
wgEncodeUwDnaseSeqRawSignalRep2Hcpe HCPEpiC Raw 2 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hcpe RawSignal choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hcpe HCPEpiC Raw 1 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hcpe RawSignal choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hcm HCM Raw 2 HCM DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 519 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hcm RawSignal cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HCM cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hcm HCM Raw 1 HCM DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 519 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hcm RawSignal cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HCM cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hcf HCF Raw 2 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 513 Stam UW WindowDensity-bin20-win+/-75 2 
wgEncodeUwDnaseSeqRawSignalRep2Hcf RawSignal cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HCF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hcf HCF Raw 1 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 513 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hcf RawSignal cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HCF cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hae HAEpiC Raw 2 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 512 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hae RawSignal amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hae HAEpiC Raw 1 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 512 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hae RawSignal amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1H7es H7-hESC Raw 1 H7-hESC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 511 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1H7es RawSignal undifferentiated embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington 
Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in H7-hESC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Gm12865 GM12865 Raw 2 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Gm12865 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in GM12865 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Gm12865 GM12865 Raw 1 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Gm12865 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in GM12865 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Gm06990 GM06990 Raw 2 GM06990 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 481 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Gm06990 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in GM06990 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Gm06990 GM06990 Raw 1 GM06990 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 481 Stam UW 
WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Gm06990 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in GM06990 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Cmk CMK Raw 1 CMK DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 510 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Cmk RawSignal acute megakaryocytic leukemia cells, "established from the peripheral blood of a 10-month-old boy with Down's syndrome and acute megakaryocytic leukemia (AML M7) at relapse in 1985" - DSMZ. (PMID: 3016165) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in CMK cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Caco2 Caco-2 Raw 2 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-03-30 486 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Caco2 RawSignal colorectal adenocarcinoma. (PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Caco2 Caco-2 Raw 1 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-04-24 2010-01-24 486 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Caco2 RawSignal colorectal adenocarcinoma. 
(PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Bjtert BJ Raw 2 BJ DnaseSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 487 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Bjtert RawSignal skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in BJ cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Bjtert BJ Raw 1 BJ DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 487 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Bjtert RawSignal skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in BJ cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Ag10803 AG10803 Raw 2 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Ag10803 RawSignal abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in AG10803 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Ag10803 AG10803 Raw 1 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Ag10803 RawSignal abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in AG10803 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Ag09319 AG09319 Raw 2 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 508 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Ag09319 RawSignal gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd 
(in AG09319 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Ag09319 AG09319 Raw 1 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 508 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Ag09319 RawSignal gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in AG09319 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Ag09309 AG09309 Raw 2 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Ag09309 RawSignal adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in AG09309 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Ag09309 AG09309 Raw 1 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Ag09309 RawSignal adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in AG09309 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Ag04450 AG04450 Raw 2 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 506 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Ag04450 RawSignal fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of 
Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in AG04450 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Ag04450 AG04450 Raw 1 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 506 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Ag04450 RawSignal fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in AG04450 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Ag04449 AG04449 Raw 2 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 505 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Ag04449 RawSignal fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in AG04449 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Ag04449 AG04449 Raw 1 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-05 505 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Ag04449 RawSignal fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in AG04449 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Mcf7 MCF-7 Raw 2 MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 502 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Mcf7 RawSignal mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Mcf7 MCF-7 Raw 1 MCF-7 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 502 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Mcf7 RawSignal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Huvec HUVEC Raw 1 HUVEC DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 488 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Huvec RawSignal umbilical vein endothelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HUVEC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Hepg2 HepG2 Raw 2 HepG2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 482 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Hepg2 RawSignal hepatocellular carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HepG2 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Hepg2 HepG2 Raw 1 HepG2 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 482 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Hepg2 RawSignal hepatocellular carcinoma DNaseI 
HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HepG2 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Helas3 HeLa-S3 Raw 2 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Helas3 RawSignal cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in HeLa-S3 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Helas3 HeLa-S3 Raw 1 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Helas3 RawSignal cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in HeLa-S3 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2K562 K562 Raw 2 K562 DnaseSeq ENCODE July 2009 Freeze 2008-12-09 2009-09-09 484 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2K562 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in K562 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1K562 K562 Raw 1 K562 DnaseSeq ENCODE July 2009 Freeze 2008-12-09 2009-09-09 484 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1K562 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in K562 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1H1es H1-hESC Raw 1 H1-hESC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-18 496 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1H1es RawSignal embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in H1-hESC cells) Regulation wgEncodeUwDnaseSeqRawSignalRep1Gm12878 GM12878 Raw 2 GM12878 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 492 Stam UW WindowDensity-bin20-win+/-75 1 wgEncodeUwDnaseSeqRawSignalRep1Gm12878 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 2nd (in GM12878 cells) Regulation wgEncodeUwDnaseSeqRawSignalRep2Gm12878 GM12878 Raw 1 GM12878 DnaseSeq ENCODE July 2009 Freeze 
2009-07-01 2010-04-01 492 Stam UW WindowDensity-bin20-win+/-75 2 wgEncodeUwDnaseSeqRawSignalRep2Gm12878 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Digital DNaseI Raw Signal - 1st (in GM12878 cells) Regulation wgEncodeUwDnaseSeqViewaPeaks Peaks ENCODE Univ. Washington DNaseI Hypersensitivity by Digital DNaseI Regulation wgEncodeUwDnaseSeqPeaksRep1Th2 Th2 Pk 1 Th2 DnaseSeq ENCODE July 2009 Freeze 2009-04-29 2010-01-29 491 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Th2 Peaks primary Th2 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in Th2 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Th1 Th1 Pk 1 Th1 DnaseSeq ENCODE July 2009 Freeze 2008-11-20 2009-08-20 483 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Th1 Peaks primary Th1 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in Th1 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2SknshraV2 SK-N-SH_RA Pk 2 SK-N-SH_RA DnaseSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-07-17 2010-04-17 485 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2SknshraV2 Peaks neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Sknshra SK-N-SH_RA Pk 1 SK-N-SH_RA DnaseSeq ENCODE July 2009 Freeze 2009-04-17 2010-01-17 485 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Sknshra Peaks neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Skmc SKMC Pk 2 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2010-07-26 490 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Skmc Peaks skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in SKMC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1SkmcV2 SKMC Pk 1 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2009-04-29 2010-01-29 490 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1SkmcV2 Peaks skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in SKMC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Saec SAEC Pk 2 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Saec Peaks small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment 
ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in SAEC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Saec SAEC Pk 1 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Saec Peaks small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in SAEC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Panc1 PANC-1 Pk 2 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 500 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Panc1 Peaks pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Panc1 PANC-1 Pk 1 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 500 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Panc1 Peaks pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Nhlf NHLF Pk 2 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Nhlf Peaks lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in NHLF cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Nhlf NHLF Pk 1 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Nhlf Peaks lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in NHLF cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Nhek NHEK Pk 1 NHEK DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 499 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Nhek Peaks epidermal keratinocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in NHEK cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Nhdfneo NHDF-neo Pk 2 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 518 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Nhdfneo Peaks neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Nhdfneo NHDF-neo Pk 1 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 518 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Nhdfneo 
Peaks neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Nb4 NB4 Pk 2 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 498 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Nb4 Peaks acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in NB4 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Nb4V2 NB4 Pk 1 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2009-09-19 2010-06-19 498 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Nb4V2 Peaks acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in NB4 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Jurkat Jurkat Pk 2 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Jurkat Peaks T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. 
(PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in Jurkat cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Jurkat Jurkat Pk 1 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Jurkat Peaks T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in Jurkat cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hrpe HRPEpiC Pk 2 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hrpe Peaks retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hrpe HRPEpiC Pk 1 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hrpe Peaks retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hre HRE Pk 2 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hre Peaks renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment 
ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HRE cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hre HRE Pk 1 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hre Peaks renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HRE cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hrcepic HRC Pk 2 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hrcepic Peaks renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hrcepic HRCEpiC Pk 1 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hrcepic Peaks renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hnpce HNPCEpiC Pk 2 HNPCEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hnpce Peaks non-pigment ciliary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hnpce HNPCEpiC Pk 1 HNPCEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hnpce Peaks non-pigment ciliary epithelial cells DNaseI HS Sequencing 
Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hmec HMEC Pk 1 HMEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 503 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hmec Peaks mammary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HMEC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hl60 HL-60 Pk 2 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 489 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hl60 Peaks promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HL-60 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hl60V2 HL-60 Pk 1 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-04-29 2010-01-29 489 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hl60V2 Peaks promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HL-60 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hgf HGF Pk 2 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hgf Peaks gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HGF cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hgf HGF Pk 1 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW lmax-v1.0, FDR 0.5% 1 
wgEncodeUwDnaseSeqPeaksRep1Hgf Peaks gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HGF cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hee HEEpiC Pk 2 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 515 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hee Peaks esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HEEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hee HEEpiC Pk 1 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 515 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hee Peaks esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HEEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hcpe HCPEpiC Pk 2 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hcpe Peaks choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hcpe HCPEpiC Pk 1 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hcpe Peaks choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hcm HCM Pk 2 HCM DnaseSeq ENCODE Jan 2010 Freeze 
2010-01-09 2010-10-08 519 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hcm Peaks cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HCM cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hcm HCM Pk 1 HCM DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 519 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hcm Peaks cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HCM cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hcf HCF Pk 2 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 513 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hcf Peaks cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HCF cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hcf HCF Pk 1 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 513 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hcf Peaks cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HCF cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hae HAEpiC Pk 2 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 512 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hae Peaks amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hae HAEpiC Pk 1 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 
512 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hae Peaks amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1H7es H7-hESC Pk 1 H7-hESC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 511 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1H7es Peaks undifferentiated embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in H7-hESC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Gm12865 GM12865 Pk 2 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Gm12865 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in GM12865 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Gm12865 GM12865 Pk 1 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Gm12865 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in GM12865 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Gm06990 GM06990 Pk 2 GM06990 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 481 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Gm06990 Peaks B-lymphocyte, lymphoblastoid, International 
HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in GM06990 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Gm06990 GM06990 Pk 1 GM06990 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 481 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Gm06990 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in GM06990 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Cmk CMK Pk 1 CMK DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 510 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Cmk Peaks acute megakaryocytic leukemia cells, "established from the peripheral blood of a 10-month-old boy with Down's syndrome and acute megakaryocytic leukemia (AML M7) at relapse in 1985" - DSMZ. (PMID: 3016165) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in CMK cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Caco2 Caco-2 Pk 2 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-03-30 486 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Caco2 Peaks colorectal adenocarcinoma. (PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Caco2 Caco-2 Pk 1 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-04-24 2010-01-24 486 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Caco2 Peaks colorectal adenocarcinoma. 
(PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Bjtert BJ Pk 2 BJ DnaseSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 487 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Bjtert Peaks skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in BJ cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Bjtert BJ Pk 1 BJ DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 487 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Bjtert Peaks skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in BJ cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Ag10803 AG10803 Pk 2 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Ag10803 Peaks abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in AG10803 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Ag10803 AG10803 Pk 1 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Ag10803 Peaks abdominal skin fibroblasts from apparently
healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in AG10803 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Ag09319 AG09319 Pk 2 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 508 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Ag09319 Peaks gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in AG09319 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Ag09319 AG09319 Pk 1 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 508 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Ag09319 Peaks gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in AG09319 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Ag09309 AG09309 Pk 2 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Ag09309 Peaks adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in AG09309 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Ag09309 AG09309 Pk 1 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Ag09309 Peaks adult toe fibroblast from apparently
healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in AG09309 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Ag04450 AG04450 Pk 2 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 506 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Ag04450 Peaks fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in AG04450 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Ag04450 AG04450 Pk 1 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 506 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Ag04450 Peaks fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in AG04450 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Ag04449 AG04449 Pk 2 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 505 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Ag04449 Peaks fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in AG04449 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Ag04449 AG04449 Pk 1 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-05 505 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Ag04449 Peaks fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in AG04449 cells) Regulation 
wgEncodeUwDnaseSeqPeaksRep2Mcf7 MCF-7 Pk 2 MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 502 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Mcf7 Peaks mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Mcf7 MCF-7 Pk 1 MCF-7 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 502 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Mcf7 Peaks mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Huvec HUVEC Pk 1 HUVEC DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 488 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Huvec Peaks umbilical vein endothelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HUVEC cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Hepg2 HepG2 Pk 2 HepG2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 482 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Hepg2 Peaks hepatocellular carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HepG2 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Hepg2 HepG2 Pk 1 HepG2 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 482 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Hepg2 Peaks hepatocellular carcinoma DNaseI HS Sequencing 
Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HepG2 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Helas3 HeLa-S3 Pk 2 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Helas3 Peaks cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in HeLa-S3 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Helas3 HeLa-S3 Pk 1 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Helas3 Peaks cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in HeLa-S3 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2K562 K562 Pk 2 K562 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 484 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2K562 Peaks leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in K562 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1K562 K562 Pk 1 K562 DnaseSeq ENCODE July 2009 Freeze 2008-12-09 2009-09-09 484 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1K562 Peaks leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in K562 cells) Regulation wgEncodeUwDnaseSeqPeaksRep1H1es H1-hESC Pk 1 H1-hESC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-18 496 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1H1es Peaks embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in H1-hESC cells) Regulation wgEncodeUwDnaseSeqPeaksRep1Gm12878 GM12878 Pk 2 GM12878 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 492 Stam UW lmax-v1.0, FDR 0.5% 1 wgEncodeUwDnaseSeqPeaksRep1Gm12878 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 2nd (in GM12878 cells) Regulation wgEncodeUwDnaseSeqPeaksRep2Gm12878 GM12878 Pk 1 GM12878 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 492 Stam UW lmax-v1.0, FDR 0.5% 2 wgEncodeUwDnaseSeqPeaksRep2Gm12878 Peaks B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Digital DNaseI Peaks (FDR 0.5%) - 1st (in GM12878 cells) Regulation wgEncodeUwDnaseSeqViewcHot Hot Spots ENCODE Univ.
Washington DNaseI Hypersensitivity by Digital DNaseI Regulation wgEncodeUwDnaseSeqHotspotsRep1Th2 Th2 Hspot 1 Th2 DnaseSeq ENCODE July 2009 Freeze 2009-04-29 2010-01-29 491 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Th2 Hotspots primary Th2 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in Th2 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Th1 Th1 1 Th1 DnaseSeq ENCODE July 2009 Freeze 2008-11-20 2009-08-20 483 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Th1 Hotspots primary Th1 T cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in Th1 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2SknshraV2 SK-N-SH_RAHspot2 SK-N-SH_RA DnaseSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-07-17 2010-04-17 485 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2SknshraV2 Hotspots neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Sknshra SK-N-SH_RAHspot1 SK-N-SH_RA DnaseSeq ENCODE July 2009 Freeze 2009-04-17 2010-01-17 485 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Sknshra Hotspots neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in SK-N-SH_RA cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Skmc SKMC Hspot 2 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2010-07-26 490 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Skmc Hotspots skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in SKMC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1SkmcV2 SKMC Hspot 1 SKMC DnaseSeq ENCODE Sep 2009 Freeze 2009-10-27 2009-04-29 2010-01-29 490 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1SkmcV2 Hotspots skeletal muscle cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in SKMC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Saec SAEC 2 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Saec Hotspots small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in SAEC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Saec SAEC 1 SAEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 501 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Saec Hotspots small airway epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in SAEC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Panc1 PANC-1 2 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 
2009-09-19 2010-06-19 500 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Panc1 Hotspots pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Panc1 PANC-1 1 PANC-1 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 500 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Panc1 Hotspots pancreatic carcinoma, (PMID: 1140870) PANC-1 was established from a pancreatic carcinoma, which was extracted via pancreatico-duodenectomy specimen from a 56-year-old Caucasian individual. Malignancy of this cell line was verified via in vitro and in vivo assays. 
DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in PANC-1 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Nhlf NHLF 2 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Nhlf Hotspots lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in NHLF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Nhlf NHLF Hspot 1 NHLF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 521 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Nhlf Hotspots lung fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in NHLF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Nhek NHEK 1 NHEK DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 499 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Nhek Hotspots epidermal keratinocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in NHEK cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Nhdfneo NHDF-neo Hspot 2 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 518 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Nhdfneo Hotspots neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Nhdfneo NHDF-neo Hspot 1 NHDF-neo DnaseSeq ENCODE Jan 2010 Freeze 
2010-01-08 2010-10-08 518 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Nhdfneo Hotspots neonatal dermal fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in NHDF-neo cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Nb4 NB4 Hspot 2 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 498 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Nb4 Hotspots acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in NB4 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Nb4V2 NB4 Hspot 1 NB4 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2009-09-19 2010-06-19 498 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Nb4V2 Hotspots acute promyelocytic leukemia cell line. (PMID: 1995093) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in NB4 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Jurkat Jurkat Hspot 2 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Jurkat Hotspots T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. 
(PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in Jurkat cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Jurkat Jurkat Hspot 1 Jurkat DnaseSeq ENCODE Jan 2010 Freeze 2010-01-13 2009-09-19 2010-06-19 497 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Jurkat Hotspots T lymphoblastoid derived from an acute T cell leukemia, "The Jurkat cell line was established from the peripheral blood of a 14 year old boy by Schneider et al., and was originally designated JM." - ATCC. (PMID: 68013) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in Jurkat cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hrpe HRPEpiC Hspot 2 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hrpe Hotspots retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hrpe HRPEpiC Hspot 1 HRPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 517 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hrpe Hotspots retinal pigment epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HRPEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hre HRE Hspot 2 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hre Hotspots renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous 
Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HRE cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hre HRE Hspot 1 HRE DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 494 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hre Hotspots renal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HRE cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hrcepic HRCEpiC Hspot 2 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hrcepic Hotspots renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hrcepic HRCEpiC Hspot 1 HRCEpiC DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 493 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hrcepic Hotspots renal cortical epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HRCEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hnpce HNPCEpiC Hspot 2 HNPCEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hnpce Hotspots non-pigment ciliary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hnpce HNPCEpiC Hspot 1 HNPCEpiC DnaseSeq 
ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 516 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hnpce Hotspots non-pigment ciliary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HNPCEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hmec HMEC Hspot 1 HMEC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 503 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hmec Hotspots mammary epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HMEC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hl60 HL-60 2 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-19 489 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hl60 Hotspots promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HL-60 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hl60V2 HL-60 Hspot 1 HL-60 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-04-29 2010-01-29 489 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hl60V2 Hotspots promyelocytic leukemia cells, (PMID: 276884) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HL-60 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hgf HGF Hspot 2 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hgf Hotspots gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity 
zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HGF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hgf HGF Hspot 1 HGF DnaseSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 504 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hgf Hotspots gingival fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HGF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hee HEEpiC Hspot 2 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 515 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hee Hotspots esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HEEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hee HEEpiC Hspot 1 HEEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 515 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hee Hotspots esophageal epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HEEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hcpe HCPEpiC Hspot 2 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hcpe Hotspots choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hcpe HCPEpiC Hspot 1 HCPEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 514 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hcpe Hotspots 
choroid plexus epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HCPEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hcm HCM Hspot 2 HCM DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 519 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hcm Hotspots cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HCM cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hcm HCM Hspot 1 HCM DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 519 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hcm Hotspots cardiac myocytes DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HCM cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hcf HCF Hspot 2 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 513 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hcf Hotspots cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HCF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hcf HCF Hspot 1 HCF DnaseSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-08 513 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hcf Hotspots cardiac fibroblasts DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HCF cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hae HAEpiC Hspot 2 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 
2010-10-07 512 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Hae Hotspots amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hae HAEpiC Hspot 1 HAEpiC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 512 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Hae Hotspots amniotic epithelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HAEpiC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1H7es H7-hESC Hspot 1 H7-hESC DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 511 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1H7es Hotspots undifferentiated embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in H7-hESC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Gm12865 GM12865 Hspot 2 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Gm12865 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in GM12865 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Gm12865 GM12865 Hspot 1 GM12865 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 520 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Gm12865 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah 
pedigree 1459, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in GM12865 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Gm06990 GM06990 Hspot 2 GM06990 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 481 Stam UW Hotspot-v5.0 2 wgEncodeUwDnaseSeqHotspotsRep2Gm06990 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in GM06990 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Gm06990 GM06990 Hspot 1 GM06990 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 481 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Gm06990 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in GM06990 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Cmk CMK Hspot 1 CMK DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-07 510 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Cmk Hotspots acute megakaryocytic leukemia cells, "established from the peripheral blood of a 10-month-old boy with Down's syndrome and acute megakaryocytic leukemia (AML M7) at relapse in 1985" - DSMZ. 
(PMID: 3016165) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in CMK cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Caco2 Caco-2 Hspot 2 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-03-30 486 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Caco2 Hotspots colorectal adenocarcinoma. (PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Caco2 Caco-2 Hspot 1 Caco-2 DnaseSeq ENCODE July 2009 Freeze 2009-04-24 2010-01-24 486 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Caco2 Hotspots colorectal adenocarcinoma. (PMID: 1939345) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in Caco-2 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Bjtert BJ Hspot 2 BJ DnaseSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 487 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Bjtert Hotspots skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in BJ cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Bjtert BJ Hspot 1 BJ DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 487 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Bjtert Hotspots skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in BJ cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Ag10803 AG10803 Hspot 2 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Ag10803 Hotspots abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in AG10803 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Ag10803 AG10803 Hspot 1 AG10803 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 509 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Ag10803 Hotspots abdominal skin fibroblasts from apparently healthy 22 year old, "8% of the cells examined showing random chromosome loss, 2% showing random chromosome gain, and 2% showing 69,XYY" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in AG10803 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Ag09319 AG09319 Hspot 2 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 508 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Ag09319 Hotspots gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in AG09319 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Ag09319 AG09319 Hspot 1 AG09319 DnaseSeq ENCODE Jan 2010 Freeze 
2010-01-07 2010-10-07 508 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Ag09319 Hotspots gum tissue fibroblasts from apparently healthy 24 year old DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in AG09319 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Ag09309 AG09309 Hspot 2 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Ag09309 Hotspots adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in AG09309 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Ag09309 AG09309 Hspot 1 AG09309 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 507 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Ag09309 Hotspots adult toe fibroblast from apparently healthy 21 year old, "7% of the cells examined showing random chromosome loss/gain" -Coriell DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in AG09309 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Ag04450 AG04450 Hspot 2 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 506 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Ag04450 Hotspots fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in AG04450 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Ag04450 AG04450 Hspot 1 AG04450 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 
2010-10-07 506 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Ag04450 Hotspots fetal lung fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in AG04450 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Ag04449 AG04449 Hspot 2 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 505 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Ag04449 Hotspots fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in AG04449 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Ag04449 AG04449 Hspot 1 AG04449 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-05 505 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Ag04449 Hotspots fetal buttock/thigh fibroblast DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in AG04449 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Mcf7 MCF-7 Hspot 2 MCF-7 DnaseSeq ENCODE Jan 2010 Freeze 2010-01-10 2010-10-09 502 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Mcf7 Hotspots mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Mcf7 MCF-7 Hspot 1 MCF-7 DnaseSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 502 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Mcf7 Hotspots mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in MCF-7 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Huvec HUVEC Hspot 1 HUVEC DnaseSeq ENCODE July 2009 Freeze 2009-04-27 2010-01-27 488 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Huvec Hotspots umbilical vein endothelial cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HUVEC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Hepg2 HepG2 Hspot 2 HepG2 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 482 Stam UW Hotspot-v5.0 2 wgEncodeUwDnaseSeqHotspotsRep2Hepg2 Hotspots hepatocellular carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HepG2 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Hepg2 HepG2 Hspot 1 HepG2 DnaseSeq ENCODE July 2009 Freeze 2008-11-19 2009-08-18 482 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1Hepg2 Hotspots hepatocellular carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HepG2 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Helas3 HeLa-S3 Hspot 2 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW Hotspot-v5.1 2 wgEncodeUwDnaseSeqHotspotsRep2Helas3 Hotspots cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in HeLa-S3 cells) 
Regulation wgEncodeUwDnaseSeqHotspotsRep1Helas3 HeLa-S3 Hspot 1 HeLa-S3 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 495 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Helas3 Hotspots cervical carcinoma DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in HeLa-S3 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2K562 K562 Hspot 2 K562 DnaseSeq ENCODE July 2009 Freeze 2008-12-09 2009-09-09 484 Stam UW Hotspot-v5.0 2 wgEncodeUwDnaseSeqHotspotsRep2K562 Hotspots leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in K562 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1K562 K562 Hspot 1 K562 DnaseSeq ENCODE July 2009 Freeze 2008-12-09 2009-09-09 484 Stam UW Hotspot-v5.0 1 wgEncodeUwDnaseSeqHotspotsRep1K562 Hotspots leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in K562 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1H1es H1-hESC 1 H1-hESC DnaseSeq ENCODE Sep 2009 Freeze 2009-09-19 2010-06-18 496 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1H1es Hotspots embryonic stem cells DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in H1-hESC cells) Regulation wgEncodeUwDnaseSeqHotspotsRep1Gm12878 GM12878 Hspot 2 GM12878 DnaseSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-01 492 Stam UW Hotspot-v5.1 1 wgEncodeUwDnaseSeqHotspotsRep1Gm12878 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 2nd (in GM12878 cells) Regulation wgEncodeUwDnaseSeqHotspotsRep2Gm12878 GM12878 Hspot 1 GM12878 DnaseSeq ENCODE July 2009 Freeze 2009-07-01 2010-04-01 492 Stam UW Hotspot-v5.0 2 wgEncodeUwDnaseSeqHotspotsRep2Gm12878 Hotspots B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus DNaseI HS Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Digital DNaseI Hotspots - 1st (in GM12878 cells) Regulation wgEncodeUwChIPSeq UW Histone ENCODE Histone Modifications by Univ. Washington ChIP-seq Regulation Description This track is produced as part of the ENCODE Project. This track displays maps of histone modifications genome-wide in different cell lines, using ChIP-seq high-throughput sequencing. 
Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. For each cell type, this track contains the following views: HotSpots: ChIP-seq affinity zones identified using the HotSpot algorithm. Peaks: ChIP-seq affinity sites identified as signal peaks within FDR 0.5% hypersensitive zones. Raw Signal: Density graph (wiggle) of signal enrichment based on aligned read density. Methods Cells were grown according to the approved ENCODE cell culture protocols. Cells were crosslinked with 1% formaldehyde, and the reaction was quenched by the addition of glycine. Fixed cells were rinsed with PBS and lysed in nuclei lysis buffer, and the chromatin was sheared to 200-500 bp fragments using a Fisher Dismembrator (model 500). Sheared chromatin fragments were immunoprecipitated with a specific polyclonal antibody at 4 degrees C with gentle rotation. Antibody-chromatin complexes were washed and eluted. The crosslinks in the immunoprecipitated DNA were reversed, and the DNA was treated with RNase A. Following proteinase K treatment, the DNA fragments were purified by phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation. 20-50 ng of ChIP DNA was end-repaired, followed by the addition of adenine, then ligated to Illumina adapters and made into a Solexa library for sequencing. Sequencing reads that mapped uniquely to the hg18 human reference genome were then scored using a scan statistic based on the binomial distribution. Regions of significant tag enrichment (significance being gauged using an FDR procedure based on randomly generated data) were then resolved into 150 bp peaks called on a continuous, sliding window tag density representation of the data (Sabo et al., 2004). Verification Data were verified by sequencing biological replicates displaying correlation coefficient >0.9. 
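The sliding window tag density and binomial scoring mentioned under Methods can be illustrated with a minimal sketch. This is not the HotSpot implementation, whose details are not given here. The function names are hypothetical, and the 20 bp step and +/-75 bp window are an assumed reading of the "WindowDensity-bin20-win+/-75" label attached to the Raw Signal subtracks.

```python
from bisect import bisect_left, bisect_right
from math import comb


def window_density(tag_positions, start, end, step=20, half_window=75):
    """Count mapped tags within +/- half_window bp of each window centre.

    Centres are placed every `step` bp in [start, end). The defaults are an
    assumption based on the "bin20-win+/-75" subtrack label, not a documented
    parameter set.
    """
    tags = sorted(tag_positions)
    density = []
    for centre in range(start, end, step):
        lo = bisect_left(tags, centre - half_window)
        hi = bisect_right(tags, centre + half_window)
        density.append((centre, hi - lo))
    return density


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Illustrative enrichment score: the chance of seeing at least k of n tags
    in a window if each tag lands there independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `window_density([100, 110, 120, 500], 0, 200, step=100)` counts the tags around the centres 0 and 100, and `binomial_upper_tail` turns a window count into a tail probability under a uniform background. The FDR calibration against randomly generated data described in Methods is outside the scope of this sketch.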
Credits These data were generated by the UW ENCODE group. Contact: Richard Sandstrom References Sabo PJ, Hawrylycz M, Wallace JC, Humbert R, Yu M, Shafer A, Kawamoto J, Hall R, Mack J, Dorschner MO, McArthur M, Stamatoyannopoulos JA. Discovery of functional noncoding elements by digital analysis of chromatin structure. PNAS. 2004;101:16837-16842. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeUwChIPSeqViewzRawSig Raw Signal ENCODE Histone Modifications by Univ. Washington ChIP-seq Regulation wgEncodeUwChIPSeqRawSignalRep2Werirb1Ctcf WERIRb1 CTCF S2 CTCF WERI-Rb-1 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 402 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Werirb1Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. retinoblastoma (PMID: 844036) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in WERI-Rb-1 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Werirb1Ctcf WERIRb1 CTCF S1 CTCF WERI-Rb-1 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 402 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Werirb1Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
retinoblastoma (PMID: 844036) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in WERI-Rb-1 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SknshraH3k36me3 SKN H3K36me3 S2 H3K36me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 441 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SknshraH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SknshraH3k36me3 SKN H3K36me3 S1 H3K36me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 441 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SknshraH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SknshraH3k27me3 SKN H3K27me3 S2 H3K27me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 440 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SknshraH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SknshraH3k27me3 SKN H3K27me3 S1 H3K27me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 440 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SknshraH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SknshraH3k4me3 SKN H3K4me3 S2 H3K4me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 422 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SknshraH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SknshraH3k4me3 SKN H3K4me3 S1 H3K4me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 422 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SknshraH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SknshraCtcf SKNSHRA CTCF S2 CTCF SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-23 2010-07-22 439 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SknshraCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SknshraCtcf SKNSHRA CTCF S1 CTCF SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 439 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SknshraCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SaecH3k36me3 SAECH3K36me3 S2 H3K36me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 438 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SaecH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SaecH3k36me3 SAEC H3K36me3 S1 H3K36me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 438 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SaecH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SaecH3k27me3 SAEC H3K27me3 S2 H3K27me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 420 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SaecH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SaecH3k27me3 SAEC H3K27me3 S1 H3K27me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 420 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SaecH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SaecH3k4me3 SAEC H3K4me3 S2 H3K4me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 421 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SaecH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SaecH3k4me3 SAEC H3K4me3 S1 H3K4me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 421 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SaecH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2SaecCtcf SAEC CTCF S2 CTCF SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 437 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2SaecCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1SaecCtcf SAEC CTCF S1 CTCF SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 437 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1SaecCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in SAEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2NhekH3k36me3 NHEK H3K36me3 S2 H3K36me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-06-29 414 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2NhekH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep1NhekH3k36me3 NHEK H3K36me3 S1 H3K36me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 414 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1NhekH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep2NhekH3k27me3 NHEK H3K27me3 S2 H3K27me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 436 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2NhekH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep1NhekH3k27me3 NHEK H3K27me3 S1 H3K27me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 436 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1NhekH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep2NhekH3k4me3 NHEK H3K4me3 S2 H3K4me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 415 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2NhekH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep1NhekH3k4me3 NHEK H3K4me3 S1 H3K4me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 415 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1NhekH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep2NhekCtcf NHEK CTCF S2 CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 406 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2NhekCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep1NhekCtcf NHEK CTCF S1 CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-21 2010-06-20 406 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1NhekCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in NHEK cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HreH3k36me3 HRE H3K36me3 S2 H3K36me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 430 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HreH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HreH3k36me3 HRE H3K36me3 S1 H3K36me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 430 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HreH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HreH3k27me3 HRE H3K27me3 S2 H3K27me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 429 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HreH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HreH3k27me3 HRE H3K27me3 S1 H3K27me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-10-20 2010-07-20 429 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HreH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HreH3k4me3 HRE H3K4me3 S2 H3K4me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 409 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HreH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HreH3k4me3 HRE H3K4me3 S1 H3K4me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 409 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HreH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HreCtcf HRE CTCF S2 CTCF HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 405 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HreCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HreCtcf HRE CTCF S1 CTCF HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 405 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HreCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HRE cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HmecH3k27me3 HMEC H3K27me3 S1 H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 408 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HmecH3k27me3 RawSignal Histone H3 (tri-methyl K27). 
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in HMEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HmecCtcf HMEC CTCF S1 CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 419 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HmecCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HMEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Hl60H3k4me3 HL60 H3K4me3 S2 H3K4me3 HL-60 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 418 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Hl60H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in HL-60 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hl60H3k4me3 HL60 H3K4me3 S1 H3K4me3 HL-60 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 418 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hl60H3k4me3 RawSignal Histone H3 (tri methyl K4). 
Marks promoters that are active or poised to be activated. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in HL-60 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hl60Ctcf HL60 CTCF S1 CTCF HL-60 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 397 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hl60Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HL-60 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Hek293Ctcf HEK293 CTCF S2 CTCF HEK293 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 396 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Hek293Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in HEK293 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hek293Ctcf HEK293 CTCF S1 CTCF HEK293 ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-06-30 2010-03-29 396 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hek293Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HEK293 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12875Ctcf GM12875 CTCF S2 CTCF GM12875 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 452 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12875Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12875 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12875Ctcf GM12875 CTCF S1 CTCF GM12875 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 452 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12875Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12875 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12874Ctcf GM12874 CTCF S2 CTCF GM12874 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 451 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12874Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12874 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12874Ctcf GM12874 CTCF S1 CTCF GM12874 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 451 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12874Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12874 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12873Ctcf GM12873 CTCF S2 CTCF GM12873 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 450 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12873Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12873 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12873Ctcf GM12873 CTCF S1 CTCF GM12873 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 450 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12873Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12873 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12872Ctcf GM12872 CTCF S2 CTCF GM12872 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 449 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12872Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12872 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12872Ctcf GM12872 CTCF S1 CTCF GM12872 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 449 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12872Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12872 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12865Ctcf GM12865 CTCF S2 CTCF GM12865 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 448 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12865Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12865 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12865Ctcf GM12865 CTCF S1 CTCF GM12865 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 448 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12865Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12865 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12864Ctcf GM12864 CTCF S2 CTCF GM12864 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 447 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12864Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12864 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12864Ctcf GM12864 CTCF S1 CTCF GM12864 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 447 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12864Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12864 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12801Ctcf GM12801 CTCF S1 CTCF GM12801 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 393 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12801Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12801 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k36me3 6990 H3K36me3 S2 H3K36me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 444 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k36me3 6990 H3K36me3 S1 H3K36me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 444 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k27me3 6990 H3K27me3 S2 H3K27me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 427 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k27me3 6990 H3K27me3 S1 H3K27me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 427 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k4me3 06990 H3K4me3 S2 H3K4me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 417 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm06990H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k4me3 06990 H3K4me3 S1 H3K4me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 417 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm06990H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm06990Ctcf GM06990 CTCF S2 CTCF GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 392 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm06990Ctcf RawSignal CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm06990Ctcf GM06990 CTCF S1 CTCF GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2009-06-29 2010-03-29 392 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm06990Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM06990 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Caco2H3k36me3 Cco2 H3K36me3 S2 H3K36me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 426 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Caco2H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. colorectal adenocarcinoma. 
(PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Caco2H3k36me3 Cco2 H3K36me3 S1 H3K36me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 426 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Caco2H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Caco2H3k27me3 Cco2 H3K27me3 S2 H3K27me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 425 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Caco2H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Caco2H3k27me3 Cco2 H3K27me3 S1 H3K27me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 425 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Caco2H3k27me3 RawSignal Histone H3 (tri-methyl K27). 
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Caco2H3k4me3 Cco2 H3K4me3 S2 H3K4me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 407 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Caco2H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Caco2H3k4me3 Caco2 H3K4me3 S1 H3K4me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 407 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Caco2H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Caco2Ctcf Caco2 CTCF S2 CTCF Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 404 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Caco2Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Caco2Ctcf Caco2 CTCF S1 CTCF Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 404 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Caco2Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in Caco-2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2BjH3k36me3 BJ H3K36me3 S2 H3K36me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 443 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2BjH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep1BjH3k36me3 BJ H3K36me3 S1 H3K36me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 443 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1BjH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep2BjH3k27me3 BJ H3K27me3 S2 H3K27me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 424 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2BjH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep1BjH3k27me3 BJ H3K27me3 S1 H3K27me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 424 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1BjH3k27me3 RawSignal Histone H3 (tri-methyl K27). 
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep2BjH3k4me3 BJ H3K4me3 S2 H3K4me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 416 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2BjH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep1BjH3k4me3 BJ H3K4me3 S1 H3K4me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 416 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1BjH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep2BjtertCtcf BJ CTCF S2 CTCF BJ ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-19 403 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2BjtertCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep1BjtertCtcf BJ CTCF S1 CTCF BJ ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 403 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1BjtertCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in BJ cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HuvecH3k36me3 HUVEC H3K36me3 S2 H3K36me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 431 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HuvecH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HuvecH3k36me3 HUVEC H3K36me3 S1 H3K36me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 431 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HuvecH3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HuvecH3k27me3 HUVEC H3K27me3 S2 H3K27me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-12 411 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HuvecH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HuvecH3k27me3 HUVEC H3K27me3 S1 H3K27me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 411 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HuvecH3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HuvecH3k4me3 HUVEC H3K4me3 S2 H3K4me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 412 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HuvecH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HuvecH3k4me3 HUVEC H3K4me3 S1 H3K4me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 412 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HuvecH3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep2HuvecCtcf HUVEC CTCF S2 CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 410 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2HuvecCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1HuvecCtcf HUVEC CTCF S1 CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-06-29 410 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1HuvecCtcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HUVEC cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k36me3 HpG2 H3K36me3 S1 H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 446 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Hepg2H3k27me3 HpG2 H3K27me3 S2 H3K27me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 433 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Hepg2H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k27me3 HpG2 H3K27me3 S1 H3K27me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 433 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Hepg2H3k4me3 HepG2 H3K4me3 S2 H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 413 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Hepg2H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k4me3 HepG2 H3K4me3 S1 H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 413 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hepg2H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Hepg2Ctcf HepG2 CTCF S2 CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2009-07-02 2010-04-02 401 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Hepg2Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Hepg2Ctcf HepG2 CTCF S1 CTCF HepG2 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 401 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Hepg2Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HepG2 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Helas3H3k36me3 HLS3 H3K36me3 S2 H3K36me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 432 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Helas3H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Helas3H3k36me3 HLS3 H3K36me3 S1 H3K36me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 432 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Helas3H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. 
Marks regions of RNAPII elongation, including coding and non-coding transcripts. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Helas3H3k27me3 HLS3 H3K27me3 S2 H3K27me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 442 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Helas3H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Helas3H3k27me3 HLS3 H3K27me3 S1 H3K27me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 442 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Helas3H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Helas3H3k4me3 HLS3 H3K4me3 S2 H3K4me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 423 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Helas3H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Helas3H3k4me3 HLaS3 H3K4me3 S1 H3K4me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 423 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Helas3H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Helas3Ctcf HeLaS3 CTCF S2 CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 398 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Helas3Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Helas3Ctcf HeLaS3 CTCF S1 CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 398 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Helas3Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2K562H3k36me3 K562 H3K36me3 S2 H3K36me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 435 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2K562H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1K562H3k36me3 K562 H3K36me3 S1 H3K36me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 435 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1K562H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2K562H3k27me3 K562 H3K27me3 S2 H3K27me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 434 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2K562H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1K562H3k27me3 K562 H3K27me3 S1 H3K27me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 434 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1K562H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1K562H3k4me3 K562 H3K4me3 S1 H3K4me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 400 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1K562H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2K562Ctcf K562 CTCF S2 CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 399 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2K562Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1K562Ctcf K562 CTCF S1 CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 399 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1K562Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in K562 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k36me3 M128 H3K36me3 S2 H3K36me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 445 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K36me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k36me3 M128 H3K36me3 S1 H3K36me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 445 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k36me3 RawSignal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K36me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k27me3 M128 H3K27me3 S2 H3K27me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 428 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K27me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k27me3 M128 H3K27me3 S1 H3K27me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 428 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k27me3 RawSignal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K27me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k4me3 GM128 H3K4me3 S2 H3K4me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 395 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12878H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (H3K4me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k4me3 GM128 H3K4me3 S1 H3K4me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 395 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12878H3k4me3 RawSignal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (H3K4me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep2Gm12878Ctcf GM12878 CTCF S2 CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 394 Stam UW WindowDensity-bin20-win+/-75 2 exp wgEncodeUwChIPSeqRawSignalRep2Gm12878Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 2nd (CTCF in GM12878 cells) Regulation wgEncodeUwChIPSeqRawSignalRep1Gm12878Ctcf GM12878 CTCF S1 CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 394 Stam UW WindowDensity-bin20-win+/-75 1 exp wgEncodeUwChIPSeqRawSignalRep1Gm12878Ctcf RawSignal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE UW Histone ChIP Raw Signal - 1st (CTCF in GM12878 cells) Regulation wgEncodeUwChIPSeqViewaPeaks Peaks ENCODE Histone Modifications by Univ. Washington ChIP-seq Regulation wgEncodeUwChIPSeqPeaksWerirb1Ctcf WERIRb1 CTCF P1 CTCF WERI-Rb-1 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 402 Stam UW lmax-v1.0, FDR 0.5% exp wgEncodeUwChIPSeqPeaksWerirb1Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
retinoblastoma (PMID: 844036) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in WERI-Rb-1 cells) Regulation wgEncodeUwChIPSeqPeaksRep1SknshraH3k36me3 SKN H3K36me3 P1 H3K36me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 441 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SknshraH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqPeaksRep1SknshraH3k27me3 SKN H3K27me3 P1 H3K27me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 440 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SknshraH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqPeaksRep1SknshraH3k4me3 SKN H3K4me3 P1 H3K4me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 422 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SknshraH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqPeaksRep1SknshraCtcf SKNSHRA CTCF P1 CTCF SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 439 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SknshraCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqPeaksRep1SaecH3k36me3 SAEC H3K36me3 P1 H3K36me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 438 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SaecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in SAEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1SaecH3k27me3 SAEC H3K27me3 P1 H3K27me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 420 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SaecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in SAEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1SaecH3k4me3 SAEC H3K4me3 P1 H3K4me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 421 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SaecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in SAEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1SaecCtcf SAEC CTCF P1 CTCF SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 437 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1SaecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in SAEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1NhekH3k36me3 NHEK H3K36me3 P1 H3K36me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 414 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1NhekH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in NHEK cells) Regulation wgEncodeUwChIPSeqPeaksRep1NhekH3k27me3 NHEK H3K27me3 P1 H3K27me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 436 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1NhekH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in NHEK cells) Regulation wgEncodeUwChIPSeqPeaksRep1NhekH3k4me3 NHEK H3K4me3 P1 H3K4me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 415 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1NhekH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in NHEK cells) Regulation wgEncodeUwChIPSeqPeaksRep1NhekCtcf NHEK CTCF P1 CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-21 2010-06-20 406 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1NhekCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in NHEK cells) Regulation wgEncodeUwChIPSeqPeaksRep1HreH3k36me3 HRE H3K36me3 P1 H3K36me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 430 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HreH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in HRE cells) Regulation wgEncodeUwChIPSeqPeaksRep1HreH3k27me3 HRE H3K27me3 P1 H3K27me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-10-20 2010-07-20 429 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HreH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in HRE cells) Regulation wgEncodeUwChIPSeqPeaksRep1HreH3k4me3 HRE H3K4me3 P1 H3K4me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 409 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HreH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in HRE cells) Regulation wgEncodeUwChIPSeqPeaksHreCtcf HRE CTCF P1 CTCF HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 405 Stam UW lmax-v1.0, FDR 0.5% exp wgEncodeUwChIPSeqPeaksHreCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HRE cells) Regulation wgEncodeUwChIPSeqPeaksRep1HmecH3k27me3 HMEC H3K27me3 P1 H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 408 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HmecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in HMEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1HmecCtcf HMEC CTCF P1 CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 419 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HmecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HMEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1Hl60H3k4me3 HL60 H3K4me3 P1 H3K4me3 HL-60 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 418 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Hl60H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in HL-60 cells) Regulation wgEncodeUwChIPSeqPeaksHl60Ctcf HL60 CTCF P1 CTCF HL-60 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 397 Stam UW lmax-v1.0, FDR 0.5% exp wgEncodeUwChIPSeqPeaksHl60Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HL-60 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Hek293Ctcf HEK293 CTCF P1 CTCF HEK293 ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-06-30 2010-03-29 396 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Hek293Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HEK293 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12875Ctcf GM12875 CTCF P1 CTCF GM12875 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 452 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12875Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12875 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12874Ctcf GM12874 CTCF P1 CTCF GM12874 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 451 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12874Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12874 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12873Ctcf GM12873 CTCF P1 CTCF GM12873 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 450 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12873Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12873 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12872Ctcf GM12872 CTCF P1 CTCF GM12872 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 449 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12872Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12872 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12865Ctcf GM12865 CTCF P1 CTCF GM12865 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 448 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12865Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12865 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12864Ctcf GM12864 CTCF P1 CTCF GM12864 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 447 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12864Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12864 cells) Regulation wgEncodeUwChIPSeqPeaksGm12801Ctcf GM12801 CTCF P1 CTCF GM12801 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 393 Stam UW lmax-v1.0, FDR 0.5% exp wgEncodeUwChIPSeqPeaksGm12801Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12801 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm06990H3k36me3 6990 H3K36me3 P1 H3K36me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 444 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm06990H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm06990H3k27me3 6990 H3K27me3 P1 H3K27me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 427 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm06990H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm06990H3k4me3 06990 H3K4me3 P1 H3K4me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 417 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm06990H3k4me3 Peaks Histone H3 (tri methyl K4). 
Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm06990Ctcf GM06990 CTCF P1 CTCF GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2009-06-29 2010-03-29 392 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm06990Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM06990 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Caco2H3k36me3 Cco2 H3K36me3 P1 H3K36me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 426 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Caco2H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. colorectal adenocarcinoma. 
(PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Caco2H3k27me3 Cco2 H3K27me3 P1 H3K27me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 425 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Caco2H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Caco2H3k4me3 Caco2 H3K4me3 P1 H3K4me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 407 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Caco2H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Caco2Ctcf Caco2 CTCF P1 CTCF Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 404 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Caco2Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. colorectal adenocarcinoma. 
(PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in Caco-2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1BjH3k36me3 BJ H3K36me3 P1 H3K36me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 443 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1BjH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in BJ cells) Regulation wgEncodeUwChIPSeqPeaksRep1BjH3k27me3 BJ H3K27me3 P1 H3K27me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 424 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1BjH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in BJ cells) Regulation wgEncodeUwChIPSeqPeaksRep1BjH3k4me3 BJ H3K4me3 P1 H3K4me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 416 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1BjH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in BJ cells) Regulation wgEncodeUwChIPSeqPeaksRep1BjtertCtcf BJ CTCF P1 CTCF BJ ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 403 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1BjtertCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in BJ cells) Regulation wgEncodeUwChIPSeqPeaksRep1HuvecH3k36me3 HUVEC H3K36me3 P1 H3K36me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 431 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HuvecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1HuvecH3k27me3 HUVEC H3K27me3 P1 H3K27me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 411 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HuvecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1HuvecH3k4me3 HUVEC H3K4me3 P1 H3K4me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 412 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HuvecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1HuvecCtcf HUVEC CTCF P1 CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-06-29 410 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1HuvecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HUVEC cells) Regulation wgEncodeUwChIPSeqPeaksRep1Hepg2H3k36me3 HpG2 H3K36me3 P1 H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 446 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Hepg2H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Hepg2H3k27me3 HpG2 H3K27me3 P1 H3K27me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 433 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Hepg2H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Hepg2H3k4me3 HepG2 H3K4me3 P1 H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 413 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Hepg2H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqPeaksHepg2Ctcf HepG2 CTCF P1 CTCF HepG2 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 401 Stam UW lmax-v1.0, FDR 0.5% exp wgEncodeUwChIPSeqPeaksHepg2Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HepG2 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Helas3H3k36me3 HLS3 H3K36me3 P1 H3K36me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 432 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Helas3H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Helas3H3k27me3 HLS3 H3K27me3 P1 H3K27me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 442 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Helas3H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Helas3H3k4me3 HLaS3 H3K4me3 P1 H3K4me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 423 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Helas3H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Helas3Ctcf HeLaS3 CTCF P1 CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 398 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Helas3Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqPeaksRep1K562H3k36me3 K562 H3K36me3 P1 H3K36me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 435 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1K562H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in K562 cells) Regulation wgEncodeUwChIPSeqPeaksRep1K562H3k27me3 K562 H3K27me3 P1 H3K27me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 434 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1K562H3k27me3 Peaks Histone H3 (tri-methyl K27). 
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in K562 cells) Regulation wgEncodeUwChIPSeqPeaksRep1K562H3k4me3 K562 H3K4me3 P1 H3K4me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 400 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1K562H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in K562 cells) Regulation wgEncodeUwChIPSeqPeaksRep1K562Ctcf K562 CTCF P1 CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 399 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1K562Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in K562 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12878H3k36me3 M128 H3K36me3 P1 H3K36me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 445 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12878H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K36me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12878H3k27me3 M128 H3K27me3 P1 H3K27me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 428 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12878H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K27me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12878H3k4me3 GM128 H3K4me3 P1 H3K4me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 395 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12878H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (H3K4me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqPeaksRep1Gm12878Ctcf GM12878 CTCF P1 CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-03-29 394 Stam UW lmax-v1.0, FDR 0.5% 1 exp wgEncodeUwChIPSeqPeaksRep1Gm12878Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington Regions of enriched signal in experiment ENCODE UW Histone ChIP Peaks (FDR 0.5%) - 1st (CTCF in GM12878 cells) Regulation wgEncodeUwChIPSeqViewcHot Hotspots ENCODE Histone Modifications by Univ. Washington ChIP-seq Regulation wgEncodeUwChIPSeqHotspotsRep2Werirb1Ctcf WERIRb1 CTCF H2 CTCF WERI-Rb-1 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 402 Stam UW Hotspot-v5.0 2 exp wgEncodeUwChIPSeqHotspotsRep2Werirb1Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
retinoblastoma (PMID: 844036) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in WERI-Rb-1 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Werirb1Ctcf WERIRb1 CTCF H1 CTCF WERI-Rb-1 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 402 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Werirb1Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. retinoblastoma (PMID: 844036) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in WERI-Rb-1 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SknshraH3k36me3 SKN H3K36me3 H2 H3K36me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 441 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SknshraH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SknshraH3k36me3 SKN H3K36me3 H1 H3K36me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 441 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SknshraH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SknshraH3k27me3 SKN H3K27me3 H2 H3K27me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 440 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SknshraH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SknshraH3k27me3 SKN H3K27me3 H1 H3K27me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 440 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SknshraH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SknshraH3k4me3 SKN H3K4me3 H2 H3K4me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 422 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SknshraH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SknshraH3k4me3 SKN H3K4me3 H1 H3K4me3 SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 422 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SknshraH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SknshraCtcf SKNSHRA CTCF H2 CTCF SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-23 2010-07-22 439 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SknshraCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) 
Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SknshraCtcf SKNSHRA CTCF H1 CTCF SK-N-SH_RA ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 439 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SknshraCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. neuroblastoma cell line, treatment: differentiated with retinoic acid, (Biedler, et al. Morphology and Growth, Tumorigenicity, and Cytogenetics of Human Neuroblastoma Cells in Continuous Culture. Cancer Research 33, 2643-2652, November 1973.) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in SK-N-SH_RA cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SaecH3k36me3 SAEC H3K36me3 H2 H3K36me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 438 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SaecH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SaecH3k36me3 SAEC H3K36me3 H1 H3K36me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 438 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SaecH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SaecH3k27me3 SAEC H3K27me3 H2 H3K27me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 420 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SaecH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SaecH3k27me3 SAEC H3K27me3 H1 H3K27me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 420 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SaecH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SaecH3k4me3 SAEC H3K4me3 H2 H3K4me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 421 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SaecH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SaecH3k4me3 SAEC H3K4me3 H1 H3K4me3 SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 421 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SaecH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2SaecCtcf SAEC CTCF H2 CTCF SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 437 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2SaecCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1SaecCtcf SAEC CTCF H1 CTCF SAEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 437 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1SaecCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. small airway epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in SAEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2NhekH3k36me3 NHEK H3K36me3 H2 H3K36me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-06-29 414 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2NhekH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep1NhekH3k36me3 NHEK H3K36me3 H1 H3K36me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 414 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1NhekH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep2NhekH3k27me3 NHEK H3K27me3 H2 H3K27me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 436 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2NhekH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep1NhekH3k27me3 NHEK H3K27me3 H1 H3K27me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 436 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1NhekH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep2NhekH3k4me3 NHEK H3K4me3 H2 H3K4me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 415 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2NhekH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep1NhekH3k4me3 NHEK H3K4me3 H1 H3K4me3 NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 415 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1NhekH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep2NhekCtcf NHEK CTCF H2 CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 406 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2NhekCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep1NhekCtcf NHEK CTCF H1 CTCF NHEK ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-21 2010-06-20 406 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1NhekCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
epidermal keratinocytes Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in NHEK cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HreH3k36me3 HRE H3K36me3 H2 H3K36me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 430 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HreH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HreH3k36me3 HRE H3K36me3 H1 H3K36me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 430 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HreH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HreH3k27me3 HRE H3K27me3 H2 H3K27me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 429 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HreH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HreH3k27me3 HRE H3K27me3 H1 H3K27me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2009-10-20 2010-07-20 429 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HreH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HreH3k4me3 HRE H3K4me3 H2 H3K4me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 409 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HreH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HreH3k4me3 HRE H3K4me3 H1 H3K4me3 HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 409 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HreH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HreCtcf HRE CTCF H2 CTCF HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 405 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HreCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HreCtcf HRE CTCF H1 CTCF HRE ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 405 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HreCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. renal epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HRE cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HmecH3k27me3 HMEC H3K27me3 H1 H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 408 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HmecH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in HMEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HmecCtcf HMEC CTCF H1 CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 419 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HmecCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary epithelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HMEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Hl60H3k4me3 HL60 H3K4me3 H2 H3K4me3 HL-60 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 418 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Hl60H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in HL-60 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hl60H3k4me3 HL60 H3K4me3 H1 H3K4me3 HL-60 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 418 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hl60H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in HL-60 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hl60Ctcf HL60 CTCF H1 CTCF HL-60 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 397 Stam UW Hotspot-v5.0 1 exp wgEncodeUwChIPSeqHotspotsRep1Hl60Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. promyelocytic leukemia cells, (PMID: 276884) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HL-60 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Hek293Ctcf HEK293 CTCF H2 CTCF HEK293 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 396 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Hek293Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in HEK293 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hek293Ctcf HEK293 CTCF H1 CTCF HEK293 ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2009-06-30 2010-03-29 396 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hek293Ctcf Hotspots CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HEK293 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12875Ctcf GM12875 CTCF H2 CTCF GM12875 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 452 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12875Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12875 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12875Ctcf GM12875 CTCF H1 CTCF GM12875 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 452 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12875Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12875 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12874Ctcf GM12874 CTCF H2 CTCF GM12874 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 451 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12874Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12874 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12874Ctcf GM12874 CTCF H1 CTCF GM12874 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 451 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12874Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12874 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12873Ctcf GM12873 CTCF H2 CTCF GM12873 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 450 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12873Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12873 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12873Ctcf GM12873 CTCF H1 CTCF GM12873 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 450 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12873Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12873 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12872Ctcf GM12872 CTCF 2 CTCF GM12872 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 449 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12872Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12872 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12872Ctcf GM12872 CTCF H1 CTCF GM12872 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 449 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12872Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12872 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12865Ctcf GM12865 CTCF H2 CTCF GM12865 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 448 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12865Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12865 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12865Ctcf GM12865 CTCF 1 CTCF GM12865 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 448 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12865Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12865 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12864Ctcf GM12864 CTCF H2 CTCF GM12864 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 447 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12864Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12864 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12864Ctcf GM12864 CTCF H1 CTCF GM12864 ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 447 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12864Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1459, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12864 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12801Ctcf GM12801 CTCF H1 CTCF GM12801 ChipSeq ENCODE July 2009 Freeze 2009-06-30 2010-03-29 393 Stam UW Hotspot-v5.0 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12801Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12801 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k36me3 6990 H3K36me3 H2 H3K36me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 444 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k36me3 6990 H3K36me3 H1 H3K36me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 444 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k27me3 6990 H3K27me3 H2 H3K27me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 427 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k27me3 6990 H3K27me3 H1 H3K27me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 427 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k4me3 06990 H3K4me3 H2 H3K4me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 417 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm06990H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k4me3 06990 H3K4me3 H1 H3K4me3 GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 417 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm06990H3k4me3 Hotspots Histone H3 (tri methyl K4). 
Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm06990Ctcf GM06990 CTCF H2 CTCF GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 392 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm06990Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm06990Ctcf GM06990 CTCF H1 CTCF GM06990 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2009-06-29 2010-03-29 392 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm06990Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM06990 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Caco2H3k36me3 Cco2 H3K36me3 H2 H3K36me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 426 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Caco2H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Caco2H3k36me3 Cco2 H3K36me3 H1 H3K36me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 426 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Caco2H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Caco2H3k27me3 Cco2 H3K27me3 H2 H3K27me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 425 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Caco2H3k27me3 Hotspots Histone H3 (tri-methyl K27). 
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Caco2H3k27me3 Cco2 H3K27me3 H1 H3K27me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 425 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Caco2H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Caco2H3k4me3 Cco2 H3K4me3 H2 H3K4me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 407 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Caco2H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Caco2H3k4me3 Caco2 H3K4me3 H1 H3K4me3 Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 407 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Caco2H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. colorectal adenocarcinoma. 
(PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Caco2Ctcf Caco2 CTCF H2 CTCF Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 404 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Caco2Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Caco2Ctcf Caco2 CTCF H1 CTCF Caco-2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 404 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Caco2Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. colorectal adenocarcinoma. (PMID: 1939345) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in Caco-2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2BjH3k36me3 BJ H3K36me3 H2 H3K36me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 443 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2BjH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. 
Marks regions of RNAPII elongation, including coding and non-coding transcripts. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep1BjH3k36me3 BJ H3K36me3 H1 H3K36me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 443 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1BjH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep2BjH3k27me3 BJ H3K27me3 H2 H3K27me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 424 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2BjH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep1BjH3k27me3 BJ H3K27me3 H1 H3K27me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 424 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1BjH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep2BjH3k4me3 BJ H3K4me3 H2 H3K4me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 416 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2BjH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep1BjH3k4me3 BJ H3K4me3 H1 H3K4me3 BJ ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 416 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1BjH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep2BjtertCtcf BJ CTCF H2 CTCF BJ ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-19 403 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2BjtertCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. (PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep1BjtertCtcf BJ CTCF H1 CTCF BJ ChipSeq ENCODE Sep 2009 Freeze 2009-09-20 2010-06-20 403 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1BjtertCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skin fibroblast, "The line was established from skin taken from normal foreskin." - ATCC. 
(PMID: 9916803) Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in BJ cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HuvecH3k36me3 HUVEC H3K36me3 H2 H3K36me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 431 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HuvecH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HuvecH3k36me3 HUVEC H3K36me3 H1 H3K36me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 431 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HuvecH3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HuvecH3k27me3 HUVEC H3K27me3 H2 H3K27me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-12 411 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HuvecH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HuvecH3k27me3 HUVEC H3K27me3 H1 H3K27me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2009-09-29 2010-06-29 411 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HuvecH3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HuvecH3k4me3 HUVEC H3K4me3 H2 H3K4me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 412 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HuvecH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HuvecH3k4me3 HUVEC H3K4me3 H1 H3K4me3 HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 412 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HuvecH3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep2HuvecCtcf HUVEC CTCF H2 CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 410 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2HuvecCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1HuvecCtcf HUVEC CTCF H1 CTCF HUVEC ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-09-29 2010-06-29 410 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1HuvecCtcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HUVEC cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k36me3 HpG2 H3K36me3 H1 H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 446 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. 
Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Hepg2H3k27me3 HpG2 H3K27me3 H2 H3K27me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 433 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Hepg2H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k27me3 HpG2 H3K27me3 H1 H3K27me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 433 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Hepg2H3k4me3 HepG2 H3K4me3 H2 H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 413 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Hepg2H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k4me3 HepG2 H3K4me3 H1 H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 413 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hepg2H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Hepg2Ctcf HepG2 CTCF H2 CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2009-07-02 2010-04-02 401 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Hepg2Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Hepg2Ctcf HepG2 CTCF H1 CTCF HepG2 ChipSeq ENCODE July 2009 Freeze 2009-07-02 2010-04-02 401 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Hepg2Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HepG2 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Helas3H3k36me3 HLS3 H3K36me3 H2 H3K36me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 432 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Helas3H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Helas3H3k36me3 HLS3 H3K36me3 H1 H3K36me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 432 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Helas3H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Helas3H3k27me3 HLS3 H3K27me3 H2 H3K27me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 442 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Helas3H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Helas3H3k27me3 HLS3 H3K27me3 H1 H3K27me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 442 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Helas3H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Helas3H3k4me3 HLaS3 H3K4me3 H2 H3K4me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 423 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Helas3H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Helas3H3k4me3 HLaS3 H3K4me3 H1 H3K4me3 HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2010-07-12 423 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Helas3H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Helas3Ctcf HeLaS3 CTCF H2 CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-21 398 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Helas3Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Helas3Ctcf HeLaS3 CTCF H1 CTCF HeLa-S3 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 398 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Helas3Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. cervical carcinoma Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in HeLa-S3 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2K562H3k36me3 K562 H3K36me3 H2 H3K36me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 435 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2K562H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1K562H3k36me3 K562 H3K36me3 H1 H3K36me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 435 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1K562H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2K562H3k27me3 K562 H3K27me3 H2 H3K27me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 434 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2K562H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1K562H3k27me3 K562 H3K27me3 H1 H3K27me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 434 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1K562H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1K562H3k4me3 K562 H3K4me3 H1 H3K4me3 K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 400 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1K562H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2K562Ctcf K562 CTCF H2 CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 399 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2K562Ctcf Hotspots CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1K562Ctcf K562 CTCF H1 CTCF K562 ChipSeq ENCODE Sep 2009 Freeze 2009-10-13 2009-06-30 2010-03-29 399 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1K562Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in K562 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k36me3 M128 H3K36me3 H2 H3K36me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 445 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K36me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k36me3 M128 H3K36me3 H1 H3K36me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-21 445 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k36me3 Hotspots Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K36me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k27me3 M128 H3K27me3 H2 H3K27me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-20 2010-07-20 428 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci.
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K27me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k27me3 M128 H3K27me3 H1 H3K27me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2010-07-19 428 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k27me3 Hotspots Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K27me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k4me3 GM128 H3K4me3 H2 H3K4me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-12 2010-07-11 395 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12878H3k4me3 Hotspots Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (H3K4me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k4me3 GM128 H3K4me3 H1 H3K4me3 GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 395 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12878H3k4me3 Hotspots Histone H3 (tri methyl K4).
Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (H3K4me3 in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep2Gm12878Ctcf GM12878 CTCF H2 CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-09-21 2010-06-20 394 Stam UW Hotspot-v5.1 2 exp wgEncodeUwChIPSeqHotspotsRep2Gm12878Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 2nd (CTCF in GM12878 cells) Regulation wgEncodeUwChIPSeqHotspotsRep1Gm12878Ctcf GM12878 CTCF H1 CTCF GM12878 ChipSeq ENCODE Sep 2009 Freeze 2009-10-19 2009-06-30 2010-03-29 394 Stam UW Hotspot-v5.1 1 exp wgEncodeUwChIPSeqHotspotsRep1Gm12878Ctcf Hotspots CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances.
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Stamatoyannopoulous Stamatoyannopoulous - University of Washington ChIP-seq affinity zones identified using the HotSpot algorithm ENCODE UW Histone ChIP Hotspots - 1st (CTCF in GM12878 cells) Regulation vegaGeneComposite Vega Genes Vega Annotations Genes and Gene Predictions Description and Methods This track shows gene annotations from the Vertebrate Genome Annotation (Vega) database. Annotations are divided into two subtracks from the Vega Human Genome Annotation project: Vega Protein Coding Annotations, and Vega Annotated Pseudogenes and Immunoglobulin Segments. The following information is an excerpt from the Vertebrate Genome Annotation home page: "The Vega database is designed to be a central repository for high-quality, frequently updated manual annotation of different vertebrate finished genome sequence. Vega attempts to present consistent high-quality curation of the published chromosome sequences. Finished genomic sequence is analysed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions (GENSCAN, Fgenes). The annotation is based on supporting evidence only." "In addition, comparative analysis using vertebrate datasets such as the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores (Evolutionary Conserved Regions) are used for novel gene discovery." Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. Transcript type (and other details) may be found by clicking on the transcript identifier, which forms the outside link to the Vega transcript details page. Further information on the gene and transcript classification is available on the Vega web site.
Credits Thanks to Steve Trevanion at the Wellcome Trust Sanger Institute for providing the GTF and FASTA files for the Vega annotations. Vega acknowledgements and publications are listed here. vegaPseudoGene Vega Pseudogenes Vega Annotated Pseudogenes and Immunoglobulin Segments Genes and Gene Predictions Description and Methods This track shows gene annotations from the Vertebrate Genome Annotation (Vega) database. Annotations are divided into two subtracks from the Vega Human Genome Annotation project: Vega Protein Coding Annotations Vega Annotated Pseudogenes and Immunoglobulin Segments The following information is an excerpt from the Vertebrate Genome Annotation home page: "The Vega database is designed to be a central repository for high-quality, frequently updated manual annotation of different vertebrate finished genome sequence. Vega attempts to present consistent high-quality curation of the published chromosome sequences. Finished genomic sequence is analysed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions (GENSCAN, Fgenes). The annotation is based on supporting evidence only." "In addition, comparative analysis using vertebrate datasets such as the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores (Evolutionary Conserved Regions) are used for novel gene discovery." Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. Transcript type (and other details) may be found by clicking on the transcript identifier which forms the outside link to the Vega transcript details page. Further information on the gene and transcript classification may be found here. Credits Thanks to Steve Trevanion at the Wellcome Trust Sanger Institute for providing the GTF and FASTA files for the Vega annotations. Vega acknowledgements and publications are listed here. 
vegaGene Vega Protein Genes Vega Protein-Coding Annotations Genes and Gene Predictions Description and Methods This track shows gene annotations from the Vertebrate Genome Annotation (Vega) database. Annotations are divided into two subtracks from the Vega Human Genome Annotation project: Vega Protein Coding Annotations Vega Annotated Pseudogenes and Immunoglobulin Segments The following information is an excerpt from the Vertebrate Genome Annotation home page: "The Vega database is designed to be a central repository for high-quality, frequently updated manual annotation of different vertebrate finished genome sequence. Vega attempts to present consistent high-quality curation of the published chromosome sequences. Finished genomic sequence is analysed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases as well as a series of ab initio gene predictions (GENSCAN, Fgenes). The annotation is based on supporting evidence only." "In addition, comparative analysis using vertebrate datasets such as the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores (Evolutionary Conserved Regions) are used for novel gene discovery." Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. Transcript type (and other details) may be found by clicking on the transcript identifier which forms the outside link to the Vega transcript details page. Further information on the gene and transcript classification may be found here. Credits Thanks to Steve Trevanion at the Wellcome Trust Sanger Institute for providing the GTF and FASTA files for the Vega annotations. Vega acknowledgements and publications are listed here. 
encodeUViennaRnaz Vienna RNAz University of Vienna RNA secondary structure predicted by RNAz Pilot ENCODE Regions and Genes Description This track displays regions containing putative functional RNA secondary structures as predicted by RNAz on the basis of thermodynamic stability and evolutionary conservation. Methods RNAz evaluates multiple sequence alignments for unusually stable and conserved RNA secondary structures, two typical characteristics for functional RNA structures that can be found in noncoding RNAs or cis-acting regulatory elements of mRNAs. The RNAz algorithm works as follows: First a consensus secondary structure is predicted using the RNAalifold approach (Hofacker et al., 2002), which is an extension of classical minimum free energy folding algorithms for aligned sequences. The significance of a predicted consensus structure is evaluated by calculating a structure conservation index, which is the ratio of unconstrained folding energies relative to the folding energies under the constraint that all aligned sequences are forced to fold into a common structure. Thermodynamic stability is evaluated by calculating a normalized z-score of the sequences in the alignment. The z-score indicates whether the given sequences are more stable than random sequences of the same length and base composition. Based on these two features, structure conservation index and z-score, an alignment is classified as structural RNA or "other" using a support vector machine classification algorithm (Washietl et al., 2005; Washietl et al., 2007). This track shows the result of an RNAz screen of 28-way TBA/MULTIZ alignments. Alignments were sliced in overlapping windows of 120 nt in size and with a step size of 40 nt. Sequences with more than 25% gaps with respect to the human sequence were discarded. Only alignments with more than four sequences, a minimum size of 50 columns and at most 1% repeat masked letters were considered. 
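The windowing and filtering steps just described can be sketched in a few lines of Python. This is an illustration only, not the code used to build the track. The function names are hypothetical, the human sequence is assumed to be the first row of each window, and "gaps with respect to the human sequence" is interpreted here as the fraction of human-occupied columns in which the other sequence has a gap. Whether the human sequence counts towards the "more than four sequences" requirement is also an assumption. The repeat-masking filter is omitted.

```python
from typing import Iterator, List, Optional, Tuple


def sliding_windows(n_columns: int, size: int = 120, step: int = 40) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges covering an alignment."""
    start = 0
    while start < n_columns:
        end = min(start + size, n_columns)
        yield (start, end)
        if end == n_columns:  # last (possibly shorter) window reached
            break
        start += step


def gap_fraction(seq: str, ref: str) -> float:
    """Fraction of reference-occupied columns where seq has a gap (assumed definition)."""
    ref_cols = [i for i, c in enumerate(ref) if c != "-"]
    if not ref_cols:
        return 1.0
    return sum(1 for i in ref_cols if seq[i] == "-") / len(ref_cols)


def filter_window(rows: List[str], max_gap: float = 0.25,
                  min_seqs: int = 5, min_cols: int = 50) -> Optional[List[str]]:
    """Drop gappy sequences; return the kept rows, or None if the window is rejected.

    rows[0] is assumed to be the human (reference) sequence.
    """
    ref = rows[0]
    if len(ref) < min_cols:
        return None
    kept = [ref] + [s for s in rows[1:] if gap_fraction(s, ref) <= max_gap]
    return kept if len(kept) >= min_seqs else None  # "more than four sequences"
```

For a 200-column alignment, `list(sliding_windows(200))` gives three windows starting at columns 0, 40 and 80.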
RNAz can only handle alignments with up to six sequences. From alignments with more than six sequences we chose a subset of six. For subset selection, we used a greedy algorithm and iteratively selected sequences optimizing the set for a mean pairwise identity of around 80%. For alignments with more than 10 sequences, we sampled three different such subsets. The windows were finally scored with RNAz version 0.1.1 in the forward and reverse complement direction. Overlapping hits with at least one sampled alignment with RNAz score > 0.5 were combined into a single genomic region. The track shows regions with at least one window in the cluster with an average RNAz score of all samples > 0.5 and at least one hit with RNAz score > 0.9. More details may be found in Washietl et al., 2007. Credits The RNAz program and browser track were developed by Stefan Washietl, Ivo Hofacker (Institute for Theoretical Chemistry, Univ. of Vienna) and Peter F. Stadler (Bioinformatics group, Department of Computer Science, Univ. of Leipzig). References Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 2002 Jun 21;319(5):1059-66. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. USA. 2005 Feb 15;102(7):2454-59. Washietl S, Pedersen JS, Korbel JO, Fried C, Gruber AR, Hackermuller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, et al. Structured RNAs in the ENCODE Selected Regions of the Human Genome. Genome Res. 2007 Jun;17(6):852-64. wgEncodeYaleChIPseq Yale TFBS ENCODE Transcription Factor Binding Sites by ChIP-seq from Yale/UC-Davis/Harvard Regulation Description This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq). 
Included for each cell type is the input signal, which represents the control condition where no antibody targeting was performed. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for TF binding (Signal), along with sites that have the greatest evidence of transcription factor binding (Peaks). The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. ENCODE tracks typically contain one or more of the following views: Peaks: Regions of signal enrichment based on processed data (usually normalized data from pooled replicates). ENCODE Peaks tables contain fields for statistical significance, including FDR (qValue). Signal: Density graph (wiggle) of signal enrichment based on processed data. Methods Cells were grown according to the approved ENCODE cell culture protocols. Further preparations were similar to those previously published (Euskirchen et al., 2007) with the exceptions that the cells were unstimulated and sodium orthovanadate was omitted from the buffers. For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007) and Rozowsky et al. (2009). DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed where the signal height is the number of overlapping fragments at each nucleotide position in the genome. 
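As an illustration of how such a signal map can be built, the sketch below extends each mapped tag to the average fragment length in its strand direction and counts the overlapping fragments at every base. It is a minimal sketch, not the pipeline used for this track: the function name `fragment_pileup`, the 0-based tag coordinates, and the `(position, strand)` input format are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def fragment_pileup(tags: Iterable[Tuple[int, str]], chrom_length: int,
                    fragment_length: int = 200) -> List[int]:
    """Return per-base counts of overlapping extended fragments.

    tags: (position, strand) pairs, where position is the 0-based 5' end of a
    mapped tag and strand is "+" or "-" (hypothetical input format).
    """
    # Difference array: +1 where a fragment starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_length
        else:
            start, end = pos - fragment_length + 1, pos + 1
        # Clip fragments to the chromosome boundaries.
        start, end = max(start, 0), min(end, chrom_length)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the signal height per base.
    signal, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        signal.append(running)
    return signal
```

The difference-array form keeps the cost proportional to the number of tags plus the chromosome length, rather than to the number of tags times the fragment length.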
For each 1 Mb segment of each chromosome a peak height threshold was determined by requiring a false discovery rate <= 0.05 when comparing the number of peaks above threshold with the number obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the number of mapped tags in the same region from an input DNA control, normalized by correlating tag counts in genomic 10 kb windows. Using a binomial test, only regions that have a p-value <= 0.05 are considered to be significantly enriched compared to the input DNA control. Expression data generated as confirmation of the TFBS data can be found in the Yale Poly-A tracks (coming soon). Release Notes Update to Release 4 (Feb 2012): the GM12878/NFKB (IgG-rab) experiments and files have been revoked because incorrect raw data files were used to generate the processed data. This is Release 4 (June 2011) of this track, which includes 2 additional experiments. Two experiments that were present in earlier releases, K562/NF-YA and K562/NF-YB, have been removed. A number of previously released datasets have been replaced by updated versions. The affected database tables and files include 'V3' in the name, and metadata is marked with "submittedDataVersion=V3", followed by the specific reason. The specific reason is: Includes previously missing sequence data. Previous versions of files are available for download from the FTP site. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University; Peggy Farnham at UC Davis; and Kevin Struhl at Harvard. Contact: the Gerstein lab. References Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. 
CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB et al. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 2007 Jun;17(6):898-909. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007 Aug;4(8):651-7. Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein MB. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009 Jan;27(1):66-75. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. 
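The input-control comparison described in the Methods of this track (a binomial test with a p-value cutoff of 0.05) can be illustrated as follows. This is a sketch under stated assumptions, not the scoring code behind the track. It assumes a one-sided test in which, under the null hypothesis, each tag in the region is equally likely to come from the ChIP sample or the scaled control. The function name and the `scale` parameter, which stands in for the normalization factor derived from 10 kb windows, are hypothetical.

```python
from math import comb


def binomial_enrichment_pvalue(chip_tags: int, control_tags: int, scale: float = 1.0) -> float:
    """One-sided binomial p-value for ChIP enrichment over a scaled input control.

    Assumes a null probability of 0.5 per tag (an assumption of this sketch).
    """
    scaled_control = round(control_tags * scale)
    n = chip_tags + scaled_control
    if n == 0:
        return 1.0  # no tags at all: no evidence of enrichment
    # P(X >= chip_tags) for X ~ Binomial(n, 0.5), computed with exact integers.
    tail = sum(comb(n, k) for k in range(chip_tags, n + 1))
    return tail / 2 ** n


def is_enriched(chip_tags: int, control_tags: int, scale: float = 1.0,
                alpha: float = 0.05) -> bool:
    """Apply the p-value <= 0.05 cutoff quoted in the track description."""
    return binomial_enrichment_pvalue(chip_tags, control_tags, scale) <= alpha
```

With 10 ChIP tags and no control tags the p-value is 1/1024, so the region passes the cutoff. With 5 tags in each it does not.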
wgEncodeYaleChIPseqViewSignal Signal ENCODE Transcription Factor Binding Sites by ChIP-seq from Yale/UC-Davis/Harvard Regulation wgEncodeYaleChIPseqSignalK562Ifng6hInput K562/Ig6 Inp Sig Input K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 658 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng6hInput IFNg6h Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqSignalK562Stat1Ifng6h K562 STAT1 Sig STAT1 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 761 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Stat1Ifng6h IFNg6h Signal transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT1 in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng6hPol2 K562/Ig6 Pol2 Sg Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 662 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng6hPol2 IFNg6h Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng6hCmyc K562/Ig6 cMyc Sg c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 670 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng6hCmyc IFNg6h Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng6hCjun K562/Ig6 cJun Sg c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 668 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng6hCjun IFNg6h Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng30Input K562/Ig3 Inp Sig Input K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 657 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng30Input IFNg30 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqSignalK562Stat1Ifng30 K562 STAT1 Sig STAT1 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 760 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Stat1Ifng30 IFNg30 Signal transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT1 in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng30Pol2 K562 Pol2 Sig Pol2 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 704 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng30Pol2 IFNg30 Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Pol2 in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifng30Cjun K562/Ig30 cJun S c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-11 2010-03-11 673 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifng30Cjun IFNg30 Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hInput K562/Ia6 Inp Sig Input K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 656 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hInput IFNa6h Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hStat2 K562/Ia6 STAT2 S STAT2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 666 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hStat2 IFNa6h Signal peptide mapping at c-terminus of Human STAT2 p-113 (C-20) X leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT2 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hStat1 K562/Ia6 STAT1 S STAT1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 664 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hStat1 IFNa6h Signal transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT1 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hPol2 K562/Ia6 Pol2 Sg Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 661 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hPol2 IFNa6h Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hCmyc K562/Ia6 cMyc Sg c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 669 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hCmyc IFNa6h Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna6hCjun K562/Ia6 cJun Sg c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 667 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna6hCjun IFNa6h Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna30Input K562/Ia3 Inp Sig Input K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 655 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna30Input IFNa30 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna30Stat2 K562/Ia3 STAT2 S STAT2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 665 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna30Stat2 IFNa30 Signal peptide mapping at c-terminus of Human STAT2 p-113 (C-20) X leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT2 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna30Stat1 K562/Ia3 STAT1 S STAT1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 663 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna30Stat1 IFNa30 Signal transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT1 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna30Pol2 K562/Ia3 Pol2 Sg Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 660 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna30Pol2 IFNa30 Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqSignalK562Ifna30Cmyc K562/Ia3 cMyc Sg c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 659 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Ifna30Cmyc IFNa30 Signal transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqRel2SignalHelas3ifngInput HeLa/Ig30 Inp Sg Input HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-03-20 2008-10-31 2009-07-31 611 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHelas3ifngInput IFNg30 Signal cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HeLa/IFNg30 cells) Regulation wgEncodeYaleChIPseqSignalHelas3ifngStat1V2 HeLa/Ig3 STAT1 S STAT1 HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-07-21 2008-10-31 2009-07-31 614 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3ifngStat1V2 IFNg30 Signal transcription factor, activated by interferon signalling cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (STAT1 in HeLa/IFNg30 cells) Regulation wgEncodeYaleChIPseqRel2SignalHepg2Input HepG2 Input Sig Input HepG2 std ChipSeq ENCODE July 2009 Freeze 2009-03-20 2009-02-27 2009-11-27 640 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHepg2Input pravastatin Signal hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HepG2 cells) Regulation wgEncodeYaleChIPseqSignalHepg2Srebp2V2 HepG2 SREBP2 Sig SREBP2 HepG2 std ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-27 2009-11-27 643 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Srebp2V2 pravastatin Signal Sterol regulatory element binding transcription factor 2 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SREBP2 in HepG2 cells) Regulation wgEncodeYaleChIPseqSignalHepg2Srebp1aV2 HepG2p SREBP1 Sg SREBP1 HepG2 std ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-27 2009-11-27 642 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Srebp1aV2 pravastatin Signal Sterol regulatory element binding transcription factor 1 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. 
(Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SREBP1 in HepG2/pravastatin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Pol2V2 HepG2 Pol2 Sig Pol2 HepG2 std ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-27 2009-11-27 641 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Pol2V2 pravastatin Signal RNA Polymerase II hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HepG2 cells) Regulation wgEncodeYaleChIPseqSignalHepg2ControlInsln HepG2i Ctrl Sig Input HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 756 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2ControlInsln insulin Signal hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University DMEM with 0.5% BSA supplemented with 100 nM insulin and 10 uM 22-hydroxycholesterol for 6 h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Control in HepG2/insulin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Srebp1aInsln HepG2i SREBP1 Sg SREBP1 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 759 Snyder Stanford Data paired with HepG2_Insulin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Srebp1aInsln insulin Signal Sterol regulatory element binding transcription factor 1 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University DMEM with 0.5% BSA supplemented with 100 nM insulin and 10 uM 22-hydroxycholesterol for 6 h. 
(Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SREBP1 in HepG2/insulin cells) Regulation wgEncodeYaleChIPseqSignalHepg2ControlForskln HepG2f Ctrl Sig Input HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 755 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2ControlForskln forskolin Signal hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Control in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Pol2Forskln HepG2f Pol2 Sig Pol2 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 758 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Pol2Forskln forskolin Signal RNA Polymerase II hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Pgc1aForskln HepG2f PGC1A Sig PGC1A HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 757 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Pgc1aForskln forskolin Signal The protein encoded by this gene is a transcriptional coactivator that regulates the genes involved in energy metabolism. This protein interacts with PPARgamma, which permits the interaction of this protein with multiple transcription factors. This protein can interact with, and regulate the activities of, cAMP response element binding protein (CREB) and nuclear respiratory factors (NRFs). hepatocellular carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (PGC1A in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Hsf1Forskln HepG2f HSF1 Sig HSF1 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 754 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Hsf1Forskln forskolin Signal Epitope corresponding to amino acids 219-529 of heat shock transcription factor 1 (HSF1) of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (HSF1 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Hnf4aForskln HepG2f HNF4A Sig HNF4A HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 753 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Hnf4aForskln forskolin Signal Epitope mapping at the C-terminus of Rab 11 of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (HNF4A in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2Grp20Forskln HepG2f GRp20 Sig GRp20 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 752 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2Grp20Forskln forskolin Signal Epitope mapping at the C-terminus of GR alpha of human origin hepatocellular carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (GRp20 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2ErraForskln HepG2f ERRA Sig ERRA HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 751 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2ErraForskln forskolin Signal Epitope corresponding to amino acids 81-160 mapping near the N-terminus of ERRalpha of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (ERRA in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalHepg2CebpbForskln HepG2f CEBPB Sig CEBPB HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 750 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2CebpbForskln forskolin Signal Epitope mapping at the C-terminus of C/EBP-beta of rat origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (CEBPB in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqSignalNt2d1Input NT2D1 Input Sig Input NT2-D1 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 654 Snyder USC input PeakSeq1.0 wgEncodeYaleChIPseqSignalNt2d1Input None Signal malignant pluripotent embryonal carcinoma (NTera-2), "The NTERA-2 cl.D1 cell line is a pluripotent human testicular embryonal carcinoma cell line derived by cloning the NTERA-2 cell line." - ATCC. 
(PMID: 6694356) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in NT2-D1 cells) Regulation wgEncodeYaleChIPseqSignalNt2d1Yy1 NT2D1 YY1 Sig YY1 NT2-D1 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-05 2010-03-05 653 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalNt2d1Yy1 None Signal YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. malignant pluripotent embryonal carcinoma (NTera-2), "The NTERA-2 cl.D1 cell line is a pluripotent human testicular embryonal carcinoma cell line derived by cloning the NTERA-2 cell line." - ATCC. (PMID: 6694356) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (YY1 in NT2-D1 cells) Regulation wgEncodeYaleChIPseqSignalNt2d1Suz12 NT2D1 SUZ12 Sig SUZ12 NT2-D1 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-05 2010-03-05 652 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalNt2d1Suz12 None Signal Suppressor of zeste 12 homolog, Polycomb group (PcG) protein, Component of the PRC2/EED-EZH2 complex malignant pluripotent embryonal carcinoma (NTera-2), "The NTERA-2 cl.D1 cell line is a pluripotent human testicular embryonal carcinoma cell line derived by cloning the NTERA-2 cell line." - ATCC. (PMID: 6694356) Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SUZ12 in NT2-D1 cells) Regulation wgEncodeYaleChIPseqRel2SignalNb4Input NB4 Input Sig Input NB4 std ChipSeq ENCODE July 2009 Freeze 2009-03-20 2008-10-31 2009-07-31 617 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalNb4Input None Signal acute promyelocytic leukemia cell line. (PMID: 1995093) Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in NB4 cells) Regulation wgEncodeYaleChIPseqRel2SignalNb4Pol2 NB4 Pol2 Sig Pol2 NB4 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-10-31 2009-07-31 618 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalNb4Pol2 None Signal RNA Polymerase II acute promyelocytic leukemia cell line. (PMID: 1995093) Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in NB4 cells) Regulation wgEncodeYaleChIPseqSignalMcf7Input MCF7 Input Sig Input MCF-7 UCDavis ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 762 Snyder USC MCF7 cells stably expressing a tagged HA-E2F1 were fragmented using Bioruptor, crosslinks were reversed at 67°C overnight, and input was Qiagen purified, input PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalMcf7Input None Signal mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Input library was prepared at UC Davis.
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in MCF-7 cells) Regulation wgEncodeYaleChIPseqSignalMcf7Hae2f1 MCF7 HA-E2F1 Sig HA-E2F1 MCF-7 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-10-28 2010-07-28 693 Snyder USC MCF-7 cells stably expressing a tagged HA-E2F1 were fragmented using Bioruptor, precipitated with StaphA and an antibody to the HA tag, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalMcf7Hae2f1 None Signal The HA-E2F1 protein is a derivative of E2F1, a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionarily conserved domains found in most members of the family. These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This version of E2F1 includes an N terminal HA tag and a modified ER ligand binding domain to allow regulated translocation to the nucleus. mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (HA-E2F1 in MCF-7 cells) Regulation wgEncodeYaleChIPseqRel2SignalHek293Input HEK293 Input Sig Input HEK293 std ChipSeq ENCODE July 2009 Freeze 2009-03-20 2009-02-25 2009-11-25 631 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHek293Input None Signal embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Standard input signal for most experiments.
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HEK293 cells) Regulation wgEncodeYaleChIPseqRel2SignalHek293Pol2 HEK293 Pol2 Sig Pol2 HEK293 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-25 2009-11-25 632 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHek293Pol2 None Signal RNA Polymerase II embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HEK293 cells) Regulation wgEncodeYaleChIPseqRel2SignalHct116Input HCT116 Input Sig Input HCT-116 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-03-20 2009-02-25 2009-11-25 627 Snyder USC input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHct116Input None Signal colorectal carcinoma (PMID: 7214343) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HCT-116 cells) Regulation wgEncodeYaleChIPseqRel2SignalHct116Tcf4 HCT116 TCF7L2 Sg TCF7L2 HCT-116 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-25 2009-11-25 629 Snyder USC Fragmented using Bioruptor, precipitated with protein G magnetic beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqRel2SignalHct116Tcf4 None Signal TCF7L2 (formerly known as TCF4) is a member of the high mobility group (HMG) DNA binding protein family of transcription factors which consists of the following: Lymphoid enhancer factor 1 (LEF1), T Cell Factor 1 (TCF1), TCF3 and TCF4. Note: there is an official TCF-4 http://www.genecards.org/cgi-bin/carddisp.pl?gene=TCF4 colorectal carcinoma (PMID: 7214343) Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TCF7L2 in HCT-116 cells) Regulation wgEncodeYaleChIPseqSignalHct116Pol2 HCT116 Pol2 Sig Pol2 HCT-116 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-05-15 2010-02-15 651 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHct116Pol2 None Signal RNA Polymerase II colorectal carcinoma (PMID: 7214343) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HCT-116 cells) Regulation wgEncodeYaleChIPseqSignalGm19193InputIggrab GM19193 rIgG Sig Input GM19193 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 779 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19193InputIggrab None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM19193 cells) Regulation wgEncodeYaleChIPseqSignalGm19193MusiggMusigg GM19193 mIgG Sig Input GM19193 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 742 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19193MusiggMusigg None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM19193 cells) Regulation wgEncodeYaleChIPseqSignalGm19193Pol2Musigg GM19193 Pol2 Sig Pol2 GM19193 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 743 Snyder Stanford Data paired with GM19193_IgG_Control.
exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19193Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM19193 cells) Regulation wgEncodeYaleChIPseqSignalGm19193NfkbIggrab GM19193 NFKB Sig NFKB GM19193 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 718 Snyder Stanford Data paired with GM19193_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19193NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM19193 cells) Regulation wgEncodeYaleChIPseqSignalGm19099InputIggrab GM19099 rIgG Sig Input GM19099 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 778 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19099InputIggrab None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq.
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM19099 cells) Regulation wgEncodeYaleChIPseqSignalGm19099MusiggMusigg GM19099 mIgG Sig Input GM19099 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 740 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19099MusiggMusigg None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM19099 cells) Regulation wgEncodeYaleChIPseqSignalGm19099Pol2Musigg GM19099 Pol2 Sig Pol2 GM19099 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 741 Snyder Stanford Data paired with GM19099_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19099Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM19099 cells) Regulation wgEncodeYaleChIPseqSignalGm19099NfkbIggrab GM19099 NFKB Sig NFKB GM19099 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 717 Snyder Stanford Data paired with GM19099_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm19099NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq.
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM19099 cells) Regulation wgEncodeYaleChIPseqSignalGm18951InputIggrab GM18951 rIgG Sig Input GM18951 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 777 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18951InputIggrab None Signal lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM18951 cells) Regulation wgEncodeYaleChIPseqSignalGm18951MusiggMusigg GM18951 mIgG Sig Input GM18951 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 737 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18951MusiggMusigg None Signal lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM18951 cells) Regulation wgEncodeYaleChIPseqSignalGm18951Pol2Musigg GM18951 Pol2 Sig Pol2 GM18951 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 739 Snyder Stanford Paired with GM18951_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18951Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM18951 cells) Regulation wgEncodeYaleChIPseqSignalGm18951NfkbIggrab GM18951 NFKB Sig NFKB GM18951 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 738 Snyder Stanford Data paired with GM18951_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18951NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM18951 cells) Regulation wgEncodeYaleChIPseqSignalGm18526InputIggrab GM18526 rIgG Sig Input GM18526 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 776 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18526InputIggrab None Signal lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM18526 cells) Regulation wgEncodeYaleChIPseqSignalGm18526MusiggMusigg GM18526 mIgG Sig Input GM18526 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 732 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18526MusiggMusigg None Signal lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input/IgG-mus in GM18526 cells) Regulation wgEncodeYaleChIPseqSignalGm18526Pol2Musigg GM18526 Pol2 Sig Pol2 GM18526 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 733 Snyder Stanford Paired with GM18526_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18526Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM18526 cells) Regulation wgEncodeYaleChIPseqSignalGm18526NfkbIggrab GM18526 NFKB Sig NFKB GM18526 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 716 Snyder Stanford Data paired with GM18526_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18526NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM18526 cells) Regulation wgEncodeYaleChIPseqSignalGm18505InputIggrab GM18505 rIgG Sig Input GM18505 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 775 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18505InputIggrab None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq.
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM18505 cells) Regulation wgEncodeYaleChIPseqSignalGm18505MusiggMusigg GM18505 mIgG Sig Input GM18505 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 730 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18505MusiggMusigg None Signal lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input/IgG-mus in GM18505 cells) Regulation wgEncodeYaleChIPseqSignalGm18505Pol2Musigg GM18505 Pol2 Sig Pol2 GM18505 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 731 Snyder Stanford Data paired with GM18505_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18505Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM18505 cells) Regulation wgEncodeYaleChIPseqSignalGm18505NfkbIggrab GM18505 NFKB Sig NFKB GM18505 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 715 Snyder Stanford Data paired with GM18505_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm18505NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq.
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM18505 cells) Regulation wgEncodeYaleChIPseqSignalGm15510InputIggrab GM15510 rIgG Sig Input GM15510 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 774 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm15510InputIggrab None Signal lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM15510 cells) Regulation wgEncodeYaleChIPseqSignalGm15510MusiggMusigg GM15510 mIgG Sig Input GM15510 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 713 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm15510MusiggMusigg None Signal lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input/IgG-mus in GM15510 cells) Regulation wgEncodeYaleChIPseqSignalGm15510Pol2Musigg GM15510 Pol2 Sig Pol2 GM15510 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 714 Snyder Stanford Data paired with GM15510_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm15510Pol2Musigg None Signal RNA Polymerase II lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM15510 cells) Regulation wgEncodeYaleChIPseqSignalGm15510NfkbIggrab GM15510 NFKB Sig NFKB GM15510 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 736 Snyder Stanford Data paired with GM15510_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm15510NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM15510 cells) Regulation wgEncodeYaleChIPseqSignalGm12892InputIggrab GM12892 rIgG Sig Input GM12892 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 773 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12892InputIggrab None Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM12892 cells) Regulation wgEncodeYaleChIPseqSignalGm12892MusiggMusigg GM12892 mIgG Sig Input GM12892 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 711 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12892MusiggMusigg None Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM12892 cells) Regulation wgEncodeYaleChIPseqSignalGm12892Pol2Musigg GM12892 Pol2 Sig Pol2 GM12892 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 729 Snyder Stanford Data paired with GM12892_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12892Pol2Musigg None Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM12892 cells) Regulation wgEncodeYaleChIPseqSignalGm12892NfkbIggrab GM12892 NFKB Sig NFKB GM12892 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 712 Snyder Stanford Data paired with GM12892_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12892NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM12892 cells) Regulation wgEncodeYaleChIPseqSignalGm12891InputIggrab GM12891 rIgG Sig Input GM12891 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 772 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12891InputIggrab None Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM12891 cells) Regulation wgEncodeYaleChIPseqSignalGm12891MusiggMusigg GM12891 mIgG Sig Input GM12891 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 709 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12891MusiggMusigg None Signal B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM12891 cells) Regulation wgEncodeYaleChIPseqSignalGm12891Pol2Musigg GM12891 Pol2 Sig Pol2 GM12891 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 710 Snyder Stanford Data paired with GM12891_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12891Pol2Musigg None Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM12891 cells) Regulation wgEncodeYaleChIPseqSignalGm12891NfkbIggrab GM12891 NFKB Sig NFKB GM12891 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 735 Snyder Stanford Data paired with GM12891_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12891NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM12891 cells) Regulation wgEncodeYaleChIPseqSignalGm10847InputIggrab GM10847 rIgG Sig Input GM10847 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 770 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm10847InputIggrab None Signal lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM10847 cells) Regulation wgEncodeYaleChIPseqSignalGm10847MusiggMusigg GM10847 mIgG Sig Input GM10847 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 728 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm10847MusiggMusigg None Signal lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM10847 cells) Regulation wgEncodeYaleChIPseqSignalGm10847Pol2Musigg GM10847 Pol2 Sig Pol2 GM10847 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 705 Snyder Stanford Data paired with GM10847_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm10847Pol2Musigg None Signal RNA Polymerase II lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM10847 cells) Regulation wgEncodeYaleChIPseqSignalGm10847NfkbIggrab GM10847 NFKB Sig NFKB GM10847 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 734 Snyder Stanford Data paired with GM10847_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm10847NfkbIggrab None Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB/IgG-rab in GM10847 cells) Regulation wgEncodeYaleChIPseqSignalHepg2bInput HepG2b Input Sig Input HepG2 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-12 2010-03-11 674 Snyder USC input PeakSeq1.0 wgEncodeYaleChIPseqSignalHepg2bInput None Signal hepatocellular carcinoma Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HepG2b cells) Regulation wgEncodeYaleChIPseqSignalHepg2bTr4 HepG2b TR4 Sig TR4 HepG2 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-12 2010-03-11 675 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHepg2bTr4 None Signal (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. hepatocellular carcinoma Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TR4 in HepG2b cells) Regulation wgEncodeYaleChIPseqRel2SignalHelas3Input HeLa Input Sig Input HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-03-20 2008-10-31 2009-07-31 612 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHelas3Input None Signal cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3MnaseV2 HeLa MNase Sig Input HeLa-S3 MNase ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-25 2009-11-25 635 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3MnaseV2 None Signal cervical carcinoma Input signal from MNase digested DNA. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (MNase Control in HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3Rabigg HeLa rIgG Sig Input HeLa-S3 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 744 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Rabigg None Signal cervical carcinoma Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input/IgG-rab in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3MouseiggV2 HeLa mIgG Sig Input HeLa-S3 IgG-mus ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-25 2009-11-25 633 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3MouseiggV2 None Signal cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input/IgG-mus in HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3LargefragmentV2 HeLa LgFrag Sig Input HeLa-S3 Large_Fragment ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-25 2009-11-25 634 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3LargefragmentV2 None Signal cervical carcinoma Control signal from sonication into large fragments of DNA (350-800 bp). Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Frag Control in HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3NakeddnaV2 HeLa DNA Sig Input HeLa-S3 Naked_DNA ChipSeq ENCODE July 2009 Freeze 2009-07-21 2009-02-25 2009-11-25 636 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3NakeddnaV2 None Signal cervical carcinoma Control signal from Naked DNA. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Naked DNA, HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3Tr4 HeLa TR4 Sig TR4 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-24 687 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3Tr4 None Signal (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TR4 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Tfiiic HeLa TFIIIC Sig TFIIIC-110 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 747 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Tfiiic None Signal TFIIIC-110 is a subunit of the RNA Polymerase III transcription factor TFIIIC. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TFIIIC-110 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Rpc155 HeLa RPC155 Sig RPC155 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 766 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Rpc155 None Signal polymerase (RNA) III (DNA directed) polypeptide A, 155kDa cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (RPC155 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqRel2SignalHelas3Pol2 HeLa Pol2 Sig Pol2 HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-10-31 2009-07-31 613 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalHelas3Pol2 None Signal RNA Polymerase II cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HeLa cells) Regulation wgEncodeYaleChIPseqSignalHelas3Nrf1Musigg HeLa Nrf1 Sig Nrf1 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 723 Snyder Stanford Data set paired with HeLa-S3_MouseIgG_Control. 
exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Nrf1Musigg None Signal NRF1 is the mammalian homolog to the erect wing (ewg) Drosophila protein that is required for proper development of the central nervous system and indirect flight muscles. In mammals NRF1 functions as a transcription factor that activates the expression of the EIF2S1 (EIF-alpha) gene. This protein links the transcriptional modulation of key metabolic genes to cellular growth and development, and has been implicated in the control of nuclear genes required for respiration, heme biosynthesis and mitochondrial DNA transcription and replication. NRF1 forms a homodimer and binds DNA as a dimer. NRF1 shows a nuclear localization and is expressed widely in embryonic, fetal and adult tissues. Phosphorylation of NRF1 enhances DNA binding. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Nrf1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Max HeLa3 Max Sig Max HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 646 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Max None Signal The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Max in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3JundRabigg HeLa JunD Sig JunD HeLa-S3 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 745 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3JundRabigg None Signal The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. (provided by RefSeq) cervical carcinoma Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (JunD/IgG-rab in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Ini1Musigg HeLa Ini1 Sig Ini1 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 722 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqSignalHelas3Ini1Musigg None Signal Ini1 (BAF47, SMARCB1) is a ubiquitous 47 kD component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Ini1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3E2f6 HeLa E2F6 Sig E2F6 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-10-21 2010-07-20 692 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3E2f6 None Signal This gene encodes a member of the E2F transcription factor protein family. E2F family members play a crucial role in control of the cell cycle and of the action of tumor suppressor proteins. 
They are also a target of the transforming proteins of small DNA tumor viruses. Many E2F proteins contain several evolutionarily conserved domains: a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. The encoded protein of this gene is atypical because it lacks the transactivation and tumor suppressor protein association domains. It contains a modular suppression domain and is an inhibitor of E2F-dependent transcription. The protein is part of a multimeric protein complex that contains a histone methyltransferase and the transcription factors Mga and Max. Multiple transcript variants have been reported for this gene, but it has not been clearly demonstrated that they encode valid isoforms (RefSeq). cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (E2F6 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3E2f4 HeLa E2F4 Sig E2F4 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 689 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3E2f4 None Signal mapping at the C-terminus of E2F4 of human origin cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (E2F4 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Ha HeLa HA-E2F1 Sig HA-E2F1 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 688 Snyder USC HeLa cells stably expressing a tagged HA-E2F1 were fragmented using a Bioruptor, precipitated with StaphA and an antibody to the HA tag, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3Ha None Signal The HA-E2F1 protein is a derivative of E2F1, a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionarily conserved domains found in most members of the family. These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This version of E2F1 includes an N-terminal HA tag and a modified ER ligand binding domain to allow regulated translocation to the nucleus. cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (HA-E2F1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3E2f1 HeLa E2F1 Sig E2F1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 699 Snyder USC HeLa cells stably expressing a tagged HA-E2F1 were fragmented using Bioruptor, precipitated with StaphA and an antibody to E2F1, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3E2f1 None Signal E2F1, the original member of the E2F family of transcription factors, was identified as a cellular protein with DNA binding activity associated with the adenovirus E2 gene promoter. E2F1 is cell cycle regulated with very low levels in early G1, then increasing levels as cells move from G1 to S, and highest levels of protein at the G1/S boundary, which is consistent with its role in S-phase entry. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (E2F1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Cmyc HeLa c-Myc Sig c-Myc HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 648 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Cmyc None Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3CjunRabigg HeLa c-Jun Sig c-Jun HeLa-S3 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 746 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3CjunRabigg None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. 
cervical carcinoma Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun/IgG-rab in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Cfos HeLa c-Fos Sig c-Fos HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 647 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Cfos None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Fos in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Brg1Iggmus HeLa Brg1 Sig Brg1 HeLa-S3 IgG-mus ChipSeq ENCODE June 2010 Freeze 2010-03-08 2010-12-08 781 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Brg1Iggmus None Signal Brg1 (SMARCA4) is an ATPase subunit of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Brg1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Brf2 HeLa BRF2 Sig BRF2 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 765 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Brf2 None Signal Brf2 is a component of an alternate form of the RNA Polymerase III transcription factor TFIIIB. cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BRF2 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Brf1 HeLa BRF1 Sig BRF1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 764 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Brf1 None Signal 'B-related factor 1', subunit of RNA polymerase III transcription initiation factor IIIB cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BRF1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Bdp1 HeLa BDP1 Sig BDP1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 763 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHelas3Bdp1 None Signal 'B double-prime 1', subunit of RNA polymerase III transcription initiation factor IIIB cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BDP1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Baf170Musigg HeLa BAF170 Sig BAF170 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 721 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqSignalHelas3Baf170Musigg None Signal BAF170 (SMARCC2, Brg1-Associated Factor, 170 kD) is a ubiquitous component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BAF170/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Baf155Musigg HeLa BAF155 Sig BAF155 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 720 Snyder Stanford Data paired with IgG_Control. 
exp wgEncodeYaleChIPseqSignalHelas3Baf155Musigg None Signal BAF155 (SMARCC1, Brg1-Associated Factor, 155 kD) is a ubiquitous component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BAF155/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Ap2gamma HeLa AP-2g Sig AP-2gamma HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-23 686 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3Ap2gamma None Signal Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements to regulate transcription of selected genes. AP-2 factors bind to the consensus sequence 5'-GCCNNNGGC-3' and activate genes involved in a large spectrum of important biological functions including proper eye, face, body wall, limb and neural tube development. They also suppress a number of genes including MCAM/MUC18, C/EBP alpha and MYC. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (AP-2gamma in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHelas3Ap2alpha HeLa AP-2a Sig AP-2alpha HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-23 685 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalHelas3Ap2alpha None Signal Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements to regulate transcription of selected genes. AP-2 factors bind to the consensus sequence 5'-GCCNNNGGC-3' and activate genes involved in a large spectrum of important biological functions including proper eye, face, body wall, limb and neural tube development. 
They also suppress a number of genes including MCAM/MUC18, C/EBP alpha and MYC. AP-2alpha is the only AP-2 protein required for early morphogenesis of the lens vesicle (by similarity). cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (AP-2alpha in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqSignalHuvecInputStd HUVEC Input Sig Input HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-03-08 2010-12-08 780 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalHuvecInputStd None Signal umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input in HUVEC cells) Regulation wgEncodeYaleChIPseqSignalHuvecPol2 HUVEC Pol2 Sig Pol2 HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-09 2010-10-08 702 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHuvecPol2 None Signal RNA Polymerase II umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in HUVEC cells) Regulation wgEncodeYaleChIPseqSignalHuvecMax HUVEC Max Sig Max HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-16 2010-10-15 768 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHuvecMax None Signal The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. 
Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Max in HUVEC cells) Regulation wgEncodeYaleChIPseqSignalHuvecCjun HUVEC c-Jun Sig c-Jun HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-12 2010-10-11 719 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalHuvecCjun None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in HUVEC cells) Regulation wgEncodeYaleChIPseqSignalK562bInput K562b Input Sig Input K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-11 2010-03-11 672 Snyder USC input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562bInput None Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562b cells) Regulation wgEncodeYaleChIPseqSignalRep1K562InputV2 K562 Input Sig Input K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-29 2008-10-31 2009-07-31 615 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqSignalRep1K562InputV2 None Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562MusiggMusigg K562 mIgG Sig Input K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 726 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalK562MusiggMusigg None Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562bZnf274 K562b ZNF274 Sig ZNF274 K562 UCDavis ChipSeq ENCODE Jan 2010 Freeze 2010-01-05 2010-10-04 696 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bZnf274 None Signal ZNF274 is a zinc finger protein containing five C2H2-type zinc finger domains, two Kruppel-associated box A (KRABA) domains, and a leucine-rich SCAN domain. The encoded protein has been suggested to be a transcriptional repressor. It localizes predominantly to the nucleolus. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (ZNF274 in K562b cells) Regulation wgEncodeYaleChIPseqRel2SignalK562bZnf263 K562 ZNF263 Sig ZNF263 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-25 2009-11-25 630 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqRel2SignalK562bZnf263 None Signal ZNF263 (NP_005732, 201 a.a. ~ 299 a.a) partial recombinant protein with GST tag. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (ZNF263 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562bYy1 K562b YY1 Sig YY1 K562 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-09-03 2010-06-03 684 Snyder USC Fragmented using Bioruptor, precipitated with protein G magnetic beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bYy1 None Signal YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (YY1 in K562b cells) Regulation wgEncodeYaleChIPseqSignalK562Xrcc4 K562 XRCC4 Sig XRCC4 K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 650 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Xrcc4 None Signal Recognizes the XRCC4 protein (X ray cross complementation protein). 
XRCC4 is a ubiquitous protein reported to have a role in DNA double-stranded break repair and in V(D)J recombination. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (XRCC4 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562bTr4 K562b TR4 Sig TR4 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 682 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bTr4 None Signal (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TR4 in K562b cells) Regulation wgEncodeYaleChIPseqSignalK562Tfiiic K562 TFIIIC Sig TFIIIC-110 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 748 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Tfiiic None Signal TFIIIC-110 is a subunit of the RNA Polymerase III transcription factor TFIIIC. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TFIIIC-110 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Sirt6 K562 SIRT6 Sig SIRT6 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 681 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqSignalK562Sirt6 None Signal A synthetic peptide made to an internal region of the human SIRT6 protein leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SIRT6 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562bSetdb1Mnase K562/M SETDB1 Sg SETDB1 K562 UCDavis ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 769 Snyder USC Fragmented using MNase and used proteinA agarose beads for ChIPs, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bSetdb1Mnase MNaseD Signal SET domain, bifurcated 1, the SET domain is a highly conserved, approximately 150-amino acid motif implicated in the modulation of chromatin structure leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Fragmented using micrococcal nuclease digestion Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SETDB1 in K562b/MNase cells) Regulation wgEncodeYaleChIPseqSignalK562bSetdb1 K562b SETDB1 Sig SETDB1 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-15 2010-03-15 677 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bSetdb1 None Signal SET domain, bifurcated 1, the SET domain is a highly conserved, approximately 150-amino acid motif implicated in the modulation of chromatin structure leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (SETDB1 in K562b cells) Regulation wgEncodeYaleChIPseqSignalK562Rpc155 K562 RPC155 Sig RPC155 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 680 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqSignalK562Rpc155 None Signal polymerase (RNA) III (DNA directed) polypeptide A, 155kDa leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (RPC155 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Rad21 K562 Rad21 Sig Rad21 K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 649 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Rad21 None Signal Synthetic peptide (Human) conjugated to KLH - which represents a portion of human Rad21 encoded within exon 14 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Rad21 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Pol3 K562 Pol3 Sig Pol3 K562 std ChipSeq ENCODE Jan 2010 Freeze 2009-11-12 2010-08-12 694 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Pol3 None Signal RNA Polymerase III leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol3 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Pol2Musigg K562 Pol2 Sig Pol2 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 727 Snyder Stanford Data paired with IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Pol2Musigg None Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Pol2 K562 Pol2 Sig Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-10-31 2009-07-31 616 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Pol2 None Signal RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Nfe2 K562 NF-E2 Sig NF-E2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-01-09 2009-10-09 624 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Nfe2 None Signal Nuclear factor, erythroid-derived 2 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NF-E2 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Nelfe K562 NELFe Sig NELFe K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 701 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Nelfe None Signal NELF-E (RDBP) is a part of the negative elongation factor complex which binds to RNAPII to suppress elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NELFe in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Max K562 Max Sig Max K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-25 2009-11-25 637 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Max None Signal The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Max in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Jund K562 JunD Sig JunD K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-27 2009-11-27 644 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Jund None Signal The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. 
(provided by RefSeq) leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (JunD in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Ini1Musigg K562 Ini1 Sig Ini1 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 725 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqSignalK562Ini1Musigg None Signal Ini1 (BAF47, SMARCB1) is a ubiquitous 47 kD component of the SWI/SNF chromatin-remodeling complex. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Ini1/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Gtf2b K562 GTF2B Sig GTF2B K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 703 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Gtf2b None Signal DNA-binding general transcription factor leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (GTF2B in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562bGata2 K562b GATA-2 Sig GATA-2 K562 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-09-02 2010-06-02 683 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bGata2 None Signal GATA binding protein 2 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (GATA-2 in K562b cells) Regulation wgEncodeYaleChIPseqRel2SignalK562bGata1 K562b GATA1 Sig GATA-1 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-27 2009-11-27 638 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqRel2SignalK562bGata1 None Signal GATA-1 is a transcriptional activator which probably serves as a general switch factor for erythroid development. It binds to DNA sites with the consensus sequence [AT]GATA[AG] within regulatory regions of globin genes and of other genes expressed in erythroid cells. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (GATA-1 in K562b cells) Regulation wgEncodeYaleChIPseqSignalK562bE2f6 K562b E2F6 Sig E2F6 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-15 2010-03-15 676 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bE2f6 None Signal This gene encodes a member of the E2F transcription factor protein family. E2F family members play a crucial role in control of the cell cycle and of the action of tumor suppressor proteins. They are also a target of the transforming proteins of small DNA tumor viruses. Many E2F proteins contain several evolutionarily conserved domains: a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. The encoded protein of this gene is atypical because it lacks the transactivation and tumor suppressor protein association domains. It contains a modular suppression domain and is an inhibitor of E2F-dependent transcription. The protein is part of a multimeric protein complex that contains a histone methyltransferase and the transcription factors Mga and Max. Multiple transcript variants have been reported for this gene, but it has not been clearly demonstrated that they encode valid isoforms (RefSeq). leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (E2F6 in K562b cells) Regulation wgEncodeYaleChIPseqSignalK562bE2f4 K562b E2F4 Sig E2F4 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-11 2010-03-11 671 Snyder USC Fragmented using Bioruptor, precipitated with StaphA (replicate 1) or protein A agarose beads (replicate 2), exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalK562bE2f4 None Signal mapping at the C-terminus of E2F4 of human origin leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (E2F4 in K562b cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Cmyc K562 c-Myc Sig c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-11-21 2009-08-21 621 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Cmyc None Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Cjun K562 c-Jun Sig c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-11-21 2009-08-21 620 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Cjun None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalK562Cfos K562 c-Fos Sig c-Fos K562 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2008-11-21 2009-08-21 619 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalK562Cfos None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Fos in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Brg1Musigg K562 Brg1 Sig Brg1 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 724 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqSignalK562Brg1Musigg None Signal Brg1 (SMARCA4) is an ATPase subunit of the SWI/SNF chromatin-remodeling complex. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Brg1/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Brf2 K562 BRF2 Sig BRF2 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 767 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Brf2 None Signal Brf2 is a component of an alternate form of the RNA Polymerase III transcription factor TFIIIB. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BRF2 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Brf1 K562 BRF1 Sig BRF1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 679 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqSignalK562Brf1 None Signal 'B-related factor 1', subunit of RNA polymerase III transcription initiation factor IIIB leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BRF1 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Bdp1 K562 BDP1 Sig BDP1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 678 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqSignalK562Bdp1 None Signal 'B double-prime 1', subunit of RNA polymerase III transcription initiation factor IIIB leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (BDP1 in K562 cells) Regulation wgEncodeYaleChIPseqSignalK562Atf3 K562 ATF3 Sig ATF3 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 700 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalK562Atf3 None Signal Activating transcription factor 3. A bZIP transcription factor and member of the Ca2+/cAMP response element-binding (CREB) protein family. ATF3 is found to act both as an activator and repressor of transcription. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (ATF3 in K562 cells) Regulation wgEncodeYaleChIPseqRel2SignalGm12878Input GM12878 Inp Sig Input GM12878 std ChipSeq ENCODE July 2009 Freeze 2009-03-21 2009-02-24 2009-11-24 625 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalGm12878Input None Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Input in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878InputIggrab GM12878 rIgG Sig Input GM12878 IgG-rab ChipSeq ENCODE June 2010 Freeze 2010-03-05 2010-12-05 771 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878InputIggrab None Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-rab in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878MusiggMusigg GM12878 mIgG Sig Input GM12878 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 706 Snyder Stanford input PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878MusiggMusigg None Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-Seq Signal (Input/IgG-mus in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Zzz3 GM12878 ZZZ3 Sig ZZZ3 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 698 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878Zzz3 None Signal ZZZ3 contains one ZZ-type zinc finger domain. ZZ type finger domains are named because of their ability to bind two zinc ions. These domains contain 4-6 Cys residues that participate in zinc binding (plus additional Ser/His residues), including a Cys-X2-Cys motif found in other zinc finger domains. These zinc fingers are thought to be involved in protein-protein interactions; they are most likely involved in ligand binding or molecular scaffolding. The structure of the ZZ domain shows that it belongs to the family of cross-brace zinc finger motifs that include the PHD, RING, and FYVE domains. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (ZZZ3 in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Yy1 GM12878 YY1 Sig YY1 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 695 Snyder USC Fragmented using Bioruptor, precipitated with Staph A, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqSignalGm12878Yy1 None Signal YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (YY1 in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Tr4 GM12878 TR4 Sig TR4 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 697 Snyder USC Fragmented using Bioruptor, precipitated with StaphA PeakSeq 1.0 (fdr 0.001), samples were 71% and 72% matching in the top 40% overlap analysis. exp wgEncodeYaleChIPseqSignalGm12878Tr4 None Signal (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (TR4 in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Rad21Iggrab GM12878 Rad21 Sg Rad21 GM12878 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 749 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878Rad21Iggrab None Signal Synthetic peptide (Human) conjugated to KLH - which represents a portion of human Rad21 encoded within exon 14 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Rad21/IgG-rab in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Pol3 GM12878 Pol3 Sig Pol3 GM12878 std ChipSeq ENCODE July 2009 Freeze 2009-05-05 2010-02-04 645 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878Pol3 None Signal RNA Polymerase III B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol3 in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Pol2Musigg GM12878 Pol2 Sig Pol2 GM12878 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 708 Snyder Stanford Data paired with GM12878_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878Pol2Musigg None Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2/IgG-mus in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Pol2V3 GM12878 Pol2 Sig Pol2 GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2009-02-24 2009-11-24 626 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqSignalGm12878Pol2V3 None Signal RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Pol2 in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878NfkbTnfa GM12878 NFKB Sig NFKB GM12878 std ChipSeq ENCODE Sep 2009 Freeze 2009-10-01 2010-06-30 690 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqSignalGm12878NfkbTnfa TNFa Signal Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University human recombinant TNF-alpha from eBioscience [product# 14-8329-62] (Snyder) Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (NFKB in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878MaxV3 GM12878 Max Sig Max GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2009-01-09 2009-10-09 623 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqSignalGm12878MaxV3 None Signal The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (Max in GM12878 cells) Regulation wgEncodeYaleChIPseqRel2SignalGm12878Jund GM12878 JunD Sig JunD GM12878 std ChipSeq ENCODE July 2009 Freeze 2009-05-02 2009-02-27 2009-11-27 639 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqRel2SignalGm12878Jund None Signal The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. (provided by RefSeq) B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (JunD in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Cmyc GM12878 c-Myc Sg c-Myc GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2011-01-28 783 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqSignalGm12878Cmyc None Signal transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Myc in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878Cjun GM12878 c-Jun Sg c-Jun GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2011-01-28 782 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqSignalGm12878Cjun None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Jun in GM12878 cells) Regulation wgEncodeYaleChIPseqSignalGm12878CfosV3 GM12878 c-Fos Sg c-Fos GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2008-11-25 2009-08-25 622 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqSignalGm12878CfosV3 None Signal Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Signal ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Signal (c-Fos in GM12878 cells) Regulation wgEncodeYaleChIPseqViewPeaks Peaks ENCODE Transcription Factor Binding Sites by ChIP-seq from Yale/UC-Davis/Harvard Regulation wgEncodeYaleChIPseqPeaksK562Stat1Ifng6h K562 STAT1 Pk STAT1 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 761 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Stat1Ifng6h IFNg6h Peaks transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT1 in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifng6hPol2 K562/Ig6 Pol2 Pk Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 662 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifng6hPol2 IFNg6h Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifng6hCmyc K562/Ig6 cMyc Pk c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 670 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifng6hCmyc IFNg6h Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifng6hCjun K562/Ig6 cJun Pk c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 668 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifng6hCjun IFNg6h Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. 
Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in K562/IFNg6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Stat1Ifng30 K562 STAT1 Pk STAT1 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 760 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Stat1Ifng30 IFNg30 Peaks transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 30 minutes (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT1 in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifng30Pol2 K562 Pol2 Pk Pol2 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 704 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifng30Pol2 IFNg30 Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Interferon gamma treatment - 30 minutes (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifng30Cjun K562/Ig3 cJun Pk c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-11 2010-03-11 673 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifng30Cjun IFNg30 Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in K562/IFNg30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna6hStat2 K562/Ia6 STAT2 P STAT2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 666 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna6hStat2 IFNa6h Peaks peptide mapping at c-terminus of Human STAT2 p-113 (C-20) X leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments.
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT2 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna6hStat1 K562/Ia6 STAT1 P STAT1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 664 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna6hStat1 IFNa6h Peaks transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT1 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna6hPol2 K562/Ia6 Pol2 Pk Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 661 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna6hPol2 IFNa6h Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna6hCmyc K562/Ia6 cMyc Pk c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 669 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna6hCmyc IFNa6h Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna6hCjun K562/Ia6 cJun Pk c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 667 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna6hCjun IFNa6h Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments.
Chromatin IP Sequencing Snyder Snyder - Yale University Interferon alpha treatment - 6 hours (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in K562/IFNa6h cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna30Stat2 K562/Ia3 STAT2 P STAT2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 665 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna30Stat2 IFNa30 Peaks peptide mapping at c-terminus of Human STAT2 p-113 (C-20) X leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT2 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna30Stat1 K562/Ia3 STAT1 P STAT1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 663 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna30Stat1 IFNa30 Peaks transcription factor, activated by interferon signalling leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT1 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna30Pol2 K562/Ia3 Pol2 Pk Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-09 2010-03-09 660 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna30Pol2 IFNa30 Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ifna30Cmyc K562/Ia3 cMyc Pk c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-08 2010-03-08 659 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Ifna30Cmyc IFNa30 Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments.
Chromatin IP Sequencing Snyder Snyder - Yale University 30 m of Interferon alpha (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in K562/IFNa30 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3ifngStat1 HeLa/Ig3 STAT1 P STAT1 HeLa-S3 std ChipSeq ENCODE Nov 2008 Freeze 2008-10-31 2009-07-31 614 Snyder Yale chrom start +1 for this file, exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3ifngStat1 IFNg30 Peaks transcription factor, activated by interferon signalling cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Interferon gamma treatment - 30 minutes (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (STAT1 in HeLa/IFNg30 cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Srebp2 HepG2 SREBP2 Pk SREBP2 HepG2 std ChipSeq ENCODE Feb 2009 Freeze 2009-02-27 2009-11-27 643 Snyder Yale chrom start +1 for this file, exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Srebp2 pravastatin Peaks Sterol regulatory element binding transcription factor 2 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SREBP2 in HepG2 cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Srebp1a HepG2p SREBP1 Pk SREBP1 HepG2 std ChipSeq ENCODE Feb 2009 Freeze 2009-02-27 2009-11-27 642 Snyder Yale chrom start +1 for this file, exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Srebp1a pravastatin Peaks Sterol regulatory element binding transcription factor 1 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. 
(Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SREBP1 in HepG2/pravastatin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Pol2 HepG2p Pol2 Pk Pol2 HepG2 std ChipSeq ENCODE Feb 2009 Freeze 2009-02-27 2009-11-27 641 Snyder Yale chrom start +1 for this file, exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Pol2 pravastatin Peaks RNA Polymerase II hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University For sterol deprivation, cells were cultured with pravastatin (2 uM, Sigma) in DMEM with 0.5% BSA for 16 h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HepG2/pravastatin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Srebp1aInsln HepG2i SREBP1 Pk SREBP1 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 759 Snyder Stanford Data paired with HepG2_Insulin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Srebp1aInsln insulin Peaks Sterol regulatory element binding transcription factor 1 hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University DMEM with 0.5% BSA supplemented with 100 nM insulin and 10 uM 22-hydroxycholesterol for 6 h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SREBP1 in HepG2/insulin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Pol2Forskln HepG2f Pol2 Pk Pol2 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 758 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Pol2Forskln forskolin Peaks RNA Polymerase II hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h.
(Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Pgc1aForskln HepG2f PGC1A Pk PGC1A HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 757 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Pgc1aForskln forskolin Peaks The protein encoded by this gene is a transcriptional coactivator that regulates the genes involved in energy metabolism. This protein interacts with PPARgamma, which permits the interaction of this protein with multiple transcription factors. This protein can interact with, and regulate the activities of, cAMP response element binding protein (CREB) and nuclear respiratory factors (NRFs). hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (PGC1A in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Hsf1Forskln HepG2f HSF1 Pk HSF1 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 754 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Hsf1Forskln forskolin Peaks Epitope corresponding to amino acids 219-529 of heat shock transcription factor 1 (HSF1) of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. 
(Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (HSF1 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Hnf4aForskln HepG2f HNF4A Pk HNF4A HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 753 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Hnf4aForskln forskolin Peaks Epitope mapping at the C-terminus of Rab 11 of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (HNF4A in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2Grp20Forskln HepG2f GRp20 Pk GRp20 HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 752 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2Grp20Forskln forskolin Peaks Epitope mapping at the C-terminus of GR alpha of human origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (GRp20 in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2ErraForskln HepG2f ERRA Pk ERRA HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 751 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2ErraForskln forskolin Peaks Epitope corresponding to amino acids 81-160 mapping near the N-terminus of ERRalpha of human origin hepatocellular carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (ERRA in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksHepg2CebpbForskln HepG2f CEBPB Pk CEBPB HepG2 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 750 Snyder Stanford Data paired with HepG2_Forskolin_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHepg2CebpbForskln forskolin Peaks Epitope mapping at the C-terminus of C/EBP-beta of rat origin hepatocellular carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University low-glucose DMEM with 0.5% BSA supplemented with 1uM forskolin and 1mM pyruvate for 6h. (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (CEBPB in HepG2/forskolin cells) Regulation wgEncodeYaleChIPseqPeaksNt2d1Yy1 NT2D1 YY1 Pk YY1 NT2-D1 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-05 2010-03-05 653 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksNt2d1Yy1 None Peaks YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. malignant pluripotent embryonal carcinoma (NTera-2), "The NTERA-2 cl.D1 cell line is a pluripotent human testicular embryonal carcinoma cell line derived by cloning the NTERA-2 cell line." - ATCC. (PMID: 6694356) Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (YY1 in NT2-D1 cells) Regulation wgEncodeYaleChIPseqPeaksNt2d1Suz12 NT2D1 SUZ12 Pk SUZ12 NT2-D1 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-05 2010-03-05 652 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksNt2d1Suz12 None Peaks Suppressor of zeste 12 homolog, Polycomb group (PcG) protein, Component of the PRC2/EED-EZH2 complex malignant pluripotent embryonal carcinoma (NTera-2), "The NTERA-2 cl.D1 cell line is a pluripotent human testicular embryonal carcinoma cell line derived by cloning the NTERA-2 cell line." - ATCC. (PMID: 6694356) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SUZ12 in NT2-D1 cells) Regulation wgEncodeYaleChIPseqPeaksNb4Pol2V2 NB4 Pol2 Pk Pol2 NB4 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-10-31 2009-07-31 618 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksNb4Pol2V2 None Peaks RNA Polymerase II acute promyelocytic leukemia cell line. (PMID: 1995093) Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in NB4 cells) Regulation wgEncodeYaleChIPseqPeaksMcf7Hae2f1 MCF7 HA-E2F1 Pk HA-E2F1 MCF-7 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-10-28 2010-07-28 693 Snyder USC MCF-7 cells stably expressing a tagged HA-E2F1 were fragmented using Bioruptor, precipitated with StaphA and an antibody to the HA tag, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksMcf7Hae2f1 None Peaks The HA-E2F1 protein is a derivative of E2F1, a member of the E2F family of transcription factors.
The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionarily conserved domains found in most members of the family. These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This version of E2F1 includes an N terminal HA tag and a modified ER ligand binding domain to allow regulated translocation to the nucleus. mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (HA-E2F1 in MCF-7 cells) Regulation wgEncodeYaleChIPseqPeaksHek293Pol2V2 HEK293 Pol2 Pk Pol2 HEK293 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-25 2009-11-25 632 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHek293Pol2V2 None Peaks RNA Polymerase II embryonic kidney, cells contain Adenovirus 5 DNA (PMID: 11967234) Standard input signal for most experiments.
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HEK293 cells) Regulation wgEncodeYaleChIPseqPeaksHct116Tcf4V2 HCT116 TCF7L2 Pk TCF7L2 HCT-116 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-25 2009-11-25 629 Snyder USC Fragmented using Bioruptor, precipitated with protein G magnetic beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHct116Tcf4V2 None Peaks TCF7L2 (formerly known as TCF4) is a member of the high mobility group (HMG) DNA binding protein family of transcription factors which consists of the following: Lymphoid enhancer factor 1 (LEF1), T Cell Factor 1 (TCF1), TCF3 and TCF4. Note: there is an official TCF-4 http://www.genecards.org/cgi-bin/carddisp.pl?gene=TCF4 colorectal carcinoma (PMID: 7214343) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TCF7L2 in HCT-116 cells) Regulation wgEncodeYaleChIPseqPeaksHct116Pol2 HCT116 Pol2 Pk Pol2 HCT-116 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-05-15 2010-02-15 651 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHct116Pol2 None Peaks RNA Polymerase II colorectal carcinoma (PMID: 7214343) Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HCT-116 cells) Regulation wgEncodeYaleChIPseqPeaksGm19193Pol2Musigg GM19193 Pol2 Pk Pol2 GM19193 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 743 Snyder Stanford Data paired with GM19193_IgG_Control. 
exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm19193Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM19193 cells) Regulation wgEncodeYaleChIPseqPeaksGm19193NfkbIggrab GM19193 NFKB Pk NFKB GM19193 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 718 Snyder Stanford Data paired with GM19193_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm19193NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM19193 cells) Regulation wgEncodeYaleChIPseqPeaksGm19099Pol2Musigg GM19099 Pol2 Pk Pol2 GM19099 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 741 Snyder Stanford Data paired with GM19099_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm19099Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq.
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM19099 cells) Regulation wgEncodeYaleChIPseqPeaksGm19099NfkbIggrab GM19099 NFKB Pk NFKB GM19099 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 717 Snyder Stanford Data paired with GM19099_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm19099NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM19099 cells) Regulation wgEncodeYaleChIPseqPeaksGm18951Pol2Musigg GM18951 Pol2 Pk Pol2 GM18951 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 739 Snyder Stanford Paired with GM18951_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18951Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM18951 cells) Regulation wgEncodeYaleChIPseqPeaksGm18951NfkbIggrab GM18951 NFKB Pk NFKB GM18951 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 738 Snyder Stanford Data paired with GM18951_IgG_Control.
exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18951NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Japanese in Tokyo, Japan, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM18951 cells) Regulation wgEncodeYaleChIPseqPeaksGm18526Pol2Musigg GM18526 Pol2 Pk Pol2 GM18526 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 733 Snyder Stanford Paired with GM18526_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18526Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM18526 cells) Regulation wgEncodeYaleChIPseqPeaksGm18526NfkbIggrab GM18526 NFKB Pk NFKB GM18526 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 716 Snyder Stanford Data paired with GM18526_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18526NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Han Chinese in Beijing, China, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM18526 cells) Regulation wgEncodeYaleChIPseqPeaksGm18505Pol2Musigg GM18505 Pol2 Pk Pol2 GM18505 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 731 Snyder Stanford Scored against GM18505_IgG_Control. Data paired with GM18505_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18505Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM18505 cells) Regulation wgEncodeYaleChIPseqPeaksGm18505NfkbIggrab GM18505 NFKB Pk NFKB GM18505 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 715 Snyder Stanford Data paired with GM18505_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm18505NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigeria, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM18505 cells) Regulation wgEncodeYaleChIPseqPeaksGm15510Pol2Musigg GM15510 Pol2 Pk Pol2 GM15510 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 714 Snyder Stanford Data paired with GM15510_IgG_Control.
exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm15510Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM15510 cells) Regulation wgEncodeYaleChIPseqPeaksGm15510NfkbIggrab GM15510 NFKB Pk NFKB GM15510 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 736 Snyder Stanford Data paired with GM15510_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm15510NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid NIGMS Human Genetic Cell Repository, DNA Polymorphism Discovery Resource Collection, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM15510 cells) Regulation wgEncodeYaleChIPseqPeaksGm12892Pol2Musigg GM12892 Pol2 Pk Pol2 GM12892 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 729 Snyder Stanford Data paired with GM12892_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12892Pol2Musigg None Peaks RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM12892 cells) Regulation wgEncodeYaleChIPseqPeaksGm12892NfkbIggrab GM12892 NFKB Pk NFKB GM12892 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 712 Snyder Stanford Data paired with GM12892_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12892NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM12892 cells) Regulation wgEncodeYaleChIPseqPeaksGm12891Pol2Musigg GM12891 Pol2 Pk Pol2 GM12891 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 710 Snyder Stanford Data paired with GM12891_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12891Pol2Musigg None Peaks RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM12891 cells) Regulation wgEncodeYaleChIPseqPeaksGm12891NfkbIggrab GM12891 NFKB Pk NFKB GM12891 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 735 Snyder Stanford Data paired with GM12891_IgG_Control. 
exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12891NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM12891 cells) Regulation wgEncodeYaleChIPseqPeaksGm10847Pol2Musigg GM10847 Pol2 Pk Pol2 GM10847 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 705 Snyder Stanford Data paired with GM10847_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm10847Pol2Musigg None Peaks RNA Polymerase II lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM10847 cells) Regulation wgEncodeYaleChIPseqPeaksGm10847NfkbIggrab GM10847 NFKB Pk NFKB GM10847 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 734 Snyder Stanford Data paired with GM10847_IgG_Control exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm10847NfkbIggrab None Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 lymphoblastoid, International HapMap Project, CEPH/Utah, treatment: Epstein-Barr Virus transformed Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB/IgG-rab in GM10847 cells) Regulation wgEncodeYaleChIPseqPeaksHepg2bTr4 HepG2b TR4 Pk TR4 HepG2 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-12 2010-03-11 675 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHepg2bTr4 None Peaks (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. hepatocellular carcinoma Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TR4 in HepG2b cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Mouseigg HeLa IgG-mus Pk Input HeLa-S3 IgG-mus ChipSeq ENCODE Feb 2009 Freeze 2009-02-25 2009-11-25 633 Snyder Yale input PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Mouseigg None Peaks cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Input/IgG-mus in HeLa cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Largefragment HeLa LgFrag Pk Input HeLa-S3 Large_Fragment ChipSeq ENCODE Feb 2009 Freeze 2009-02-25 2009-11-25 634 Snyder Yale chrom start +1 for this file, input PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Largefragment None Peaks cervical carcinoma Control signal from sonication into large fragments of DNA (350-800 bp). 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Frag Control in HeLa cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Tr4 HeLa TR4 Pk TR4 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-24 687 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3Tr4 None Peaks (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TR4 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Tfiiic HeLa TFIIIC Pk TFIIIC-110 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 747 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Tfiiic None Peaks TFIIIC-110 is a subunit of the RNA Polymerase III transcription factor TFIIIC. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TFIIIC-110 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Rpc155 HeLa RPC155 Pk RPC155 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 766 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Rpc155 None Peaks polymerase (RNA) III (DNA directed) polypeptide A, 155kDa cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (RPC155 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Pol2V2 HeLa Pol2 Pk Pol2 HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-10-31 2009-07-31 613 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Pol2V2 None Peaks RNA Polymerase II cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Nrf1Musigg HeLa Nrf1 Pk Nrf1 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 723 Snyder Stanford Data set paired with HeLa-S3_MouseIgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Nrf1Musigg None Peaks NRF1 is the mammalian homolog to the erect wing (ewg) Drosophila protein that is required for proper development of the central nervous system and indirect flight muscles. In mammals NRF1 functions as a transcription factor that activates the expression of the EIF2S1 (EIF-alpha) gene. This protein links the transcriptional modulation of key metabolic genes to cellular growth and development, and has been implicated in the control of nuclear genes required for respiration, heme biosynthesis and mitochondrial DNA transcription and replication. NRF1 forms a homodimer and binds DNA as a dimer. NRF1 shows a nuclear localization and is expressed widely in embryonic, fetal and adult tissues. Phosphorylation of NRF1 enhances DNA binding. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Nrf1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Max HeLa Max Pk Max HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-05-05 2010-02-04 646 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Max None Peaks The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Max in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3JundRabigg HeLa JunD Pk JunD HeLa-S3 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 745 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3JundRabigg None Peaks The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. (provided by RefSeq) cervical carcinoma Input signal from Normal Rabbit IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (JunD/IgG-rab in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Ini1Musigg HeLa Ini1 Pk Ini1 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 722 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqPeaksHelas3Ini1Musigg None Peaks Ini1 (BAF47, SMARCB1) is a ubiquitous 47 kD component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Ini1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3E2f6 HeLa E2F6 Pk E2F6 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-10-21 2010-07-20 692 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3E2f6 None Peaks This gene encodes a member of the E2F transcription factor protein family. E2F family members play a crucial role in control of the cell cycle and of the action of tumor suppressor proteins. They are also a target of the transforming proteins of small DNA tumor viruses. Many E2F proteins contain several evolutionarily conserved domains: a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. The encoded protein of this gene is atypical because it lacks the transactivation and tumor suppressor protein association domains. It contains a modular suppression domain and is an inhibitor of E2F-dependent transcription. 
The protein is part of a multimeric protein complex that contains a histone methyltransferase and the transcription factors Mga and Max. Multiple transcript variants have been reported for this gene, but it has not been clearly demonstrated that they encode valid isoforms (RefSeq). cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (E2F6 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3E2f4 HeLa E2F4 Pk E2F4 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 689 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3E2f4 None Peaks mapping at the C-terminus of E2F4 of human origin cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (E2F4 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Ha HeLa HA-E2F1 Pk HA-E2F1 HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 688 Snyder USC HeLa cells stably expressing a tagged HA-E2F1 were fragmented using a Bioruptor, precipitated with StaphA and an antibody to the HA tag, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3Ha None Peaks The HA-E2F1 protein is a derivative of E2F1, a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionarily conserved domains found in most members of the family. 
These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This version of E2F1 includes an N terminal HA tag and a modified ER ligand binding domain to allow regulated translocation to the nucleus. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (HA-E2F1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3E2f1 HeLa E2F1 Pk E2F1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 699 Snyder USC HeLa cells stably expressing a tagged HA-E2F1 were fragmented using Bioruptor, precipitated with StaphA and an antibody to E2F1, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3E2f1 None Peaks E2F1, the original member of the E2F family of transcription factors, was identified as a cellular protein with DNA binding activity associated with the adenovirus E2 gene promoter. E2F1 is cell cycle regulated with very low levels in early G1, then increasing levels as cells move from G1 to S, and highest levels of protein at the G1/S boundary, which is consistent with its role in S-phase entry. cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (E2F1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Cmyc HeLa c-Myc Pk c-Myc HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-05-05 2010-02-04 648 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Cmyc None Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation, differentiation and neoplastic disease cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3CjunRabigg HeLa c-Jun Pk c-Jun HeLa-S3 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 746 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3CjunRabigg None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. cervical carcinoma Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun/IgG-rab in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Cfos HeLa c-Fos Pk c-Fos HeLa-S3 std ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-05-05 2010-02-04 647 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Cfos None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Fos in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Brg1Iggmus HeLa Brg1 Pk Brg1 HeLa-S3 IgG-mus ChipSeq ENCODE June 2010 Freeze 2010-03-08 2010-12-08 781 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Brg1Iggmus None Peaks Brg1 (SMARCA4) is an ATPase subunit of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Brg1/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Brf2 HeLa BRF2 Pk BRF2 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 765 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Brf2 None Peaks Brf2 is a component of an alternate form of the RNA Polymerase III transcription factor TFIIIB. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BRF2 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Brf1 HeLa BRF1 Pk BRF1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 764 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Brf1 None Peaks 'B-related factor 1', subunit of RNA polymerase III transcription initiation factor IIIB cervical carcinoma Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BRF1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Bdp1 HeLa BDP1 Pk BDP1 HeLa-S3 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 763 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHelas3Bdp1 None Peaks 'B double-prime 1', subunit of RNA polymerase III transcription initiation factor IIIB cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BDP1 in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Baf170Musigg HeLa BAF170 Pk BAF170 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 721 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqPeaksHelas3Baf170Musigg None Peaks BAF170 (SMARCC2, Brg1-Associated Factor, 170 kD) is a ubiquitous component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BAF170/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Baf155Musigg HeLa BAF155 Pk BAF155 HeLa-S3 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 720 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqPeaksHelas3Baf155Musigg None Peaks BAF155 (SMARCC1, Brg1-Associated Factor, 155 kD) is a ubiquitous component of the SWI/SNF chromatin-remodeling complex. cervical carcinoma Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BAF155/IgG-mus in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Ap2gamma HeLa AP-2g Pk AP-2gamma HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-23 686 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3Ap2gamma None Peaks Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements to regulate transcription of selected genes. AP-2 factors bind to the consensus sequence 5'-GCCNNNGGC-3' and activate genes involved in a large spectrum of important biological functions including proper eye, face, body wall, limb and neural tube development. They also suppress a number of genes including MCAM/MUC18, C/EBP alpha and MYC. cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (AP-2gamma in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHelas3Ap2alpha HeLa AP-2a Pk AP-2alpha HeLa-S3 std ChipSeq ENCODE Sep 2009 Freeze 2009-09-24 2010-06-23 685 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksHelas3Ap2alpha None Peaks Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements to regulate transcription of selected genes. AP-2 factors bind to the consensus sequence 5'-GCCNNNGGC-3' and activate genes involved in a large spectrum of important biological functions including proper eye, face, body wall, limb and neural tube development. They also suppress a number of genes including MCAM/MUC18, C/EBP alpha and MYC. AP-2alpha is the only AP-2 protein required for early morphogenesis of the lens vesicle (by similarity). 
cervical carcinoma Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (AP-2alpha in HeLa-S3 cells) Regulation wgEncodeYaleChIPseqPeaksHuvecPol2 HUVEC Pol2 Pk Pol2 HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-09 2010-10-08 702 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHuvecPol2 None Peaks RNA Polymerase II umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in HUVEC cells) Regulation wgEncodeYaleChIPseqPeaksHuvecMax HUVEC Max Pk Max HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-16 2010-10-15 768 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHuvecMax None Peaks The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). umbilical vein endothelial cells Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Max in HUVEC cells) Regulation wgEncodeYaleChIPseqPeaksHuvecCjun HUVEC c-Jun Pk c-Jun HUVEC std ChipSeq ENCODE June 2010 Freeze 2010-01-12 2010-10-11 719 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksHuvecCjun None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. umbilical vein endothelial cells Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in HUVEC cells) Regulation wgEncodeYaleChIPseqPeaksK562bZnf274 K562b ZNF274 Pk ZNF274 K562 UCDavis ChipSeq ENCODE Jan 2010 Freeze 2010-01-05 2010-10-04 696 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bZnf274 None Peaks ZNF274 is a zinc finger protein containing five C2H2-type zinc finger domains, two Kruppel-associated box A (KRABA) domains, and a leucine-rich SCAN domain. The encoded protein has been suggested to be a transcriptional repressor. It localizes predominantly to the nucleolus. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (ZNF274 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562bZnf263V2 K562b ZNF263 Pk ZNF263 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-25 2009-11-25 630 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bZnf263V2 None Peaks ZNF263 (NP_005732, 201 a.a. ~ 299 a.a) partial recombinant protein with GST tag. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (ZNF263 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562bYy1 K562b YY1 Pk YY1 K562 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-09-03 2010-06-03 684 Snyder USC Fragmented using Bioruptor, precipitated with protein G magnetic beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bYy1 None Peaks YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (YY1 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562Xrcc4 K562 XRCC4 Pk XRCC4 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-05-05 2010-02-04 650 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Xrcc4 None Peaks Recognizes the XRCC4 protein (X ray cross complementation protein). XRCC4 is a ubiquitous protein reported to have a role in DNA double-stranded break repair and in V(D)J recombination. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (XRCC4 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562bTr4 K562b TR4 Pk TR4 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-22 2010-03-22 682 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bTr4 None Peaks (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TR4 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562Tfiiic K562 TFIIIC Pk TFIIIC-110 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 748 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Tfiiic None Peaks TFIIIC-110 is a subunit of the RNA Polymerase III transcription factor TFIIIC. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TFIIIC-110 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Sirt6 K562 SIRT6 Pk SIRT6 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-06-22 2010-03-22 681 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqPeaksK562Sirt6 None Peaks A synthetic peptide made to an internal region of the human SIRT6 protein leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SIRT6 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562bSetdb1Mnase K562/M SETDB1 Pk SETDB1 K562 UCDavis ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 769 Snyder USC Fragmented using MNase and used proteinA agarose beads for ChIPs, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bSetdb1Mnase MNaseD Peaks SET domain, bifurcated 1, the SET domain is a highly conserved, approximately 150-amino acid motif implicated in the modulation of chromatin structure leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Fragmented using micrococcal nuclease digestion Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SETDB1 in K562b/MNase cells) Regulation wgEncodeYaleChIPseqPeaksK562bSetdb1 K562b SETDB1 Pk SETDB1 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-15 2010-03-15 677 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bSetdb1 None Peaks SET domain, bifurcated 1, the SET domain is a highly conserved, approximately 150-amino acid motif implicated in the modulation of chromatin structure leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (SETDB1 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562Rpc155 K562 RPC155 Pk RPC155 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-06-22 2010-03-22 680 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqPeaksK562Rpc155 None Peaks polymerase (RNA) III (DNA directed) polypeptide A, 155kDa leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (RPC155 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Rad21 K562 Rad21 Pk Rad21 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-05-05 2010-02-04 649 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Rad21 None Peaks Synthetic peptide (Human) conjugated to KLH - which represents a portion of human Rad21 encoded within exon 14 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Rad21 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Pol3 K562 Pol3 Pk Pol3 K562 std ChipSeq ENCODE Jan 2010 Freeze 2009-11-12 2010-08-12 694 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Pol3 None Peaks RNA Polymerase III leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol3 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Pol2Musigg K562 Pol2 Pk Pol2 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 727 Snyder Stanford Data paired with IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Pol2Musigg None Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Pol2V2 K562 Pol2 Pk Pol2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-10-31 2009-07-31 616 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Pol2V2 None Peaks RNA Polymerase II leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Nfe2V2 K562 NF-E2 Pk NF-E2 K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-01-09 2009-10-09 624 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Nfe2V2 None Peaks Nuclear factor, erythroid-derived 2 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NF-E2 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Nelfe K562 NELFe Pk NELFe K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 701 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Nelfe None Peaks NELF-E (RDBP) is a part of the negative elongation factor complex which binds to RNAPII to suppress elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NELFe in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562MaxV2 K562 Max Pk Max K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-25 2009-11-25 637 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562MaxV2 None Peaks The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. 
Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Max in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562JundV2 K562 JunD Pk JunD K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-27 2009-11-27 644 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562JundV2 None Peaks The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. (provided by RefSeq) leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (JunD in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Ini1Musigg K562 Ini1 Pk Ini1 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 725 Snyder Stanford Data paired with IgG_Control. 
exp wgEncodeYaleChIPseqPeaksK562Ini1Musigg None Peaks Ini1 (BAF47, SMARCB1) is a ubiquitous 47 kD component of the SWI/SNF chromatin-remodeling complex. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Ini1/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Gtf2b K562 GTF2B Pk GTF2B K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-11 2010-10-11 703 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Gtf2b None Peaks DNA- binding general transcription factor leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (GTF2B in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562bGata2 K562b GATA-2 Pk GATA-2 K562 UCDavis ChipSeq ENCODE Sep 2009 Freeze 2009-09-02 2010-06-02 683 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bGata2 None Peaks GATA binding protein 2 leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (GATA-2 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562bGata1V2 K562b GATA-1 Pk GATA-1 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-27 2009-11-27 638 Snyder USC Fragmented using Bioruptor, precipitated with protein A agarose beads, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bGata1V2 None Peaks GATA-1 is a transcriptional activator which probably serves as a general switch factor for erythroid development. It binds to DNA sites with the consensus sequence [AT]GATA[AG] within regulatory regions of globin genes and of other genes expressed in erythroid cells. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (GATA-1 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562bE2f6 K562b E2F6 Pk E2F6 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-06-15 2010-03-15 676 Snyder USC Fragmented using Bioruptor, precipitated with StaphA exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bE2f6 None Peaks This gene encodes a member of the E2F transcription factor protein family. E2F family members play a crucial role in control of the cell cycle and of the action of tumor suppressor proteins. They are also a target of the transforming proteins of small DNA tumor viruses.
Many E2F proteins contain several evolutionarily conserved domains: a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. The encoded protein of this gene is atypical because it lacks the transactivation and tumor suppressor protein association domains. It contains a modular suppression domain and is an inhibitor of E2F-dependent transcription. The protein is part of a multimeric protein complex that contains a histone methyltransferase and the transcription factors Mga and Max. Multiple transcript variants have been reported for this gene, but it has not been clearly demonstrated that they encode valid isoforms (RefSeq). leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (E2F6 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562bE2f4 K562b E2F4 Pk E2F4 K562 UCDavis ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-06-11 2010-03-11 671 Snyder USC Fragmented using Bioruptor, precipitated with StaphA (replicate 1) or protein A agarose beads (replicate 2), exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksK562bE2f4 None Peaks mapping at the C-terminus of E2F4 of human origin leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input library was prepared at UC Davis. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (E2F4 in K562b cells) Regulation wgEncodeYaleChIPseqPeaksK562CmycV2 K562 c-Myc Pk c-Myc K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-11-21 2009-08-21 621 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562CmycV2 None Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562CjunV2 K562 c-Jun Pk c-Jun K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-11-21 2009-08-21 620 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562CjunV2 None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562CfosV2 K562 c-Fos Pk c-Fos K562 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2008-11-21 2009-08-21 619 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562CfosV2 None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Fos in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Brg1Musigg K562 Brg1 Pk Brg1 K562 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 724 Snyder Stanford Data paired with IgG_Control. exp wgEncodeYaleChIPseqPeaksK562Brg1Musigg None Peaks Brg1 (SMARCA4) is an ATPase subunit of the SWI/SNF chromatin-remodeling complex. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Input signal from Normal Mouse IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Brg1/IgG-mus in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Brf2 K562 BRF2 Pk BRF2 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-16 2010-10-15 767 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Brf2 None Peaks Brf2 is a component of an alternate form of the RNA Polymerase III transcription factor TFIIIB. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BRF2 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Brf1 K562 BRF1 Pk BRF1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-06-22 2010-03-22 679 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqPeaksK562Brf1 None Peaks 'B-related factor 1', subunit of RNA polymerase III transcription initiation factor IIIB leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BRF1 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Bdp1 K562 BDP1 Pk BDP1 K562 std ChipSeq ENCODE July 2009 Freeze 2009-07-01 2009-06-22 2010-03-22 678 Snyder Harvard fragmented with both a probe sonicator and a Misonix sonicator, and precipitated with protein A sepharose beads exp PeakSeq1.0 (fdr 0.01) wgEncodeYaleChIPseqPeaksK562Bdp1 None Peaks 'B double-prime 1', subunit of RNA polymerase III transcription initiation factor IIIB leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (BDP1 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksK562Atf3 K562 ATF3 Pk ATF3 K562 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-09 2010-10-09 700 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksK562Atf3 None Peaks Activating transcription factor 3. A bZIP transcription factor and member of the Ca2+/cAMP response element-binding (CREB) protein family. ATF3 is found to act both as an activator and repressor of transcription. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Standard input signal for most experiments. Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (ATF3 in K562 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Zzz3 GM12878 ZZZ3 Pk ZZZ3 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-08 2010-10-08 698 Snyder Harvard exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878Zzz3 None Peaks ZZZ3 contains one ZZ-type zinc finger domain. ZZ type finger domains are named because of their ability to bind two zinc ions. These domains contain 4-6 Cys residues that participate in zinc binding (plus additional Ser/His residues), including a Cys-X2-Cys motif found in other zinc finger domains. These zinc fingers are thought to be involved in protein-protein interactions -they are most likely involved in ligand binding or molecular scaffolding. The structure of the ZZ domain shows that it belongs to the family of cross-brace zinc finger motifs that include the PHD, RING, and FYVE domains. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Struhl - Harvard University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (ZZZ3 in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Yy1 GM12878 YY1 Pk YY1 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2009-12-20 2010-09-19 695 Snyder USC Fragmented using Bioruptor, precipitated with Staph A, exp PeakSeq1.0 (fdr 0.001) wgEncodeYaleChIPseqPeaksGm12878Yy1 None Peaks YIN YANG 1 transcription factor belongs to the GLI-Kruppel class of zinc finger proteins. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (YY1 in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Tr4 GM12878 TR4 Pk TR4 GM12878 std ChipSeq ENCODE Jan 2010 Freeze 2010-01-07 2010-10-07 697 Snyder USC Fragmented using Bioruptor, precipitated with StaphA PeakSeq 1.0 (fdr 0.001), samples were 71% and 72% matching in the top 40% overlap analysis. exp wgEncodeYaleChIPseqPeaksGm12878Tr4 None Peaks (Also: NR2C2) Members of the nuclear hormone receptor family, such as NR2C2, act as ligand-activated transcription factors. The proteins have an N-terminal transactivation domain, a central DNA-binding domain with 2 zinc fingers, and a ligand-binding domain at the C terminus. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Farnham - University of Southern California Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (TR4 in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Rad21Iggrab GM12878 Rad21 Pk Rad21 GM12878 IgG-rab ChipSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-15 749 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878Rad21Iggrab None Peaks Synthetic peptide (Human) conjugated to KLH - which represents a portion of human Rad21 encoded within exon 14 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Input signal from Normal Rabbit IgG ChIP-seq. Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Rad21/IgG-rab in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Pol3 GM12878 Pol3 Pk Pol3 GM12878 std ChipSeq ENCODE July 2009 Freeze 2009-07-20 2009-05-05 2010-02-04 645 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878Pol3 None Peaks RNA Polymerase III B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol3 in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Pol2Musigg GM12878 Pol2 Pk Pol2 GM12878 IgG-mus ChipSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-12 708 Snyder Stanford Data paired with GM12878_IgG_Control. exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878Pol2Musigg None Peaks RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Input signal from Normal Mouse IgG ChIP-seq. 
Chromatin IP Sequencing Snyder Snyder - Stanford University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2/IgG-mus in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Pol2V3 GM12878 Pol2 Pk Pol2 GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2009-02-24 2009-11-24 626 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqPeaksGm12878Pol2V3 None Peaks RNA Polymerase II B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Pol2 in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878NfkbTnfa GM12878 NFKB Pk NFKB GM12878 std ChipSeq ENCODE Sep 2009 Freeze 2009-10-01 2010-06-30 690 Snyder Stanford exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878NfkbTnfa TNFa Peaks Epitope mapping at the C-terminus of NF-kappa-B p65 of human origin, recommended for detection of NFKB p65 B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Stanford University human recombinant TNF-alpha from eBioscience [product# 14-8329-62] (Snyder) Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (NFKB in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878MaxV3 GM12878 Max Pk Max GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2009-01-09 2009-10-09 623 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqPeaksGm12878MaxV3 None Peaks The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors. It is able to form homodimers and heterodimers with other family members, which include Mad, Mxi1 and Myc. 
Myc is an oncoprotein implicated in cell proliferation, differentiation and apoptosis. The homodimers and heterodimers compete for a common DNA target site (the E box) and rearrangement among these dimer forms provides a complex system of transcriptional regulation. Multiple alternatively spliced transcript variants have been described for this gene but the full-length nature for some of them is unknown (RefSeq). B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (Max in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878JundV2 GM12878 JunD Pk JunD GM12878 std ChipSeq ENCODE July 2009 Freeze 2009-06-30 2009-02-27 2009-11-27 639 Snyder Yale exp PeakSeq1.0 wgEncodeYaleChIPseqPeaksGm12878JundV2 None Peaks The protein encoded by this intronless gene is a member of the JUN family, and a functional component of the AP1 transcription factor complex. It has been proposed to protect cells from p53-dependent senescence and apoptosis. Alternate translation initiation site usage results in the production of different isoforms. (provided by RefSeq) B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (JunD in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Cmyc GM12878 c-Myc Pk c-Myc GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2011-01-28 783 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqPeaksGm12878Cmyc None Peaks transcription factor; c-Myc-encoded proteins function in cell proliferation,differentiation and neoplastic disease B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Myc in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878Cjun GM12878 c-Jun Pk c-Jun GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2011-01-28 782 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqPeaksGm12878Cjun None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Jun in GM12878 cells) Regulation wgEncodeYaleChIPseqPeaksGm12878CfosV3 GM12878 c-Fos Pk c-Fos GM12878 std ChipSeq ENCODE Mar 2012 Freeze 2010-04-28 2008-11-25 2009-08-25 622 Snyder Yale PeakSeq1.0 exp wgEncodeYaleChIPseqPeaksGm12878CfosV3 None Peaks Heterodimer of Fos and Jun constitute transcription factor AP1. Proto-oncogene c-Jun is a leucine-zipper. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Standard input signal for most experiments. 
Chromatin IP Sequencing Snyder Snyder - Yale University Regions of enriched signal in experiment ENCODE TFBS, Yale/UCD/Harvard ChIP-seq Peaks (c-Fos in GM12878 cells) Regulation chainNetPanTro2 Chimp Chain/Net Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Mar. 2006 (CGSC 2.1/panTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/human chain for every part of the human genome. 
It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Mar. 2006 (CGSC 2.1/panTro2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

         A      C      G      T
  A     90   -330   -236   -356
  C   -330    100   -318   -236
  G   -236   -318    100   -330
  T   -356   -236   -330     90

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

  -linearGap=medium
  tableSize 11
  smallSize 111
  position  1    2    3    11    111   2111   12111   32111   72111   152111   252111
  qGap      350  425  450  600   900   2900   22900   57900   117900  217900   317900
  tGap      350  425  450  600   900   2900   22900   57900   117900  217900   317900
  bothGap   750  825  850  1000  1300  3300   23300   58300   118300  218300   318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
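The chain scoring described in the Methods can be illustrated with a small Python sketch. It is not the axtChain implementation. It only shows how the substitution matrix and the "medium" linear gap table quoted above could be turned into numbers. Two points are assumptions rather than statements from this page: that gap costs between the listed positions are obtained by linear interpolation (and by extending the last segment beyond position 252111), and that a gap present in both species at once is priced from the bothGap row using the combined gap length. The function names block_score, interpolate and gap_cost are hypothetical.

```python
from bisect import bisect_right

# Substitution matrix quoted in the Methods section (rows and columns A, C, G, T).
MATRIX = {
    "A": {"A": 90, "C": -330, "G": -236, "T": -356},
    "C": {"A": -330, "C": 100, "G": -318, "T": -236},
    "G": {"A": -236, "C": -318, "G": 100, "T": -330},
    "T": {"A": -356, "C": -236, "G": -330, "T": 90},
}

# "medium" linear gap table quoted in the Methods section.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def block_score(query, target):
    """Sum matrix scores over an ungapped block of equal-length sequences."""
    if len(query) != len(target):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(MATRIX[q][t] for q, t in zip(query.upper(), target.upper()))


def interpolate(size, costs):
    """Gap cost for a gap of `size` bases.

    Assumption: linear interpolation between table positions; sizes past
    the last position extend the slope of the final segment.
    """
    if size <= 0:
        return 0.0
    # Index of the table segment containing `size`, clamped to the last segment.
    i = min(bisect_right(POSITIONS, size) - 1, len(POSITIONS) - 2)
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap, t_gap):
    """Cost of a gap spanning q_gap query bases and t_gap target bases.

    Assumption: a one-sided gap uses the qGap or tGap row; a double-sided
    gap uses the bothGap row with the combined length.
    """
    if t_gap <= 0:
        return interpolate(q_gap, Q_GAP)
    if q_gap <= 0:
        return interpolate(t_gap, T_GAP)
    return interpolate(q_gap + t_gap, BOTH_GAP)


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))  # four matching bases
    print(gap_cost(61, 0))              # one-sided gap between table positions
    print(gap_cost(5, 5))               # double-sided gap
```

Under these assumptions a chain's score would be the sum of its block scores minus the costs of the gaps between consecutive blocks, and chains whose total falls below the minimum score of 5000 would be discarded.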
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPanTro2Viewnet Net Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments Comparative Genomics netPanTro2 Chimp Net Chimp (Mar. 2006 (CGSC 2.1/panTro2)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Mar. 2006 (CGSC 2.1/panTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both chimp and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Mar. 2006 (CGSC 2.1/panTro2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    90  -330  -236  -356
C  -330   100  -318  -236
G  -236  -318   100  -330
T  -356  -236  -330    90
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
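To make the role of the substitution matrix concrete, the sketch below scores a gapless block by summing the matrix entry for each aligned base pair, using the chimp/human values quoted above, and applies the minimum score of 5000 to a total. It is illustrative only: the names `block_score` and `passes_minimum` are hypothetical, and the handling of bases other than A, C, G and T (such as N) is left out because the text does not describe it.

```python
# Chimp/human substitution scores copied from the matrix in the text.
MATRIX = {
    "A": {"A": 90, "C": -330, "G": -236, "T": -356},
    "C": {"A": -330, "C": 100, "G": -318, "T": -236},
    "G": {"A": -236, "C": -318, "G": 100, "T": -330},
    "T": {"A": -356, "C": -236, "G": -330, "T": 90},
}

MIN_CHAIN_SCORE = 5000  # threshold stated for this track


def block_score(query, target):
    """Sum the per-column scores of one gapless block of equal-length sequences."""
    if len(query) != len(target):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(MATRIX[q][t] for q, t in zip(query.upper(), target.upper()))


def passes_minimum(chain_score, minimum=MIN_CHAIN_SCORE):
    """True if a chain with this total score would be kept."""
    return chain_score >= minimum


# A perfect 4-base match scores 90 + 100 + 100 + 90.
perfect = block_score("ACGT", "ACGT")
# The same block with a T-to-A mismatch in the last column.
mismatch = block_score("ACGT", "ACGA")
```

A chain's total would combine such block scores with the gap costs between blocks; that combination is not shown here.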
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPanTro2Viewchain Chain Chimp (Mar. 2006 (CGSC 2.1/panTro2)), Chain and Net Alignments Comparative Genomics chainPanTro2 Chimp Chain Chimp (Mar. 2006 (CGSC 2.1/panTro2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chimp (Mar. 2006 (CGSC 2.1/panTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chimp and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chimp assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chimp/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chimp sequence used in this annotation is from the Mar. 2006 (CGSC 2.1/panTro2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chimp/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chimp chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    90  -330  -236  -356
C  -330   100  -318  -236
G  -236  -318   100  -330
T  -356  -236  -330    90
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
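The placement step just described can be illustrated with a greedy interval sketch. This is a simplification under stated assumptions, not the chainNet program: each chain is reduced to a single span on one human chromosome, internal gaps and the resulting level hierarchy are ignored, and the names `subtract`, `merge` and `place_chains` are hypothetical. It shows only the idea of taking chains in descending score order and trimming each to the sections not already covered.

```python
def subtract(interval, covered):
    """Parts of `interval` not overlapped by `covered` (sorted, disjoint spans)."""
    start, end = interval
    free = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue  # covered span lies entirely before the current position
        if c_start >= end:
            break  # covered span lies entirely after the interval
        if c_start > pos:
            free.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        free.append((pos, end))
    return free


def merge(covered, pieces):
    """Union of two span lists, returned sorted with touching spans joined."""
    merged = []
    for s, e in sorted(covered + pieces):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(chains):
    """chains: iterable of (name, score, start, end). Returns name -> kept spans."""
    covered = []
    placed = {}
    for name, score, start, end in sorted(chains, key=lambda c: -c[1]):
        pieces = subtract((start, end), covered)
        if pieces:
            placed[name] = pieces
            covered = merge(covered, pieces)
    return placed


example = place_chains([
    ("a", 9000, 0, 100),   # highest score: kept whole
    ("b", 7000, 50, 150),  # trimmed to the part not covered by "a"
    ("c", 5000, 20, 80),   # fully covered already: dropped in this sketch
])
```

In the real net, a lower-scoring chain can also fill a gap inside a higher-scoring chain and is then shown one level below it; modelling that would require per-chain block and gap coordinates, which this sketch leaves out.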
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetPonAbe2 Orangutan Chain/Net Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both orangutan and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the orangutan assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best orangutan/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The orangutan sequence used in this annotation is from the July 2007 (WUGSC 2.0.2/ponAbe2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the orangutan/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single orangutan chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
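The four item types listed under Display Conventions (Top, Syn, Inv, NonSyn) follow from comparing a chain with the chain whose gap it fills. The sketch below encodes only the rules as worded in this description. The `Fill` record and the `classify` function are hypothetical, and the actual netSyntenic program may apply further criteria that are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fill:
    """One placed chain: where it maps in the aligning organism (hypothetical record)."""
    q_chrom: str                      # chromosome in the aligning organism
    q_strand: str                     # "+" or "-"
    parent: Optional["Fill"] = None   # chain whose gap this one fills, if any


def classify(fill):
    """Return the item type using the rules stated in the track description."""
    if fill.parent is None:
        return "Top"      # level 1: the best, longest match
    if fill.q_chrom != fill.parent.q_chrom:
        return "NonSyn"   # maps to a different chromosome than the level above
    if fill.q_strand != fill.parent.q_strand:
        return "Inv"      # same chromosome, opposite orientation
    return "Syn"          # same chromosome, same orientation


top = Fill("chr4", "+")
labels = [
    classify(top),
    classify(Fill("chr4", "+", parent=top)),
    classify(Fill("chr4", "-", parent=top)),
    classify(Fill("chr7", "+", parent=top)),
]
```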
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPonAbe2Viewnet Net Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments Comparative Genomics netPonAbe2 Orangutan Net Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. 
Chain Track The chain track shows alignments of orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both orangutan and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the orangutan assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best orangutan/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The orangutan sequence used in this annotation is from the July 2007 (WUGSC 2.0.2/ponAbe2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.
Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the orangutan/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single orangutan chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPonAbe2Viewchain Chain Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)), Chain and Net Alignments Comparative Genomics chainPonAbe2 Orangutan Chain Orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of orangutan (July 2007 (WUGSC 2.0.2/ponAbe2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both orangutan and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the orangutan assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best orangutan/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The orangutan sequence used in this annotation is from the July 2007 (WUGSC 2.0.2/ponAbe2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the orangutan/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single orangutan chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRheMac2 Rhesus Chain/Net Rhesus (Jan. 
2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rhesus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rhesus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rhesus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rhesus sequence used in this annotation is from the Jan. 2006 (MGSC Merged 1.0/rheMac2) assembly. 
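As a small illustration of the feature names shown in "pack" and "full" modes, the sketch below builds a label from the chromosome, strand, and start position of the match, with the position expressed in thousands. The exact label layout ("chr4 + 12345k") and the function name are assumptions made for this example; only the three components come from the description above.

```python
def chain_label(chrom: str, strand: str, start: int) -> str:
    """Build a chain item label: chromosome, strand, location in thousands.

    The layout is an assumed one for illustration; the description only
    states which three pieces of information the name carries.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 0:
        raise ValueError("start must be non-negative")
    return f"{chrom} {strand} {start // 1000}k"


# Hypothetical match position, not taken from real data.
print(chain_label("chr4", "+", 12345678))
```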
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rhesus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single rhesus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
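To make the linear gap matrix in the Methods above more concrete, the sketch below looks up the cost of a gap from the tabulated positions. It assumes linear interpolation between neighbouring positions, linear extension of the last segment for longer gaps, and the summed length when both sequences are gapped. These rules and the names used here are assumptions for illustration, not a transcription of axtChain.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table in the Methods section.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def _interpolate(x, xs, ys):
    """Piecewise-linear lookup; extends the last segment past the table end."""
    if x <= xs[0]:
        return float(ys[0])
    i = bisect_right(xs, x) - 1           # largest i with xs[i] <= x
    i = min(i, len(xs) - 2)               # clamp so xs[i + 1] exists
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def gap_cost(dq: int, dt: int) -> float:
    """Cost of a gap of dq query bases and dt target bases (assumed rules)."""
    if dq == 0 and dt == 0:
        return 0.0
    if dt == 0:
        return _interpolate(dq, POSITION, Q_GAP)
    if dq == 0:
        return _interpolate(dt, POSITION, T_GAP)
    return _interpolate(dq + dt, POSITION, BOTH_GAP)


print(gap_cost(0, 11))    # tabulated tGap entry
print(gap_cost(61, 0))    # halfway between positions 11 and 111
print(gap_cost(5, 6))     # double-sided gap of total length 11
```

Under these assumptions a one-sided gap of 61 bases costs 750, midway between the entries for 11 and 111. The table's per-base cost stays well below the initial penalty until gaps become very long, which is how the scheme tolerates longer gaps than a conventional affine penalty.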
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRheMac2Viewnet Net Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments Comparative Genomics netRheMac2 Rhesus Net Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rhesus (Jan. 
2006 (MGSC Merged 1.0/rheMac2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rhesus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rhesus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rhesus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rhesus sequence used in this annotation is from the Jan. 2006 (MGSC Merged 1.0/rheMac2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.
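Outside the browser, a comparable per-chromosome filter can be applied to a chain file directly. The sketch below assumes the standard UCSC chain header layout (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id) and keeps only chains whose query chromosome matches. The function name and the sample lines are hypothetical.

```python
from typing import Iterable, List


def filter_chains_by_query_chrom(lines: Iterable[str], chrom: str) -> List[str]:
    """Keep chain records whose query (aligning organism) chromosome is chrom.

    Each record starts with a 'chain ...' header line; the block lines and
    the blank separator that follow belong to that record.
    """
    kept = []
    keep_current = False
    for line in lines:
        if line.startswith("chain "):
            fields = line.split()
            keep_current = fields[7] == chrom   # qName is the 8th field
        if keep_current:
            kept.append(line)
    return kept


# Invented two-chain example, not real alignment data.
sample = [
    "chain 5000 chr1 1000 + 0 100 chr4 900 + 10 110 1",
    "100",
    "",
    "chain 4000 chr1 1000 + 200 260 chr7 800 - 5 65 2",
    "60",
    "",
]
print(filter_chains_by_query_chrom(sample, "chr4"))
```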
Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rhesus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rhesus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
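As a worked illustration of the substitution matrix and the minimum-score cutoff, the sketch below scores a gapless block by summing the matrix entry for each aligned base pair and then applies the 5000 threshold to a chain score. The function names and the short sequences are hypothetical, and the sketch covers only ungapped blocks; gap penalties are handled separately by the linear gap matrix.

```python
BASES = "ACGT"

# Substitution matrix from the Methods section (rows and columns in A, C, G, T order).
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}
MIN_SCORE = 5000  # chains scoring below this were discarded


def block_score(target: str, query: str) -> int:
    """Sum of substitution scores over an ungapped block of equal length."""
    if len(target) != len(query):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def passes_threshold(chain_score: int) -> bool:
    """True if a chain with this total score would be kept."""
    return chain_score >= MIN_SCORE


print(block_score("ACGT", "ACGT"))   # four matches
print(block_score("AAAA", "AGAA"))   # one A/G transition among matches
print(passes_threshold(4999), passes_threshold(5000))
```

The A/G transition costs 31 where the A/T transversion costs 123, so the matrix penalises transitions less heavily than transversions.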
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRheMac2Viewchain Chain Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)), Chain and Net Alignments Comparative Genomics chainRheMac2 Rhesus Chain Rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rhesus (Jan. 2006 (MGSC Merged 1.0/rheMac2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rhesus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rhesus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rhesus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rhesus sequence used in this annotation is from the Jan. 2006 (MGSC Merged 1.0/rheMac2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rhesus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rhesus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetCalJac1 Marmoset Chain/Net Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of marmoset (June 2007 (WUGSC 2.0.2/calJac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both marmoset and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the marmoset assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best marmoset/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The marmoset sequence used in this annotation is from the June 2007 (WUGSC 2.0.2/calJac1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the marmoset/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single marmoset chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    90  -330  -236  -356
C  -330   100  -318  -236
G  -236  -318   100  -330
T  -356  -236  -330    90
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
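The placement step described for chainNet can be sketched in one dimension: chains are taken in order of decreasing score, and each one is trimmed to the parts of the human sequence not already claimed by a higher-scoring chain. This is a deliberately simplified illustration and not the chainNet implementation. It treats every chain as a single half-open interval, so gaps inside chains and the resulting level hierarchy are not modelled, and the function name and sample chains are hypothetical.

```python
from typing import List, Tuple

Chain = Tuple[int, int, int]  # (score, start, end), half-open on the target


def place_chains(chains: List[Chain]) -> List[Chain]:
    """Greedy placement: best score first, trimmed to uncovered sequence."""
    covered: List[Tuple[int, int]] = []   # intervals already claimed
    placed: List[Chain] = []
    for score, start, end in sorted(chains, key=lambda c: -c[0]):
        pieces = [(start, end)]
        for c_start, c_end in covered:
            remaining = []
            for s, e in pieces:
                if c_end <= s or e <= c_start:     # no overlap, keep whole piece
                    remaining.append((s, e))
                    continue
                if s < c_start:                    # part left of the claimed span
                    remaining.append((s, c_start))
                if c_end < e:                      # part right of the claimed span
                    remaining.append((c_end, e))
            pieces = remaining
        for s, e in pieces:
            placed.append((score, s, e))
            covered.append((s, e))
    return placed


# Invented chains on a single target chromosome.
chains = [(9000, 0, 100), (7000, 50, 150), (3000, 90, 120), (5000, 200, 260)]
print(place_chains(chains))
```

In this example the 7000-point chain is trimmed to 100-150, and the 3000-point chain is dropped entirely because higher-scoring chains already cover all of it.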
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCalJac1Viewnet Net Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments Comparative Genomics netCalJac1 Marmoset Net Marmoset (June 2007 (WUGSC 2.0.2/calJac1)) Alignment net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. 
Chain Track The chain track shows alignments of marmoset (June 2007 (WUGSC 2.0.2/calJac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both marmoset and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the marmoset assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best marmoset/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The marmoset sequence used in this annotation is from the June 2007 (WUGSC 2.0.2/calJac1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.
Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the marmoset/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single marmoset chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     90  -330  -236  -356
C   -330   100  -318  -236
G   -236  -318   100  -330
T   -356  -236  -330    90
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
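As a rough illustration of how the substitution matrix above is applied, the following Python sketch sums the per-column scores of one ungapped block. The function name block_score and the dictionary layout are assumptions made for this example; they are not part of lastz or axtChain, and gap costs and N bases are not handled.

```python
# Substitution scores quoted in the Methods text (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [90, -330, -236, -356],
    [-330, 100, -318, -236],
    [-236, -318, 100, -330],
    [-356, -236, -330, 90],
]

# Look-up table keyed by (target base, query base).
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(target: str, query: str) -> int:
    """Sum the matrix scores over an ungapped block (hypothetical helper)."""
    if len(target) != len(query):
        raise ValueError("an ungapped block must have equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))
```

Under this matrix a perfect 4-base match ACGT/ACGT scores 380, so a chain needs many matching columns, net of mismatches and gap costs, to clear the minimum score of 3000.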
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position     1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap       350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap       350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap    750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCalJac1Viewchain Chain Marmoset (June 2007 (WUGSC 2.0.2/calJac1)), Chain and Net Alignments Comparative Genomics chainCalJac1 Marmoset Chain Marmoset (June 2007 (WUGSC 2.0.2/calJac1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of marmoset (June 2007 (WUGSC 2.0.2/calJac1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both marmoset and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the marmoset assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best marmoset/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The marmoset sequence used in this annotation is from the June 2007 (WUGSC 2.0.2/calJac1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the marmoset/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single marmoset chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     90  -330  -236  -356
C   -330   100  -318  -236
G   -236  -318   100  -330
T   -356  -236  -330    90
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position     1    2    3    11   111  2111  12111  32111   72111  152111  252111
qGap       350  425  450   600   900  2900  22900  57900  117900  217900  317900
tGap       350  425  450   600   900  2900  22900  57900  117900  217900  317900
bothGap    750  825  850  1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
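The greedy placement described for chainNet can be sketched in a few lines of Python. This is a deliberately simplified model, not the chainNet implementation: each chain is reduced to a single (score, start, end) interval on the human sequence, gaps inside a chain are ignored (so no level hierarchy is produced), and the names subtract and place_chains are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]      # half-open (start, end) on the target sequence
Chain = Tuple[int, int, int]    # (score, start, end); simplified, no internal gaps


def subtract(interval: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of `interval` not overlapped by `covered`.

    `covered` must be sorted and non-overlapping.
    """
    start, end = interval
    pieces = []
    for c_start, c_end in covered:
        if c_end <= start or c_start >= end:
            continue  # no overlap with this covered stretch
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains: List[Chain]) -> List[Tuple[int, List[Interval]]]:
    """Place chains best-score first, trimming each to still-uncovered sequence."""
    covered: List[Interval] = []
    placed = []
    for score, start, end in sorted(chains, key=lambda c: c[0], reverse=True):
        pieces = subtract((start, end), covered)
        if pieces:
            placed.append((score, pieces))
            covered = sorted(covered + pieces)
    return placed
```

For example, given chains (5000, 0, 100), (4000, 50, 150) and (3500, 20, 80), the second chain is trimmed to 100-150 and the third is dropped because the best chain already covers it.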
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 wgEncodeUcsdNgTaf1Signal LI TAF1 Signal Ludwig Institute NimbleGen ChIP-chip Signal: TAF1 antibody, IMR90 cells Regulation Description This track shows TAF1 binding in fibroblastoid (IMR90) cells as assayed by ChIP-chip using a NimbleGen microarray. The companion LI TAF1 Sites track shows binding sites derived from these experiments. TAF1, a protein found at the start of transcribed genes, is a general transcription factor that is a key part of the pre-initiation complex found on the promoter. It is more fully known as TBP-associated factor 1 of the TFIID complex or by its molecular weight as TAF250. To survey the entire human genome in an unbiased fashion, a total of 38 high-density oligonucleotide arrays (NimbleGen platform) were fabricated, representing approximately 1.45 billion base pairs of non-repetitive DNA with 50-mer oligonucleotides positioned at every 100 base pairs throughout the human genome (UCSC hg16). Using this array, genome-wide location analysis of TAF1 was conducted employing ChIP-chip using chromatin extracted from primary fibroblast IMR90 cells. Methods Chromatin from IMR90 cell lines was cross-linked, precipitated with TAF1 antibody (sc-735, Santa Cruz), sheared, amplified and hybridized to 38 high-density oligonucleotide arrays (NimbleGen). These arrays contain a total of 14,535,659 50-mer oligonucleotides positioned at every 100 base pairs throughout the human genome (UCSC hg16). The raw data are available from GEO GSE2672. Verification The peaks from genome scan experiments were verified using condensed arrays, as described in the Methods section. The verification data may be viewed in the LI TAF1 Valid track. References Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436:876-80.
wgEncodeUcsdNgTaf1Super LI/UCSD TAF1 Ludwig Institute/UC San Diego TAF1 Binding in Fibroblasts Regulation Overview This super-track combines related tracks of genome-wide ChIP-chip data generated by the Ludwig Institute/UC San Diego ENCODE group. These tracks show TAF1 binding in fibroblastoid (IMR90) cells as assayed by ChIP-chip using a NimbleGen microarray. TAF1, a protein found at the start of transcribed genes, is a general transcription factor that is a key part of the pre-initiation complex found on the promoter. It is more fully known as TBP-associated factor 1 of the TFIID complex or by its molecular weight as TAF250. The raw data from these experiments are available from GEO GSE2672. Credits The data for these tracks were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. References Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436:876-80. wgEncodeUcsdNgTaf1Sites LI TAF1 Sites Ludwig Institute NimbleGen ChIP-chip Sites: TAF1 antibody, IMR90 cells Regulation Description This track shows likely TAF1 binding sites in fibroblastoid (IMR90) cells as assayed by ChIP-chip using a NimbleGen microarray. The two subtracks show known TAF1 binding sites and additional novel sites where, based on the data in the LI TAF1Signal companion track, TAF1 is most likely to bind. TAF1, a protein found at the start of transcribed genes, is a general transcription factor that is a key part of the pre-initiation complex found on the promoter. It is more fully known as TBP-associated factor 1 of the TFIID complex or by its molecular weight as TAF250. 
To survey the entire human genome in an unbiased fashion, a total of 38 high-density oligonucleotide arrays (NimbleGen platform) were fabricated, representing approximately 1.45 billion base pairs of non-repetitive DNA with 50-mer oligonucleotides positioned at every 100 base pairs throughout the human genome (UCSC hg16). Using this array, genome-wide location analysis of TAF1 was conducted employing ChIP-chip using chromatin extracted from primary fibroblast IMR90 cells. Methods Chromatin from IMR90 cell lines was cross-linked, precipitated with TAF1 antibody (sc-735, Santa Cruz), sheared, amplified and hybridized to 38 high-density oligonucleotide arrays (NimbleGen). These arrays contain a total of 14,535,659 50-mer oligonucleotides positioned at every 100 base pairs throughout the human genome (UCSC hg16). Using this set of arrays, a total of 9,966 clusters of TFIID binding sites were identified. To verify the binding of TFIID to these sequences, a condensed array was designed containing a total of 379,521 oligonucleotides to represent the 9,966 putative TFIID binding sequences plus 29 control genomic loci at 100 bp resolution. Using these condensed arrays, two independent chromatin immunoprecipitation (ChIP) experiments were performed with the antibodies against TAF1, RNA polymerase II, acetylated histone 3 and dimethylated K4 histone 3. A total of 8,597 TFIID binding regions, ranging in size from 400 bp to 9.8 Kbp, were confirmed by the TAF1 replicate experiments. The verification data can be viewed in the LI TAF1 Valid track. To further define the sites of TFIID binding within the identified regions, a model-based peak-finding algorithm was developed that estimates the most likely TFIID binding sites based on the hybridization intensity of probes within each fragment. The signals from a set of consecutive significantly-enriched probes were collectively used to locate the most likely TFIID binding site to the probe with the peak signal.
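The idea of reporting the peak probe within each run of consecutive enriched probes can be illustrated with the Python sketch below. It is only a simplified stand-in and not the model-based algorithm of Kim et al.: the fixed enrichment threshold, the function name peak_probes and the parallel-list input layout are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def peak_probes(
    positions: Sequence[int],
    signals: Sequence[float],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (position, signal) of the strongest probe in each enriched run.

    Probes are assumed to be sorted by genomic position; a run is a set of
    consecutive probes whose signal is at or above `threshold` (hypothetical
    cutoff standing in for a significance test).
    """
    peaks = []
    best = None  # strongest probe seen so far in the current run
    for pos, sig in zip(positions, signals):
        if sig >= threshold:
            if best is None or sig > best[1]:
                best = (pos, sig)
        elif best is not None:
            peaks.append(best)  # the run ended; report its peak
            best = None
    if best is not None:
        peaks.append(best)  # a run that reaches the last probe
    return peaks
```

With probes every 100 bp, as on these arrays, each reported position is resolved to the nearest 100 bp.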
The algorithm predicted a total of 12,150 TFIID binding sites within the 8,597 confirmed TFIID binding fragments. The locations of the 12,150 peaks were compared to the annotated 5' end of transcripts from RefSeq, GenBank and DBTSS, using a cutoff of 2.5 Kbp. It was found that 10,504 peaks corresponding to 9,281 non-redundant transcripts were within 2.5 Kbp of the annotated 5' end. 47 of the remaining peaks were within 2.5 Kbp of Ensembl genes, resulting in a total of 9328 known non-redundant promoters. The remaining peaks were further filtered using Acembly annotation and H3ac, RNAP and MeH3K4 ChIP-chip data. The total number of novel peaks was 1,239. The raw data are available from GEO GSE2672. Verification The peaks from genome scan experiments were verified using condensed arrays, as described in the Methods section. The verification data may be viewed in the LI TAF1 Valid track. References Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436:876-80. wgEncodeUcsdNgTaf1NovelSites TAF1 Novel Sites Ludwig Institute TAF1 Sites Matching No Known Transcripts Regulation wgEncodeUcsdNgTaf1KnownSites TAF1 Known Sites Ludwig Institute TAF1 Sites Matching to Known Transcripts Regulation wgEncodeUcsdNgTaf1Valid LI TAF1 Valid Ludwig Institute ChIP-chip Validation: TAF1 in IMR90 cells Regulation Description This track displays validation data from ChIP-chip experiments on four factors in IMR90 cells using a condensed array covering putative TAF1 binding sites. This track may be used to validate the whole genome scan shown in the LI TAF1 Signal and LI TAF1 Sites tracks. All four factors — Pol2, H3ac, H3K4me2, and TAF1 itself — are associated with the start of transcribed genes. Thus, there should be a very strong correlation between the signals shown in this track and the other tracks. 
TAF1 is a component of TFIID, which is itself a component of the pre-initiation complex that assembles on promoter regions. Pol2, more fully known as RNA Polymerase II, is the enzyme responsible for transcription of mRNA. The specific antibody against Pol2, 8WG16 from the Abcam catalog, binds specifically to the non-phosphorylated form of Pol2 which is associated with the pre-initiation complex. H3ac and H3K4me2 are forms of histone H3 that are associated with transcriptionally-active chromatin. Methods For the whole genome scan, chromatin from IMR90 cell lines was cross-linked, precipitated with TAF1 antibody (sc-735, Santa Cruz), sheared, amplified and hybridized to 38 high-density oligonucleotide arrays (NimbleGen). These arrays contain a total of 14,535,659 50-mer oligonucleotides, positioned at every 100 base pairs throughout the human genome (UCSC hg16). Using this set of arrays, a total of 9,966 clusters of TAF1 binding sites were identified. To verify the binding of TAF1 to these sequences, a condensed array was designed containing a total of 379,521 oligonucleotides to represent the 9,966 putative TAF1 binding sequences plus 29 control genomic loci at 100 bp resolution. Using these condensed arrays, two independent chromatin immunoprecipitation (ChIP) experiments were performed with the antibodies against TAF1, Pol2, acetylated histone 3 and dimethylated K4 histone 3. A total of 8,597 TAF1 binding regions, ranging in size from 400 bp to 9.8 Kbp, were confirmed by the TAF1 replicate experiments. The raw data are available from GEO GSE2672. Verification The peaks from genome scan experiments were verified using condensed arrays, as described in the Methods section. References Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436:876-80.
wgEncodeUcsdNgTaf1ValidTaf LI Valid TAF1 Ludwig Institute ChIP-chip Validation: TAF1 antibody, IMR90 cells Regulation wgEncodeUcsdNgTaf1ValidRnap LI Valid Pol2 Ludwig Institute ChIP-chip Validation: Pol2 8WG16 antibody, IMR90 cells Regulation wgEncodeUcsdNgTaf1ValidH3K4me LI Valid H3K4m2 Ludwig Institute ChIP-chip Validation: H3K4me2 antibody, IMR90 cells Regulation wgEncodeUcsdNgTaf1ValidH3ac LI Valid H3ac Ludwig Institute ChIP-chip Validation: H3ac antibody, IMR90 cells Regulation uwNucOccA375 Nucl Occ: A375 UW Predicted Nucleosome Occupancy - A375 Regulation Description Inside the nucleus, DNA is wrapped into a complex molecular structure called chromatin, whose fundamental unit is approximately 150 bp of DNA organized around the eight-histone protein complex known as the nucleosome. This track contains predicted nucleosome occupancy scores produced by a model that was trained using data from the A375 cell line from Ozsolak et al. (2007). This cell line was prepared with weak MNase digestion. The A375 model excels at recognizing regions of strong protection from MNase cleavage; i.e., positions that are frequently occupied by a nucleosome. Display Conventions and Configuration The output of the SVM is a unitless discriminant score. In the browser, the score of a 50-mer is assigned to its 26th base. Canonically, a score of 0 indicates an uncertain assignment; a score of 1.0 corresponds to a confident prediction for being in the positive class (i.e., a position of frequent nucleosome occupancy), and a score of -1.0 corresponds to a confident prediction for being in the negative class. Methods For a given microarray experiment, we identify the 1000 50 bp probes with the highest log intensity ratios. These comprise our positive training samples. In a similar fashion, we generate negative training samples with the lowest log intensity ratios. 
Each 50-mer in the training set is converted into a 2772-element vector of k-mer frequencies for k=1 up to 6 (collapsing reverse complements). A linear SVM is then trained to discriminate between the two classes. The SVM regularization parameter is selected by evaluating the entire regularization path on a held-out portion of the training data set. After training, each 50-mer in the human genome is converted to the 2772-element representation and scored using the trained SVM. Detailed methods are given in Gupta et al. (2008), and supplementary data is available here. Credits This track was produced at the University of Washington by Shobhit Gupta and William Stafford Noble (noble@gs.washington.edu). References Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007 Feb;25(2):244-8. Dennis JH, Fan HY, Reynolds SM, Yuan G, Meldrim JC, Richter DJ, Peterson DG, Rando OJ, Noble WS, Kingston RE. Independent and complementary methods for large-scale structural analysis of mammalian chromatin. Genome Res. 2007 Jun;17(6):928-39. Gupta S, Dennis J, Thurman RE, Kingston R, Stamatoyannopoulos JA, Noble WS. Predicting human nucleosome occupancy from primary sequence. PLoS Comput Biol. 2008 Aug 22;4(8):e1000134. uwNucOcc Nucleosome Occupancy UW Predicted Nucleosome Occupancy Regulation Description Inside the nucleus, DNA is wrapped into a complex molecular structure called chromatin, whose fundamental unit is approximately 150 bp of DNA organized around the eight-histone protein complex known as the nucleosome. These tracks contain predicted nucleosome occupancy scores produced by three different computational models. Each model is a support vector machine classifier trained using microarray data from an MNase cleavage assay. Each SVM is trained to discriminate between 50 bp DNA sequences that show the strongest and weakest signals in the MNase assay.
Although each model can predict regions of high and low nucleosome occupancy, one model (MEC) excels at recognizing regions of low nucleosome occupancy, whereas the other two (A375 and Dennis) are better at recognizing regions of high nucleosome occupancy. The three models are as follows: A375 - This model was trained using data from the A375 cell line from Ozsolak et al. (2007). This cell line was prepared with weak MNase digestion. The A375 model excels at recognizing regions of strong protection from MNase cleavage; i.e., positions that are frequently occupied by a nucleosome. Dennis - This model was trained using MDA-kb2 cell line data from Dennis et al. (2007). This cell line was prepared with weak MNase digestion. Hence, like the A375 model, the Dennis model excels at recognizing regions that are frequently occupied by a nucleosome. MEC - This model was trained using data from the MEC cell line from Ozsolak et al. (2007). This cell line was prepared with strong MNase digestion. This model excels at recognizing regions of high accessibility to MNase cleavage; i.e., positions that are frequently nucleosome-free. Display Conventions and Configuration The output of the SVM is a unitless discriminant score. In the browser, the score of a 50-mer is assigned to its 26th base. Canonically, a score of 0 indicates an uncertain assignment; a score of 1.0 corresponds to a confident prediction for being in the positive class (i.e., a position of frequent nucleosome occupancy), and a score of -1.0 corresponds to a confident prediction for being in the negative class. Methods For a given microarray experiment, we identify the 1000 50 bp probes with the highest log intensity ratios. These comprise our positive training samples. In a similar fashion, we generate negative training samples with the lowest log intensity ratios. Each 50-mer in the training set is converted into a 2772-element vector of k-mer frequencies for k=1 up to 6 (collapsing reverse complements).
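The 2772-element size follows from counting k-mers up to reverse complement: 2 + 10 + 32 + 136 + 512 + 2080 = 2772 for k = 1 to 6. The Python sketch below builds such a feature vector. It is an illustration under stated assumptions, not the code used by Gupta et al.: the function names are hypothetical, each k-mer is represented by the lexicographically smaller of itself and its reverse complement, and counts are normalized separately for each k, which is one plausible reading of "frequencies".

```python
from itertools import product
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Collapse a k-mer with its reverse complement (keep the smaller string)."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def feature_index(max_k: int = 6) -> Dict[str, int]:
    """Map every canonical k-mer for k = 1..max_k to a vector position."""
    index: Dict[str, int] = {}
    for k in range(1, max_k + 1):
        for letters in product("ACGT", repeat=k):
            index.setdefault(canonical("".join(letters)), len(index))
    return index


def kmer_frequencies(seq: str, index: Dict[str, int], max_k: int = 6) -> List[float]:
    """Return per-k normalized canonical k-mer frequencies for one sequence."""
    seq = seq.upper()
    vector = [0.0] * len(index)
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        for i in range(n_windows):
            key = canonical(seq[i:i + k])
            if key in index:  # windows containing N or other symbols are skipped
                vector[index[key]] += 1.0 / n_windows
    return vector
```

A 50-mer passed through kmer_frequencies yields a vector that could be fed to any linear classifier; the regularization-path selection described in the text is not reproduced here.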
A linear SVM is then trained to discriminate between the two classes. The SVM regularization parameter is selected by evaluating the entire regularization path on a held-out portion of the training data set. After training, each 50-mer in the human genome is converted to the 2772-element representation and scored using the trained SVM. Detailed methods are given in Gupta et al. (2008), and supplementary data is available here. Credits This track was produced at the University of Washington by Shobhit Gupta and William Stafford Noble (noble@gs.washington.edu). References Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007 Feb;25(2):244-8. Dennis JH, Fan HY, Reynolds SM, Yuan G, Meldrim JC, Richter DJ, Peterson DG, Rando OJ, Noble WS, Kingston RE. Independent and complementary methods for large-scale structural analysis of mammalian chromatin. Genome Res. 2007 Jun;17(6):928-39. Gupta S, Dennis J, Thurman RE, Kingston R, Stamatoyannopoulos JA, Noble WS. Predicting human nucleosome occupancy from primary sequence. PLoS Comput Biol. 2008 Aug 22;4(8):e1000134. uwNucOccDennis Nucl Occ: Dennis UW Predicted Nucleosome Occupancy - Dennis Regulation Description Inside the nucleus, DNA is wrapped into a complex molecular structure called chromatin, whose fundamental unit is approximately 150 bp of DNA organized around the eight-histone protein complex known as the nucleosome. This track contains predicted nucleosome occupancy scores produced by a model that was trained using MDA-kb2 cell line data from Dennis et al. (2007). This cell line was prepared with weak MNase digestion. Hence, like the A375 model, the Dennis model excels at recognizing regions that are frequently occupied by a nucleosome. Display Conventions and Configuration The output of the SVM is a unitless discriminant score. In the browser, the score of a 50-mer is assigned to its 26th base.
Canonically, a score of 0 indicates an uncertain assignment; a score of 1.0 corresponds to a confident prediction for being in the positive class (i.e., a position of frequent nucleosome occupancy), and a score of -1.0 corresponds to a confident prediction for being in the negative class. Methods For a given microarray experiment, we identify the 1000 50 bp probes with the highest log intensity ratios. These comprise our positive training samples. In a similar fashion, we generate negative training samples with the lowest log intensity ratios. Each 50-mer in the training set is converted into a 2772-element vector of k-mer frequencies for k=1 up to 6 (collapsing reverse complements). A linear SVM is then trained to discriminate between the two classes. The SVM regularization parameter is selected by evaluating the entire regularization path on a held-out portion of the training data set. After training, each 50-mer in the human genome is converted to the 2772-element representation and scored using the trained SVM. Detailed methods are given in Gupta et al. (2008), and supplementary data is available here. Credits This track was produced at the University of Washington by Shobhit Gupta and William Stafford Noble (noble@gs.washington.edu). References Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007 Feb;25(2):244-8. Dennis JH, Fan HY, Reynolds SM, Yuan G, Meldrim JC, Richter DJ, Peterson DG, Rando OJ, Noble WS, Kingston RE. Independent and complementary methods for large-scale structural analysis of mammalian chromatin. Genome Res. 2007 Jun;17(6):928-39. Gupta S, Dennis J, Thurman RE, Kingston R, Stamatoyannopoulos JA, Noble WS. Predicting human nucleosome occupancy from primary sequence. PLoS Comput Biol. 2008 Aug 22;4(8):e1000134. 
uwNucOccMec Nucl Occ: MEC UW Predicted Nucleosome Occupancy - MEC Regulation Description Inside the nucleus, DNA is wrapped into a complex molecular structure called chromatin, whose fundamental unit is approximately 150 bp of DNA organized around the eight-histone protein complex known as the nucleosome. This track contains predicted nucleosome occupancy scores produced by a model that was trained using data from the MEC cell line from Ozsolak et al. (2007). This cell line was prepared with strong MNase digestion. This model excels at recognizing regions of high accessibility to MNase cleavage; i.e., positions that are frequently nucleosome-free. Display Conventions and Configuration The output of the SVM is a unitless discriminant score. In the browser, the score of a 50-mer is assigned to its 26th base. Canonically, a score of 0 indicates an uncertain assignment; a score of 1.0 corresponds to a confident prediction for being in the positive class (i.e., a position of frequent nucleosome occupancy), and a score of -1.0 corresponds to a confident prediction for being in the negative class. Methods For a given microarray experiment, we identify the 1000 50 bp probes with the highest log intensity ratios. These comprise our positive training samples. In a similar fashion, we generate negative training samples with the lowest log intensity ratios. Each 50-mer in the training set is converted into a 2772-element vector of k-mer frequencies for k=1 up to 6 (collapsing reverse complements). A linear SVM is then trained to discriminate between the two classes. The SVM regularization parameter is selected by evaluating the entire regularization path on a held-out portion of the training data set. After training, each 50-mer in the human genome is converted to the 2772-element representation and scored using the trained SVM. Detailed methods are given in Gupta et al. (2008), and supplementary data is available here. 
Credits This track was produced at the University of Washington by Shobhit Gupta and William Stafford Noble (noble@gs.washington.edu). References Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007 Feb;25(2):244-8. Dennis JH, Fan HY, Reynolds SM, Yuan G, Meldrim JC, Richter DJ, Peterson DG, Rando OJ, Noble WS, Kingston RE. Independent and complementary methods for large-scale structural analysis of mammalian chromatin. Genome Res. 2007 Jun;17(6):928-39. Gupta S, Dennis J, Thurman RE, Kingston R, Stamatoyannopoulos JA, Noble WS. Predicting human nucleosome occupancy from primary sequence. PLoS Comput Biol. 2008 Aug 22;4(8):e1000134. chainNetCavPor3 Guinea pig Chain/Net Guinea pig (Feb. 2008 (Broad/cavPor3)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of guinea pig (Feb. 2008 (Broad/cavPor3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both guinea pig and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the guinea pig assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best guinea pig/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The guinea pig sequence used in this annotation is from the Feb. 2008 (Broad/cavPor3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the guinea pig/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single guinea pig chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium

tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCavPor3Viewnet Net Guinea pig (Feb. 2008 (Broad/cavPor3)), Chain and Net Alignments Comparative Genomics netCavPor3 Guinea pig Net Guinea pig (Feb. 
2008 (Broad/cavPor3)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of guinea pig (Feb. 2008 (Broad/cavPor3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both guinea pig and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the guinea pig assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best guinea pig/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The guinea pig sequence used in this annotation is from the Feb. 2008 (Broad/cavPor3) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the guinea pig/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single guinea pig chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium

tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
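The -linearGap=medium table in the Methods above lists gap penalties only at selected gap sizes. A minimal sketch of how such a table might be turned into a penalty for an arbitrary gap follows. Three things here are assumptions and are not stated in the text: that costs between tabulated positions are linearly interpolated, that costs past the last position continue with the slope of the final segment, and that a double-sided gap is looked up in the bothGap row using the sum of the two gap lengths. The function names are hypothetical.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate(size, costs):
    """Piecewise-linear lookup of a gap cost (assumed behavior)."""
    if size <= POSITIONS[0]:
        return float(costs[0])
    if size >= POSITIONS[-1]:
        # Assumption: extend the table with the slope of its last segment.
        slope = (costs[-1] - costs[-2]) / (POSITIONS[-1] - POSITIONS[-2])
        return costs[-1] + slope * (size - POSITIONS[-1])
    i = bisect_right(POSITIONS, size) - 1
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap, t_gap):
    """Penalty for a gap of q_gap bases in the query and t_gap in the target."""
    if q_gap < 0 or t_gap < 0:
        raise ValueError("gap lengths must be non-negative")
    if q_gap == 0 and t_gap == 0:
        return 0.0
    if t_gap == 0:
        return interpolate(q_gap, Q_GAP)
    if q_gap == 0:
        return interpolate(t_gap, T_GAP)
    # Double-sided gap: assumed to use the combined length.
    return interpolate(q_gap + t_gap, BOTH_GAP)
```

Under these assumptions a 61-base query-only gap falls halfway between the tabulated sizes 11 and 111 and costs 750, the midpoint of 600 and 900.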
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCavPor3Viewchain Chain Guinea pig (Feb. 2008 (Broad/cavPor3)), Chain and Net Alignments Comparative Genomics chainCavPor3 Guinea pig Chain Guinea pig (Feb. 2008 (Broad/cavPor3)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of guinea pig (Feb. 
2008 (Broad/cavPor3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both guinea pig and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the guinea pig assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best guinea pig/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The guinea pig sequence used in this annotation is from the Feb. 2008 (Broad/cavPor3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.
Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the guinea pig/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single guinea pig chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
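As an illustration of how the substitution matrix above is applied, the sketch below scores one gapless block by summing the matrix entry for each aligned base pair, then combines block scores and gap penalties into a chain score that is compared with the minimum score. The helper names are hypothetical. Treating a chain score as block scores minus gap penalties is a simplifying assumption about axtChain, not a description of its implementation, and bases other than A, C, G and T are not handled.

```python
BASES = "ACGT"

# Rows and columns in the order A, C, G, T, copied from the matrix above.
MATRIX_ROWS = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]

SCORE = {
    (a, b): MATRIX_ROWS[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def score_block(target, query):
    """Score a gapless block: sum of matrix entries over aligned columns.

    Raises KeyError for symbols outside A, C, G, T (e.g. N).
    """
    if len(target) != len(query):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def chain_score(block_scores, gap_penalties):
    """Assumed chain score: block scores minus the penalties of the gaps between them."""
    return sum(block_scores) - sum(gap_penalties)


def keep_chain(score, min_score=3000):
    """Apply the minimum-score filter (3000 for this track)."""
    return score >= min_score
```

For example, the four-base block ACGT aligned to itself scores 91 + 100 + 100 + 91 = 382, so a chain needs many such matching columns to clear the 3000 threshold.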
The linear gap matrix used with axtChain:

-linearGap=medium

tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn4 Rat Chain/Net Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Nov. 2004 (Baylor 3.4/rn4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Nov. 2004 (Baylor 3.4/rn4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium

tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn4Viewnet Net Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments Comparative Genomics netRn4 Rat Net Rat (Nov. 
2004 (Baylor 3.4/rn4)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Nov. 2004 (Baylor 3.4/rn4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both rat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Nov. 2004 (Baylor 3.4/rn4) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
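As an illustration of how the linear gap table in the Methods might be read, the sketch below looks up a gap cost from the tabulated positions. It assumes straight-line interpolation between neighboring positions and extrapolation from the last two positions for larger gaps, and it assumes qGap applies when only the query has a gap, tGap when only the target has one, and bothGap (indexed by the combined size) when both do. The function names are hypothetical, and the behavior of axtChain itself may differ in detail.

```python
from bisect import bisect_left

# Values copied from the -linearGap=medium table in the Methods.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate(size, costs):
    """Cost for a gap of `size` bases (assumption: piecewise-linear table)."""
    if size <= POSITION[0]:
        return costs[0]
    i = bisect_left(POSITION, size)
    if i >= len(POSITION):
        # Beyond the table: extend the slope of the last segment.
        i = len(POSITION) - 1
    x0, x1 = POSITION[i - 1], POSITION[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dq, dt):
    """Cost of a gap of dq query bases and dt target bases (hypothetical helper)."""
    dq, dt = max(dq, 0), max(dt, 0)
    if dq == 0 and dt == 0:
        return 0
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    # Double-sided gap: index the bothGap row by the combined size.
    return interpolate(dq + dt, BOTH_GAP)
```

For example, under these assumptions a 7-base gap on the query side only falls halfway between the entries for positions 3 and 11, and a double-sided gap totalling 11 bases is read directly from the bothGap row.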
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetRn4Viewchain Chain Rat (Nov. 2004 (Baylor 3.4/rn4)), Chain and Net Alignments Comparative Genomics chainRn4 Rat Chain Rat (Nov. 2004 (Baylor 3.4/rn4)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of rat (Nov. 2004 (Baylor 3.4/rn4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both rat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the rat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best rat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The rat sequence used in this annotation is from the Nov. 2004 (Baylor 3.4/rn4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the rat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single rat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track.
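To make the role of the substitution matrix concrete, the following sketch scores one gapless block by summing the matrix entry for each aligned base pair and then applies a minimum-score cutoff. The helper names `block_score` and `passes_min_score` are hypothetical, and the sketch ignores N bases and gap costs, so it illustrates the matrix lookup only and is not the axtChain implementation.

```python
BASES = "ACGT"

# Substitution matrix from the Methods; rows and columns are in A, C, G, T order.
MATRIX = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]

# Lookup table keyed by (base in one sequence, base in the other).
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(seq_a, seq_b):
    """Sum matrix scores over an ungapped block (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(SCORE[(a, b)] for a, b in zip(seq_a.upper(), seq_b.upper()))


def passes_min_score(chain_score, min_score):
    """Chains scoring below min_score are discarded; all others are kept."""
    return chain_score >= min_score
```

A perfect four-base match such as ACGT against ACGT sums the diagonal entries, while a single A/G transition contributes -31, a smaller penalty than the transversions.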
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm9 Mouse Chain/Net Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (July 2007 (NCBI37/mm9)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the July 2007 (NCBI37/mm9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetMm9Viewnet Net Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments Comparative Genomics netMm9 Mouse Net Mouse (July 2007 (NCBI37/mm9)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (July 2007 (NCBI37/mm9)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The mouse sequence used in this annotation is from the July 2007 (NCBI37/mm9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
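The placement step described for the net track can be pictured with a much-simplified sketch: chains are taken in descending score order, and each is trimmed to the parts of the reference (human) coordinate range that no higher-scoring chain has already claimed. The tuple layout and the names `subtract` and `place_chains` are assumptions made for illustration. The sketch records only the surviving pieces and does not build the nested level hierarchy that chainNet produces.

```python
def subtract(interval, covered):
    """Return the parts of `interval` not overlapped by any covered interval.

    `covered` is a list of non-overlapping half-open (start, end) pairs.
    """
    start, end = interval
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue  # no overlap with what is left of the interval
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains):
    """Greedy placement, highest score first (simplified, single level).

    `chains` is a list of (score, start, end, name) tuples on one
    reference chromosome; this layout is hypothetical.
    """
    covered = []
    placed = []
    for score, start, end, name in sorted(chains, key=lambda c: c[0], reverse=True):
        for piece in subtract((start, end), covered):
            covered.append(piece)
            placed.append((name, piece[0], piece[1]))
    return placed
```

With three overlapping chains, the top scorer keeps its full span, the next keeps only the part extending past it, and a low scorer that lies entirely inside covered sequence is dropped.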
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm9Viewchain Chain Mouse (July 2007 (NCBI37/mm9)), Chain and Net Alignments Comparative Genomics chainMm9 Mouse Chain Mouse (July 2007 (NCBI37/mm9)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (July 2007 (NCBI37/mm9)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both mouse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the July 2007 (NCBI37/mm9) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
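The chaining step can be sketched as a small dynamic program over gapless blocks. The version below deliberately swaps the kd-tree for a brute-force scan of all earlier blocks, so it runs in quadratic time and is meant only to show the recurrence: the best chain ending at a block is its own score plus the best compatible predecessor chain minus a gap cost. The block layout `(q_start, t_start, length, score)`, the `gap_cost` callback, and the function name `best_chain` are all assumptions for illustration.

```python
def best_chain(blocks, gap_cost):
    """Return (score, blocks) of the highest-scoring chain.

    blocks:   list of (q_start, t_start, length, score) gapless blocks
              (hypothetical layout; coordinates are half-open).
    gap_cost: function (dq, dt) -> penalty for the gap between two blocks.
    """
    if not blocks:
        return 0, []
    # Visit blocks in target order so every possible predecessor comes first.
    order = sorted(range(len(blocks)), key=lambda k: (blocks[k][1], blocks[k][0]))
    best = {}
    prev = {}
    for pos, j in enumerate(order):
        qj, tj, _, sj = blocks[j]
        best[j], prev[j] = sj, None  # a chain may start at this block
        for i in order[:pos]:
            qi, ti, li, _ = blocks[i]
            dq = qj - (qi + li)
            dt = tj - (ti + li)
            if dq < 0 or dt < 0:
                continue  # block i does not end before block j on both axes
            candidate = best[i] + sj - gap_cost(dq, dt)
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    # Trace back from the best-scoring end block.
    end = max(best, key=best.get)
    chain = []
    k = end
    while k is not None:
        chain.append(blocks[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

A caller would then compare the returned score with the minimum chain score and discard the chain if it falls below that threshold.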
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCanFam2 Dog Chain/Net Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both dog and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetCanFam2Viewnet Net Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics netCanFam2 Dog Net Dog (May 2005 (Broad/canFam2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both dog and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
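The linear gap table listed in the Methods above can be read as a set of breakpoints: each "position" is a gap length, and the qGap, tGap, and bothGap rows give the penalty at that length. The Python sketch below looks up a penalty for an arbitrary gap length. It assumes piecewise-linear interpolation between the listed positions and reuses the slope of the final segment for longer gaps; that rule, and the name gap_cost, are assumptions made for illustration and are not taken from this description of axtChain.

```python
from bisect import bisect_right

# Breakpoints and penalties copied from the -linearGap=medium table above.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
COSTS = {
    "qGap":    [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900],
    "tGap":    [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900],
    "bothGap": [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300],
}


def gap_cost(size: int, kind: str = "qGap") -> float:
    """Return an interpolated penalty for a gap of `size` bases (assumed rule)."""
    if size < 1:
        raise ValueError("gap size must be at least 1")
    costs = COSTS[kind]
    # Index of the last breakpoint at or below `size`.
    i = bisect_right(POSITIONS, size) - 1
    # Clamp so that sizes past the final breakpoint reuse the last segment's slope.
    i = min(i, len(POSITIONS) - 2)
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)
```

Under this assumed rule, a 7-base gap on one side falls halfway between the entries for positions 3 and 11, so gap_cost(7) lies halfway between 450 and 600.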
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetCanFam2Viewchain Chain Dog (May 2005 (Broad/canFam2)), Chain and Net Alignments Comparative Genomics chainCanFam2 Dog Chain Dog (May 2005 (Broad/canFam2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of dog (May 2005 (Broad/canFam2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both dog and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the dog assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best dog/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The dog sequence used in this annotation is from the May 2005 (Broad/canFam2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. 
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the dog/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single dog chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
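As an illustration of how the substitution matrix above is applied, the following minimal Python sketch sums the per-column scores over one gapless block. The function name score_block and the example sequences are hypothetical and are not part of axtChain; only the matrix values are taken from this description.

```python
# Substitution scores copied from the matrix in this track description.
# Outer key: base in the first sequence; inner key: base in the second.
MATRIX = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}


def score_block(first: str, second: str) -> int:
    """Sum matrix scores over the columns of one ungapped alignment block."""
    if len(first) != len(second):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(MATRIX[a][b] for a, b in zip(first.upper(), second.upper()))
```

For example, score_block("ACGT", "ACGT") adds 91 + 100 + 100 + 91 = 382. The matrix penalizes transitions (A/G and C/T, -31) less heavily than transversions (-114 to -125).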
The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFelCat3 Cat Chain/Net Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cat (Mar. 2006 (Broad/felCat3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cat sequence used in this annotation is from the Mar. 2006 (Broad/felCat3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFelCat3Viewnet Net Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments Comparative Genomics netFelCat3 Cat Net Cat (Mar. 
2006 (Broad/felCat3)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cat (Mar. 2006 (Broad/felCat3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cat sequence used in this annotation is from the Mar. 2006 (Broad/felCat3) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single cat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91

Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
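The placement step described above (chains taken in descending score order and trimmed to the parts of the human genome not yet covered) can be sketched in Python as follows. This is a simplified illustration rather than the chainNet implementation: each chain is reduced to a score plus a list of half-open block intervals on one human chromosome, no minimum size or score is applied to trimmed pieces, and the level hierarchy is not tracked. The names subtract and place_chains are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the human chromosome


def subtract(interval: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of `interval` not overlapped by the disjoint `covered` set."""
    start, end = interval
    pieces: List[Interval] = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue  # no overlap with what is left of the interval
        if c_start > start:
            pieces.append((start, c_start))  # free stretch before this covered region
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains: Dict[str, Tuple[int, List[Interval]]]) -> Dict[str, List[Interval]]:
    """Place chains best-score first, keeping only still-uncovered parts of each block."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    ranked = sorted(chains.items(), key=lambda item: item[1][0], reverse=True)
    for name, (_score, blocks) in ranked:
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        covered.extend(kept)  # these bases now belong to this chain
        placed[name] = kept
    return placed
```

With a high-scoring chain whose blocks are (0, 100) and (200, 300), a lower-scoring chain with a single block (90, 210) is trimmed to (100, 200), which is the gap in the first chain. That is the situation in which the real program places the second chain one level below the first.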
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFelCat3Viewchain Chain Cat (Mar. 2006 (Broad/felCat3)), Chain and Net Alignments Comparative Genomics chainFelCat3 Cat Chain Cat (Mar. 2006 (Broad/felCat3)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cat (Mar. 2006 (Broad/felCat3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both cat and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cat assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cat/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cat sequence used in this annotation is from the Mar. 2006 (Broad/felCat3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. 
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cat/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cat chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
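As a rough illustration of how the substitution matrix above contributes to a chain score, the following minimal sketch sums per-column scores over one gapless block. The helper name block_score is hypothetical and is not part of axtChain; handling of N and other ambiguity codes is not covered here.

```python
# Substitution scores copied from the matrix in the text (rows and columns in A, C, G, T order).
BASES = "ACGT"
ROWS = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MATRIX = {(a, b): ROWS[i][j] for i, a in enumerate(BASES) for j, b in enumerate(BASES)}


def block_score(target: str, query: str) -> int:
    """Sum substitution scores over one gapless block (illustrative helper only)."""
    if len(target) != len(query):
        raise ValueError("a gapless block pairs bases one-to-one")
    return sum(MATRIX[(t, q)] for t, q in zip(target.upper(), query.upper()))


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))  # four matches
    print(block_score("AC", "GT"))      # two transitions
```

Matches score 91 or 100 while transitions (A/G, C/T) cost 31 and transversions cost more, so a block has to be fairly long and well conserved before a chain built from it clears the minimum score.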
The linear gap matrix used with axtChain: -linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetEquCab1 Horse Chain/Net Horse (Jan. 2007 (Broad/equCab1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of horse (Jan. 2007 (Broad/equCab1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both horse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the horse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best horse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The horse sequence used in this annotation is from the Jan. 2007 (Broad/equCab1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the horse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single horse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetEquCab1Viewnet Net Horse (Jan. 2007 (Broad/equCab1)), Chain and Net Alignments Comparative Genomics netEquCab1 Horse Net Horse (Jan. 
2007 (Broad/equCab1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of horse (Jan. 2007 (Broad/equCab1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both horse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the horse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best horse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The horse sequence used in this annotation is from the Jan. 2007 (Broad/equCab1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the horse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single horse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
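The linear gap table listed under Methods can be read as a piecewise-linear cost function of gap length. The sketch below assumes that costs between tabulated positions are obtained by linear interpolation and that gaps longer than the last position continue along the slope of the final segment; both points are assumptions made for illustration, and the function name gap_cost is hypothetical rather than taken from axtChain.

```python
from bisect import bisect_right
from typing import Sequence

# Values copied from the -linearGap=medium table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = list(Q_GAP)  # identical to qGap in this table
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def gap_cost(size: int, costs: Sequence[int]) -> float:
    """Piecewise-linear gap cost (assumed interpolation scheme, for illustration)."""
    if size < POSITION[0]:
        raise ValueError("gap size must be at least 1")
    # Index of the last tabulated position <= size, capped so a right neighbour exists;
    # sizes past the table therefore extrapolate along the final segment.
    i = min(bisect_right(POSITION, size) - 1, len(POSITION) - 2)
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


if __name__ == "__main__":
    for size in (1, 7, 1111, 252111):
        print(size, gap_cost(size, Q_GAP), gap_cost(size, BOTH_GAP))
```

Under this reading the cost per base falls steeply as gaps lengthen, which is consistent with the statement that the scoring system allows longer gaps than traditional affine schemes.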
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetEquCab1Viewchain Chain Horse (Jan. 2007 (Broad/equCab1)), Chain and Net Alignments Comparative Genomics chainEquCab1 Horse Chain Horse (Jan. 2007 (Broad/equCab1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of horse (Jan. 2007 (Broad/equCab1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both horse and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the horse assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best horse/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The horse sequence used in this annotation is from the Jan. 2007 (Broad/equCab1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. 
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the horse/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single horse chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
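The minimum-score cutoff described above can be pictured as a simple filter over a chain file. This minimal sketch assumes the standard UCSC chain text format, in which each chain begins with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id"; the function name filter_chains and the sample records are hypothetical.

```python
from typing import Iterable, Iterator

MIN_SCORE = 3000  # cutoff quoted in the text


def filter_chains(lines: Iterable[str], min_score: int = MIN_SCORE) -> Iterator[str]:
    """Yield only the lines that belong to chains scoring at least min_score."""
    keep = False
    for line in lines:
        if line.startswith("chain "):
            # The score is the second whitespace-separated field of the header.
            keep = float(line.split()[1]) >= min_score
        if keep:
            yield line


if __name__ == "__main__":
    # Made-up records: one chain above the cutoff and one below it.
    sample = [
        "chain 4900 chr1 1000 + 0 100 chrA 900 + 10 110 1",
        "100",
        "",
        "chain 2500 chr1 1000 + 200 260 chrB 800 - 5 65 2",
        "60",
        "",
    ]
    print("\n".join(filter_chains(sample)))
```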
The linear gap matrix used with axtChain: -linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S.
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau4 bosTau4 Chain/Net Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Oct. 2007 (Baylor 4.0/bosTau4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cow and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Oct. 2007 (Baylor 4.0/bosTau4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=medium
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
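The placement step just described (highest-scoring chains first, each trimmed to sections not already covered) can be sketched as a greedy interval-covering loop. This is a deliberately simplified illustration, not the chainNet implementation: each chain is reduced to one half-open interval on the human genome, gaps inside chains are ignored, and so the level hierarchy does not appear. The names subtract and place_chains are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference genome


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the pieces of span that no interval in covered overlaps."""
    pieces = [span]
    for c_start, c_end in covered:
        remaining = []
        for start, end in pieces:
            if c_end <= start or end <= c_start:
                remaining.append((start, end))  # no overlap, keep whole piece
                continue
            if start < c_start:
                remaining.append((start, c_start))  # part left of the covered region
            if c_end < end:
                remaining.append((c_end, end))  # part right of the covered region
        pieces = remaining
    return pieces


def place_chains(chains: List[Tuple[int, int, int]]) -> List[Tuple[int, List[Interval]]]:
    """Place (score, start, end) chains in descending score order, trimming overlaps."""
    covered: List[Interval] = []
    placed = []
    for score, start, end in sorted(chains, key=lambda c: c[0], reverse=True):
        pieces = subtract((start, end), covered)
        if pieces:
            placed.append((score, pieces))
            covered.extend(pieces)
    return placed


if __name__ == "__main__":
    print(place_chains([(5000, 0, 100), (9000, 50, 200), (4000, 60, 90)]))
```

In this toy input the 9000-scoring chain keeps its full span, the 5000-scoring chain is trimmed to the part it alone covers, and the 4000-scoring chain is dropped because it lies entirely inside a higher-scoring chain.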
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau4Viewnet Net Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments Comparative Genomics netBosTau4 Cow Net Cow (Oct. 
2007 (Baylor 4.0/bosTau4)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Oct. 2007 (Baylor 4.0/bosTau4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both cow and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Oct. 2007 (Baylor 4.0/bosTau4) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap      750    825    850   1000   1300   3300  23300  58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
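As a rough illustration of how the -linearGap=medium table in the Methods above might be read, the following sketch looks up a gap penalty from the tabulated positions. It assumes linear interpolation between neighboring positions, linear extrapolation past the last position, and that a double-sided gap is looked up by the sum of its two lengths. These are assumptions about axtChain's behavior, not a transcription of its code, and the names interpolate and gap_cost are hypothetical.

```python
from bisect import bisect_right

# Values copied from the -linearGap=medium table in the Methods text.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
T_GAP = list(Q_GAP)
BOTH_GAP = [750, 825, 850, 1000, 1300, 3300, 23300, 58300, 118300, 218300, 318300]


def interpolate(length, costs):
    """Piecewise-linear lookup of a gap cost (assumed behavior)."""
    if length <= POSITIONS[0]:
        return float(costs[0])
    # Index of the last tabulated position <= length, capped so that lengths
    # beyond the table are extrapolated from the final segment.
    i = min(bisect_right(POSITIONS, length) - 1, len(POSITIONS) - 2)
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


def gap_cost(dq, dt):
    """Cost of a gap with dq query bases and dt reference bases."""
    if dq == 0 and dt == 0:
        return 0.0
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    return interpolate(dq + dt, BOTH_GAP)


print(gap_cost(11, 0))   # falls exactly on a tabulated position
print(gap_cost(0, 61))   # halfway between positions 11 and 111
```

Under these assumptions the penalty grows much more slowly per base for long gaps than an affine scheme fitted to short gaps would allow, which matches the statement that the scoring system allows longer gaps than traditional affine gap scoring.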
Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBosTau4Viewchain Chain Cow (Oct. 2007 (Baylor 4.0/bosTau4)), Chain and Net Alignments Comparative Genomics chainBosTau4 Cow Chain Cow (Oct. 2007 (Baylor 4.0/bosTau4)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of cow (Oct. 2007 (Baylor 4.0/bosTau4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. 
It can also tolerate gaps in both cow and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the cow assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best cow/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The cow sequence used in this annotation is from the Oct. 2007 (Baylor 4.0/bosTau4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain.
When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the cow/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single cow chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
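To make the role of the substitution matrix and the minimum score concrete, here is a minimal sketch that scores one gapless block with the matrix given above and applies the 3000 cutoff. It assumes a chain's score is the sum of its block scores minus its gap penalties, so the block score shown is only one component; the names block_score and keep_chain are hypothetical, and handling of N or masked bases is omitted.

```python
BASES = "ACGT"
# Rows and columns are in A, C, G, T order, copied from the matrix above.
ROWS = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MATRIX = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}
MIN_SCORE = 3000  # cutoff stated for this track


def block_score(target, query):
    """Sum the substitution scores over one gapless aligned block."""
    if len(target) != len(query):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(MATRIX[(t, q)] for t, q in zip(target.upper(), query.upper()))


def keep_chain(score, min_score=MIN_SCORE):
    """Chains scoring below the minimum are discarded."""
    return score >= min_score


print(block_score("ACGT", "ACGT"))  # four matches
print(block_score("ACGT", "AGGT"))  # one C/G mismatch
print(keep_chain(2999), keep_chain(3000))
```

With these values, roughly 30 or more matching bases are needed before a single ungapped block could reach the cutoff on its own, since each match contributes at most 100.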
The linear gap matrix used with axtChain:
-linearGap=medium
tableSize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap      750    825    850   1000   1300   3300  23300  58300 118300 218300 318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. 
(2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetEchTel1 Tenrec Chain/Net Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tenrec (July 2005 (Broad/echTel1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tenrec and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tenrec assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tenrec/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tenrec sequence used in this annotation is from the July 2005 (Broad/echTel1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tenrec/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tenrec chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
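The placement step just described (highest-scoring chains first, later chains trimmed to what is still uncovered) can be sketched in one dimension as follows. This is a simplified illustration, not the chainNet implementation: it tracks only reference coordinates, treats each chain as a list of half-open aligned blocks, and omits the level hierarchy and any minimum-size filtering. The names subtract, merge, and place_chains are hypothetical.

```python
def subtract(start, end, covered):
    """Parts of [start, end) not overlapped by the sorted, disjoint intervals."""
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue
        if c_start >= end:
            break
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:
        pieces.append((pos, end))
    return pieces


def merge(intervals):
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(chains):
    """chains: list of (score, name, blocks); returns {name: trimmed blocks}."""
    covered = []
    placed = {}
    for _score, name, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept = []
        for start, end in blocks:
            kept.extend(subtract(start, end, covered))
        if kept:
            placed[name] = kept
            covered = merge(covered + kept)
    return placed


chains = [
    (9000, "chainA", [(0, 100), (200, 300)]),
    (5000, "chainB", [(90, 150), (160, 210)]),
    (4000, "chainC", [(20, 80)]),
]
print(place_chains(chains))
```

In this example chainB is trimmed to the gap left inside chainA, which is the situation in which the net display would show it one level beneath chainA, and chainC is dropped because everything it covers is already claimed.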
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetEchTel1Viewnet Net Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments Comparative Genomics netEchTel1 Tenrec Net Tenrec (July 2005 (Broad/echTel1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tenrec (July 2005 (Broad/echTel1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tenrec and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tenrec assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tenrec/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The tenrec sequence used in this annotation is from the July 2005 (Broad/echTel1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tenrec/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single tenrec chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetEchTel1Viewchain Chain Tenrec (July 2005 (Broad/echTel1)), Chain and Net Alignments Comparative Genomics chainEchTel1 Tenrec Chain Tenrec (July 2005 (Broad/echTel1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tenrec (July 2005 (Broad/echTel1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tenrec and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tenrec assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tenrec/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tenrec sequence used in this annotation is from the July 2005 (Broad/echTel1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tenrec/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tenrec chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. 
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMonDom4 Opossum Chain/Net Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of opossum (Jan. 2006 (Broad/monDom4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both opossum and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the opossum assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best opossum/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The opossum sequence used in this annotation is from the Jan. 2006 (Broad/monDom4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the opossum/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single opossum chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
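The placement step just described can be pictured with a small Python sketch. It is a simplification under stated assumptions, not the chainNet implementation: each chain is reduced to a single half-open interval on the human (target) sequence, chain-internal gaps are ignored, and the function names place_chains, subtract, and merge are hypothetical. It shows only the greedy "highest score first, trim to what is still uncovered" idea, not the level hierarchy.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted, disjoint list."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def subtract(start, end, covered):
    """Return the parts of [start, end) not overlapped by the sorted, disjoint list `covered`."""
    pieces = []
    pos = start
    for cs, ce in covered:
        if ce <= pos:
            continue          # covered interval lies entirely before the current position
        if cs >= end:
            break             # covered interval lies entirely after the chain
        if cs > pos:
            pieces.append((pos, cs))
        pos = max(pos, ce)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Greedy placement (illustrative only).

    chains: iterable of (score, start, end) tuples on one target sequence.
    Returns the (score, start, end) pieces that were placed, highest-scoring chain first.
    """
    covered = []
    placed = []
    for score, start, end in sorted(chains, key=lambda c: -c[0]):
        pieces = subtract(start, end, covered)
        placed.extend((score, s, e) for s, e in pieces)
        covered = merge(covered + pieces)
    return placed
```

For example, place_chains([(900, 0, 100), (500, 50, 150), (100, 60, 90)]) keeps the first chain whole, trims the second to the interval 100-150, and drops the third because nothing uncovered remains for it.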
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMonDom4Viewnet Net Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments Comparative Genomics netMonDom4 Opossum Net Opossum (Jan. 
2006 (Broad/monDom4)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of opossum (Jan. 2006 (Broad/monDom4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both opossum and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the opossum assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best opossum/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The opossum sequence used in this annotation is from the Jan. 2006 (Broad/monDom4) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the opossum/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single opossum chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMonDom4Viewchain Chain Opossum (Jan. 2006 (Broad/monDom4)), Chain and Net Alignments Comparative Genomics chainMonDom4 Opossum Chain Opossum (Jan. 2006 (Broad/monDom4)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of opossum (Jan. 2006 (Broad/monDom4)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both opossum and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the opossum assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best opossum/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The opossum sequence used in this annotation is from the Jan. 2006 (Broad/monDom4) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the opossum/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single opossum chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
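As a rough illustration of how the substitution matrix and the minimum-score cutoff fit together, the Python sketch below scores one gapless block column by column and then filters a list of chain scores. The matrix values and the threshold of 5000 come from the text above; the helper names block_score and keep_chains are hypothetical, gap penalties are left out, and only the bases A, C, G, and T are handled (an N would raise a KeyError).

```python
BASES = "ACGT"

# Substitution scores quoted in the text (rows and columns in A, C, G, T order).
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]

# Lookup table keyed by (query base, target base).
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(query, target):
    """Sum the per-column substitution scores over one gapless block."""
    if len(query) != len(target):
        raise ValueError("a gapless block needs sequences of equal length")
    return sum(SCORE[(q, t)] for q, t in zip(query.upper(), target.upper()))


def keep_chains(chain_scores, min_score=5000):
    """Keep chain scores at or above min_score (hypothetical helper)."""
    return [s for s in chain_scores if s >= min_score]
```

A perfect match over ACGT scores 91 + 100 + 100 + 91 = 382, so a chain needs a little over a kilobase of mostly matching sequence, net of mismatches and gap costs, to clear the 5000 cutoff.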
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOrnAna1 Platypus Chain/Net Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both platypus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the platypus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best platypus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The platypus sequence used in this annotation is from the Mar. 2007 (WUGSC 5.0.1/ornAna1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the platypus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single platypus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
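The loose linear gap table quoted in the chain methods above lists costs only at eleven gap sizes. One way to read it, sketched here in Python, is to interpolate linearly between the listed positions and to extend the last segment's slope beyond the final position. That interpolation rule, the choice to look up double-sided gaps by the combined size of both gaps, and the function names interpolate and gap_cost are assumptions made for illustration; the text itself gives only the table.

```python
from bisect import bisect_right

# Values copied from the -linearGap=loose table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size, costs):
    """Cost for a gap of `size` bases, linearly interpolated (assumed rule)."""
    if size <= 0:
        return 0.0
    if size <= POSITION[0]:
        return float(costs[0])
    i = bisect_right(POSITION, size)
    if i >= len(POSITION):
        # Beyond the table: reuse the slope of the last segment (assumption).
        i = len(POSITION) - 1
    x0, x1 = POSITION[i - 1], POSITION[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dq, dt):
    """Cost of a gap spanning dq query bases and dt target bases."""
    if dq == 0:
        return interpolate(dt, T_GAP)
    if dt == 0:
        return interpolate(dq, Q_GAP)
    # Double-sided gap: looked up by combined size (assumption).
    return interpolate(dq + dt, BOTH_GAP)
```

Under these assumptions a 7-base single-sided gap costs 425, halfway between the listed costs for 3 and 11 bases, which illustrates how slowly the penalty grows compared with a fixed per-base extension cost.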
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOrnAna1Viewnet Net Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments Comparative Genomics netOrnAna1 Platypus Net Platypus (Mar. 
2007 (WUGSC 5.0.1/ornAna1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both platypus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the platypus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best platypus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The platypus sequence used in this annotation is from the Mar. 2007 (WUGSC 5.0.1/ornAna1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the platypus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single platypus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91
Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOrnAna1Viewchain Chain Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)), Chain and Net Alignments Comparative Genomics chainOrnAna1 Platypus Chain Platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of platypus (Mar. 2007 (WUGSC 5.0.1/ornAna1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both platypus and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the platypus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best platypus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The platypus sequence used in this annotation is from the Mar. 2007 (WUGSC 5.0.1/ornAna1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2.
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the platypus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single platypus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track.
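As a rough illustration of how the substitution matrix and the minimum-score cutoff described above might be applied, the following Python sketch scores a gapless block and drops low-scoring chains. The helper names (block_score, keep_chains) and the dictionary layout of a chain are hypothetical and are not part of axtChain; only the matrix values and the threshold of 1000 come from the text.

```python
# Substitution scores transcribed from the matrix in the text
# (rows and columns in the order A, C, G, T).
BASES = "ACGT"
ROWS = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {(a, b): ROWS[i][j]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)}


def block_score(target, query):
    """Sum the substitution scores over one gapless block.

    Hypothetical helper: only A, C, G and T are handled here; any other
    character (for example N) raises KeyError.
    """
    if len(target) != len(query):
        raise ValueError("a gapless block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def keep_chains(chains, min_score=1000):
    """Keep chains whose score is at or above the cutoff.

    Each chain is assumed to be a dict with a "score" key; chains scoring
    below min_score are discarded, as described in the text.
    """
    return [c for c in chains if c["score"] >= min_score]
```

For example, block_score("ACGT", "ACGT") adds the four diagonal entries, while a block of transitions such as block_score("AC", "GT") accumulates -25 per column.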
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position  1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap   625  660  700  750  900  1400  4000   8000   16000  32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetAnoCar1 Lizard Chain/Net Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lizard (Feb. 2007 (Broad/anoCar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lizard and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lizard assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lizard/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lizard sequence used in this annotation is from the Feb. 2007 (Broad/anoCar1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lizard/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lizard chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position  1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap   625  660  700  750  900  1400  4000   8000   16000  32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
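The placement step just described (highest-scoring chain first, each later chain trimmed to whatever is still uncovered) can be sketched in one dimension as follows. This is a simplified illustration and not the chainNet implementation: it looks only at target coordinates, ignores the second genome, applies no minimum size or score, and does not record nesting levels. The function names and the tuple layout (name, score, blocks) are assumptions made for the example.

```python
def subtract(start, end, covered):
    """Return the parts of [start, end) not overlapped by covered.

    covered must be a sorted list of non-overlapping half-open intervals.
    """
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue          # interval lies entirely before the cursor
        if c_start >= end:
            break             # interval lies entirely after the block
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Greedily place chains, best score first, trimming overlaps.

    chains is a list of (name, score, blocks), where blocks is a list of
    half-open (start, end) aligned intervals on the target genome. Gaps
    between a chain's blocks stay uncovered, so a lower-scoring chain can
    fill them.
    """
    covered = []
    placed = []
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        for start, end in blocks:
            pieces = subtract(start, end, covered)
            placed.extend((name, s, e) for s, e in pieces)
            # New pieces never overlap existing ones, so sorting is enough.
            covered = sorted(covered + pieces)
    return placed
```

With a top chain whose blocks are (0, 100) and (200, 300), and a lower-scoring chain spanning (50, 250), the second chain is trimmed to (100, 200), that is, it fills the gap in the first.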
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetAnoCar1Viewnet Net Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments Comparative Genomics netAnoCar1 Lizard Net Lizard (Feb. 
2007 (Broad/anoCar1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lizard (Feb. 2007 (Broad/anoCar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lizard and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lizard assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lizard/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lizard sequence used in this annotation is from the Feb. 2007 (Broad/anoCar1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lizard/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single lizard chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position  1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap   625  660  700  750  900  1400  4000   8000   16000  32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetAnoCar1Viewchain Chain Lizard (Feb. 2007 (Broad/anoCar1)), Chain and Net Alignments Comparative Genomics chainAnoCar1 Lizard Chain Lizard (Feb. 2007 (Broad/anoCar1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lizard (Feb. 2007 (Broad/anoCar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lizard and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lizard assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lizard/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lizard sequence used in this annotation is from the Feb. 2007 (Broad/anoCar1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lizard/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lizard chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position  1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap   625  660  700  750  900  1400  4000   8000   16000  32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGalGal3 Chicken Chain/Net Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (May 2006 (WUGSC 2.1/galGal3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chicken sequence used in this annotation is from the May 2006 (WUGSC 2.1/galGal3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position  1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap      325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap   625  660  700  750  900  1400  4000   8000   16000  32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
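To make the loose linear gap table from the chain Methods more concrete, the sketch below looks up a gap cost from the listed positions. It assumes that costs between listed positions are linearly interpolated, that sizes beyond the last position continue along the slope of the final segment, and that a double-sided gap is costed on the sum of the two gap sizes; these are assumptions for illustration and may differ from what axtChain does internally. The function names are hypothetical; the numbers are those given in the table.

```python
from bisect import bisect_right

# Values transcribed from the -linearGap=loose table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size, costs):
    """Piecewise-linear lookup of a gap cost (assumed behaviour)."""
    if size < POSITION[0]:
        raise ValueError("gap size must be at least 1")
    if size >= POSITION[-1]:
        i = len(POSITION) - 2   # extend the final segment past the table
    else:
        i = bisect_right(POSITION, size) - 1
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (size - x0) * (y1 - y0) / (x1 - x0)


def gap_cost(dq, dt):
    """Cost of a gap of dq bases in the query and dt bases in the target."""
    if dq == 0 and dt == 0:
        return 0
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    # Double-sided gap: assumed to be costed on the combined size.
    return interpolate(dq + dt, BOTH_GAP)
```

Under these assumptions a 7-base query-only gap falls halfway between the entries for positions 3 and 11, so it costs 425, and the cost grows much more slowly than the gap length, which is what lets chains span long gaps.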
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetGalGal3Viewnet Net Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments Comparative Genomics netGalGal3 Chicken Net Chicken (May 2006 (WUGSC 2.1/galGal3)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (May 2006 (WUGSC 2.1/galGal3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The chicken sequence used in this annotation is from the May 2006 (WUGSC 2.1/galGal3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGalGal3Viewchain Chain Chicken (May 2006 (WUGSC 2.1/galGal3)), Chain and Net Alignments Comparative Genomics chainGalGal3 Chicken Chain Chicken (May 2006 (WUGSC 2.1/galGal3)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of chicken (May 2006 (WUGSC 2.1/galGal3)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both chicken and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the chicken assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best chicken/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The chicken sequence used in this annotation is from the May 2006 (WUGSC 2.1/galGal3) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the chicken/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single chicken chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
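As a rough illustration of how the substitution matrix and the minimum score quoted above are applied, the sketch below scores one ungapped block and filters chains by total score. The names `MATRIX`, `MIN_SCORE`, `block_score`, and `keep_chains` are hypothetical, and the sketch ignores gap penalties, N bases, and masking, all of which the actual lastz/axtChain pipeline handles.

```python
# Substitution scores from the matrix quoted in the text (rows and columns A, C, G, T).
MATRIX = {
    "A": {"A": 91, "C": -90, "G": -25, "T": -100},
    "C": {"A": -90, "C": 100, "G": -100, "T": -25},
    "G": {"A": -25, "C": -100, "G": 100, "T": -90},
    "T": {"A": -100, "C": -25, "G": -90, "T": 91},
}

MIN_SCORE = 5000  # chains scoring below this were discarded, per the text


def block_score(query: str, target: str) -> int:
    """Sum per-column substitution scores over one gapless block (A/C/G/T only)."""
    if len(query) != len(target):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(MATRIX[q][t] for q, t in zip(query.upper(), target.upper()))


def keep_chains(chain_scores: dict) -> dict:
    """Keep only chains whose total score is at least MIN_SCORE."""
    return {name: score for name, score in chain_scores.items() if score >= MIN_SCORE}
```

For example, `block_score("ACGT", "ACGT")` sums the four diagonal entries (91 + 100 + 100 + 91), while a single T/A mismatch in the last column replaces 91 with -100.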
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTaeGut1 Zebra finch Chain/Net Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebra finch sequence used in this annotation is from the Jul. 2008 (WUGSC 3.2.4/taeGut1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
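The loose linear gap table quoted in the chain-track methods above lists gap costs only at selected gap sizes. One plausible reading, sketched below, is that costs for other sizes are obtained by linear interpolation between neighbouring positions. That interpolation, the extrapolation beyond the last position, and the use of the summed gap length for double-sided gaps are all assumptions of this sketch rather than statements taken from the text; the function names are hypothetical.

```python
from bisect import bisect_right

# Values copied from the gap table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size: int, costs: list) -> float:
    """Cost of a gap of `size` bases, linearly interpolated between table positions."""
    if size < POSITION[0]:
        raise ValueError("gap size must be at least 1")
    if size >= POSITION[-1]:
        # Assumption: beyond the table, continue along the slope of the last segment.
        i = len(POSITION) - 2
    else:
        i = bisect_right(POSITION, size) - 1
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap: int, t_gap: int) -> float:
    """Cost of a gap with `q_gap` unaligned query bases and `t_gap` unaligned target bases."""
    if q_gap == 0 and t_gap == 0:
        return 0.0
    if t_gap == 0:
        return interpolate(q_gap, Q_GAP)  # gap in the query only
    if q_gap == 0:
        return interpolate(t_gap, T_GAP)  # gap in the target only
    # Double-sided gap: assumed to be looked up by the combined length.
    return interpolate(q_gap + t_gap, BOTH_GAP)
```

The table makes long gaps grow far more slowly in cost than a fixed per-base penalty would, which is consistent with the statement that this scoring system allows longer gaps than traditional affine schemes.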
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTaeGut1Viewnet Net Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments Comparative Genomics netTaeGut1 Zebra finch Net Zebra finch (Jul. 
2008 (WUGSC 3.2.4/taeGut1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebra finch sequence used in this annotation is from the Jul. 2008 (WUGSC 3.2.4/taeGut1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

       A     C     G     T
A     91   -90   -25  -100
C    -90   100  -100   -25
G    -25  -100   100   -90
T   -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTaeGut1Viewchain Chain Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)), Chain and Net Alignments Comparative Genomics chainTaeGut1 Zebra finch Chain Zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Jul. 2008 (WUGSC 3.2.4/taeGut1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebra finch sequence used in this annotation is from the Jul. 2008 (WUGSC 3.2.4/taeGut1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. 
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
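As an illustration of how the substitution matrix above is applied, the following minimal sketch scores one ungapped block by summing the matrix entry for each aligned base pair. The function name `score_block` and the dictionary layout are hypothetical and are not taken from lastz or axtChain; handling of N and masked bases in the real programs is not modeled here.

```python
# Substitution scores copied from the matrix quoted in the track description.
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],   # row A
    [-90, 100, -100, -25],  # row C
    [-25, -100, 100, -90],  # row G
    [-100, -25, -90, 91],   # row T
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def score_block(target: str, query: str) -> int:
    """Sum the matrix score over each aligned column of an ungapped block.

    Hypothetical helper for illustration; only A, C, G and T are accepted.
    """
    if len(target) != len(query):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


if __name__ == "__main__":
    print(score_block("ACGT", "ACGT"))  # four matches
    print(score_block("AG", "GA"))      # two transitions
```

Transitions (A/G and C/T) are penalized less than transversions in this matrix, which is why the second call scores higher than two transversions would.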
The linear gap matrix used with axtChain: -linearGap=loose
tablesize  11
smallSize  111
position   1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap    625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro2 xenTro2 Chain/Net X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
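The distinction between single-line and double-line gaps described above can be sketched as a comparison of the distance between consecutive aligned boxes in each genome. This is only an illustration: the `Block` record, its field names, and the "none" label for a gap that has no extent on the human axis are assumptions made here, not part of the browser's code.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One aligned box; coordinates are 0-based, end-exclusive (assumed)."""
    t_start: int  # start on the human (reference) genome
    t_end: int
    q_start: int  # start on the aligning genome
    q_end: int


def classify_gaps(blocks: List[Block]) -> List[str]:
    """Label the gap between each pair of consecutive blocks.

    "single": human sequence with no counterpart in the other species.
    "double": unaligned sequence in both species.
    "none":   no gap on the human axis (hypothetical label).
    """
    kinds = []
    for prev, nxt in zip(blocks, blocks[1:]):
        t_gap = nxt.t_start - prev.t_end
        q_gap = nxt.q_start - prev.q_end
        if t_gap > 0 and q_gap > 0:
            kinds.append("double")
        elif t_gap > 0:
            kinds.append("single")
        else:
            kinds.append("none")
    return kinds


if __name__ == "__main__":
    chain = [
        Block(0, 100, 0, 100),
        Block(150, 200, 100, 150),  # 50 human bases skipped, 0 in query
        Block(260, 300, 180, 220),  # 60 human bases and 30 query bases skipped
    ]
    print(classify_gaps(chain))
```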
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Aug. 2005 (JGI 4.1/xenTro2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=loose
tablesize  11
smallSize  111
position   1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap    625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
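The placement step just described can be illustrated with a toy greedy procedure. This is a simplified sketch, not the chainNet implementation: chains are reduced to sorted lists of blocks on a single short reference sequence, coverage is tracked base by base, and the level rule (one more than the deepest earlier chain whose span contains the new one) is an assumption chosen to mimic the hierarchy in the text. The names `Chain` and `build_net` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Chain(NamedTuple):
    name: str
    score: int
    blocks: List[Tuple[int, int]]  # sorted (start, end) pairs, end-exclusive


def build_net(chains: List[Chain], ref_len: int) -> Dict[str, dict]:
    """Place chains highest score first, trimming them to uncovered bases."""
    covered = [False] * ref_len
    spans: List[Tuple[int, int, int]] = []  # (start, end, level) of placed chains
    placed: Dict[str, dict] = {}
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        kept = []
        for start, end in chain.blocks:
            pos = start
            while pos < end:
                if covered[pos]:
                    pos += 1
                    continue
                run = pos
                while run < end and not covered[run]:
                    covered[run] = True
                    run += 1
                kept.append((pos, run))  # surviving piece of this block
                pos = run
        if not kept:
            continue  # fully hidden by higher-scoring chains
        lo, hi = kept[0][0], kept[-1][1]
        level = 1
        for s, e, lvl in spans:
            if s <= lo and hi <= e:
                level = max(level, lvl + 1)  # sits in a gap of that chain
        spans.append((lo, hi, level))
        placed[chain.name] = {"level": level, "blocks": kept}
    return placed


if __name__ == "__main__":
    net = build_net(
        [
            Chain("A", 9000, [(0, 10), (30, 40)]),
            Chain("B", 6000, [(8, 20)]),
            Chain("C", 5500, [(50, 60)]),
        ],
        ref_len=60,
    )
    for name, info in net.items():
        print(name, info)
```

In the usage example, chain B overlaps the first block of A, so it is trimmed to the part that falls in A's gap and is reported one level below A.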
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro2Viewnet Net X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)), Chain and Net Alignments Comparative Genomics netXenTro2 X. tropicalis Net X. tropicalis (Aug. 
2005 (JGI 4.1/xenTro2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Aug. 2005 (JGI 4.1/xenTro2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. 
tropicalis chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=loose
tablesize  11
smallSize  111
position   1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap    625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetXenTro2Viewchain Chain X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)), Chain and Net Alignments Comparative Genomics chainXenTro2 X. tropicalis Chain X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of X. tropicalis (Aug. 2005 (JGI 4.1/xenTro2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both X. tropicalis and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the X. tropicalis assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best X. tropicalis/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The X. tropicalis sequence used in this annotation is from the Aug. 2005 (JGI 4.1/xenTro2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the X. tropicalis/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single X. tropicalis chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
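The chaining step in the Methods above (a dynamic program over gapless blocks) can be illustrated with a much simpler quadratic dynamic program. This sketch deliberately swaps out the kd-tree for a plain pairwise scan and uses a placeholder gap penalty, `toy_gap_cost`, that is hypothetical and unrelated to the -linearGap table; it shows the recurrence only, not axtChain's implementation.

```python
from typing import Callable, List, NamedTuple, Tuple


class Block(NamedTuple):
    t_start: int  # reference start, end-exclusive coordinates (assumed)
    t_end: int
    q_start: int
    q_end: int
    score: int    # score of the gapless block itself


def toy_gap_cost(prev: Block, nxt: Block) -> int:
    """Hypothetical penalty: one point per skipped base in either genome."""
    return (nxt.t_start - prev.t_end) + (nxt.q_start - prev.q_end)


def best_chain(
    blocks: List[Block],
    gap_cost: Callable[[Block, Block], int] = toy_gap_cost,
) -> Tuple[int, List[Block]]:
    """Return the maximally scoring colinear chain of blocks (O(n^2) scan)."""
    if not blocks:
        return 0, []
    ordered = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in ordered]  # best chain score ending at block j
    back = [-1] * len(ordered)         # predecessor index for traceback
    for j, cur in enumerate(ordered):
        for i in range(j):
            prev = ordered[i]
            # A predecessor must end before cur starts in both genomes.
            if prev.t_end <= cur.t_start and prev.q_end <= cur.q_start:
                cand = best[i] + cur.score - gap_cost(prev, cur)
                if cand > best[j]:
                    best[j], back[j] = cand, i
    k = max(range(len(ordered)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(ordered[k])
        k = back[k]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    b1 = Block(0, 10, 0, 10, 900)
    b2 = Block(20, 30, 22, 32, 800)
    b3 = Block(15, 25, 100, 110, 500)  # overlaps b2 on the reference
    print(best_chain([b1, b2, b3]))
```

A kd-tree, as used by axtChain, serves to find candidate predecessors without scanning every earlier block; the recurrence itself is unchanged by that optimization.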
The linear gap matrix used with axtChain: -linearGap=loose
tablesize  11
smallSize  111
position   1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap    625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer5 Zebrafish Chain/Net Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (July 2007 (Zv7/danRer5)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the July 2007 (Zv7/danRer5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain: -linearGap=loose
tablesize  11
smallSize  111
position   1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap       325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap    625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
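The -linearGap=loose table listed in the Chain track methods above can be read as a piecewise cost function of gap length. The sketch below assumes plain linear interpolation between the tabulated positions, uses the bothGap row with the summed gap length when both genomes have unaligned sequence, and refuses sizes beyond the last tabulated position. These rules are assumptions for illustration and may differ in detail from what axtChain does; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List

# Values copied from the -linearGap=loose table in the track description.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size: int, costs: List[int]) -> float:
    """Linearly interpolate a cost between neighbouring table positions."""
    if size < POSITION[0] or size > POSITION[-1]:
        raise ValueError("gap size outside the tabulated range")
    i = bisect_right(POSITION, size) - 1
    if POSITION[i] == size:
        return float(costs[i])
    x0, x1 = POSITION[i], POSITION[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap: int, t_gap: int) -> float:
    """Cost of a gap of q_gap query bases and t_gap target bases (assumed rules)."""
    if q_gap < 0 or t_gap < 0:
        raise ValueError("gap sizes must be non-negative")
    if q_gap == 0 and t_gap == 0:
        return 0.0
    if t_gap == 0:
        return interpolate(q_gap, Q_GAP)
    if q_gap == 0:
        return interpolate(t_gap, T_GAP)
    return interpolate(q_gap + t_gap, BOTH_GAP)


if __name__ == "__main__":
    print(gap_cost(61, 0))  # halfway between positions 11 and 111
    print(gap_cost(5, 6))   # double-sided gap, total length 11
```

Because the cost grows far more slowly than gap length (a gap of 252111 bases costs 56600), long gaps remain affordable, which is consistent with the description of a scoring system that allows longer gaps than traditional affine schemes.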
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetDanRer5Viewnet Net Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments Comparative Genomics netDanRer5 Zebrafish Net Zebrafish (July 2007 (Zv7/danRer5)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (July 2007 (Zv7/danRer5)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The zebrafish sequence used in this annotation is from the July 2007 (Zv7/danRer5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetDanRer5Viewchain Chain Zebrafish (July 2007 (Zv7/danRer5)), Chain and Net Alignments Comparative Genomics chainDanRer5 Zebrafish Chain Zebrafish (July 2007 (Zv7/danRer5)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebrafish (July 2007 (Zv7/danRer5)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebrafish and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebrafish assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebrafish/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebrafish sequence used in this annotation is from the July 2007 (Zv7/danRer5) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebrafish/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebrafish chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
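As a rough illustration of how the matrix above is applied, the sketch below scores one ungapped block by summing the matrix entry for each aligned base pair. The helper name score_block and the example sequences are hypothetical and are not part of lastz or axtChain; real chain scores also subtract gap costs, and bases other than A, C, G, and T are not handled here.

```python
# Substitution scores copied from the matrix above (rows and columns: A, C, G, T).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
# Lookup table keyed by (target base, query base).
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def score_block(target: str, query: str) -> int:
    """Sum substitution scores over one ungapped block (hypothetical helper)."""
    if len(target) != len(query):
        raise ValueError("an ungapped block must have equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))
```

For example, score_block("ACGT", "ACGT") adds the four diagonal entries (91 + 100 + 100 + 91), while replacing the G with an A in the query swaps one 100 for the G/A entry of -25.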
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGasAcu1 Stickleback Chain/Net Stickleback (Feb. 2006 (Broad/gasAcu1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of stickleback (Feb. 2006 (Broad/gasAcu1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both stickleback and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the stickleback assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best stickleback/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The stickleback sequence used in this annotation is from the Feb. 2006 (Broad/gasAcu1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the stickleback/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single stickleback chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
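The placement step just described can be pictured with a much-simplified sketch: chains are visited from highest to lowest score, and each one keeps only the parts of its blocks that no higher-scoring chain has already claimed. This is an illustration of the idea only, not the chainNet implementation. The function names, the tuple layout, and the one-dimensional coordinates are assumptions made for the example, and the level hierarchy itself is not tracked.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the human sequence


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span not overlapped by covered (sorted, disjoint)."""
    start, end = span
    pieces = []
    for c_start, c_end in covered:
        if c_end <= start or c_start >= end:
            continue  # no overlap with this covered interval
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def place_chains(
    chains: List[Tuple[str, int, List[Interval]]]
) -> Dict[str, List[Interval]]:
    """Place chains (name, score, blocks) greedily, highest score first."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        kept = [piece for block in blocks for piece in subtract(block, covered)]
        if kept:
            placed[name] = kept
            covered = merge(covered + kept)
    return placed
```

In this toy model, a chain with blocks (0, 100) and (200, 300) leaves a gap from 100 to 200; a lower-scoring chain spanning (90, 210) is trimmed to (100, 200) and so fills that gap, which corresponds to being placed underneath the higher-scoring chain.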
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGasAcu1Viewnet Net Stickleback (Feb. 2006 (Broad/gasAcu1)), Chain and Net Alignments Comparative Genomics netGasAcu1 Stickleback Net Stickleback (Feb. 
2006 (Broad/gasAcu1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of stickleback (Feb. 2006 (Broad/gasAcu1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both stickleback and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the stickleback assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best stickleback/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The stickleback sequence used in this annotation is from the Feb. 2006 (Broad/gasAcu1) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the stickleback/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single stickleback chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGasAcu1Viewchain Chain Stickleback (Feb. 2006 (Broad/gasAcu1)), Chain and Net Alignments Comparative Genomics chainGasAcu1 Stickleback Chain Stickleback (Feb. 2006 (Broad/gasAcu1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of stickleback (Feb. 2006 (Broad/gasAcu1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both stickleback and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the stickleback assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best stickleback/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The stickleback sequence used in this annotation is from the Feb. 2006 (Broad/gasAcu1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. 
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the stickleback/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single stickleback chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track.
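The minimum-score cutoff can be reproduced on a downloaded chain file with a few lines of Python. The sketch below assumes the usual UCSC chain layout, in which every chain begins with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id" followed by its block lines. The function name filter_chains and the sample records are hypothetical, and this is not how the track itself was built.

```python
from typing import Iterable, Iterator


def filter_chains(lines: Iterable[str], min_score: float) -> Iterator[str]:
    """Yield the lines of every chain whose header score is at least min_score."""
    keep = False
    for line in lines:
        if line.startswith("chain "):
            # The second whitespace-separated field of the header is the score.
            keep = float(line.split()[1]) >= min_score
        if keep:
            yield line


# Made-up records for illustration only; real files are much larger.
EXAMPLE = [
    "chain 4900 chr1 1000 + 0 50 chrUn 800 + 10 60 1\n",
    "50\n",
    "\n",
    "chain 2500 chr1 1000 + 100 160 groupI 900 - 5 65 2\n",
    "60\n",
    "\n",
    "chain 1500 chr1 1000 + 200 230 groupII 700 + 0 30 3\n",
    "30\n",
    "\n",
]
```

Calling list(filter_chains(EXAMPLE, 2000)) keeps the first two chains and drops the third; to process a file, pass an open file handle as the lines argument.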
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOryLat2 Medaka Chain/Net Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medaka and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medaka assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medaka/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medaka sequence used in this annotation is from the Oct. 2005 (NIG/UT MEDAKA1/oryLat2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medaka/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single medaka chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
  A    91    -90    -25   -100
  C   -90    100   -100    -25
  G   -25   -100    100    -90
  T  -100    -25    -90     91
Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
  qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
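The placement step described above can be sketched in Python as a greedy pass over chains sorted by score. This is a simplified illustration, not the chainNet implementation: each chain is treated as one solid interval on the human (target) sequence, internal gaps and the resulting level hierarchy are left out, and the names subtract and place_chains are hypothetical.

```python
def subtract(interval, covered):
    """Return the parts of (start, end) not overlapped by any covered interval.

    `covered` is a list of non-overlapping half-open (start, end) intervals.
    """
    start, end = interval
    pieces = []
    cursor = start
    for c_start, c_end in sorted(covered):
        if c_end <= cursor:
            continue  # covered interval lies entirely before the cursor
        if c_start >= end:
            break  # covered interval lies entirely after the chain
        if c_start > cursor:
            pieces.append((cursor, c_start))  # uncovered stretch before it
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def place_chains(chains):
    """Place (name, score, start, end) chains, highest score first.

    Each chain is trimmed to the regions not already covered by a
    higher-scoring chain; chains left with nothing are dropped.
    """
    covered = []
    placed = []
    for name, _score, start, end in sorted(chains, key=lambda c: -c[1]):
        pieces = subtract((start, end), covered)
        if pieces:
            placed.append((name, pieces))
            covered.extend(pieces)
    return placed


if __name__ == "__main__":
    demo = [("a", 9000, 0, 100), ("b", 5000, 50, 150), ("c", 3000, 20, 80)]
    print(place_chains(demo))
```

In the demo, chain "b" is trimmed to the part that "a" does not cover, and chain "c" disappears because it lies wholly inside "a". In the real net, a chain like "c" could still be placed at a lower level if it fell inside a gap of "a", which this sketch does not model.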
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOryLat2Viewnet Net Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments Comparative Genomics netOryLat2 Medaka Net Medaka (Oct. 
2005 (NIG/UT MEDAKA1/oryLat2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medaka and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medaka assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medaka/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medaka sequence used in this annotation is from the Oct. 2005 (NIG/UT MEDAKA1/oryLat2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medaka/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single medaka chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
  A    91    -90    -25   -100
  C   -90    100   -100    -25
  G   -25   -100    100    -90
  T  -100    -25    -90     91
Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
  qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetOryLat2Viewchain Chain Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)), Chain and Net Alignments Comparative Genomics chainOryLat2 Medaka Chain Medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medaka (Oct. 2005 (NIG/UT MEDAKA1/oryLat2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medaka and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medaka assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medaka/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medaka sequence used in this annotation is from the Oct. 2005 (NIG/UT MEDAKA1/oryLat2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2.
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medaka/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single medaka chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
  A    91    -90    -25   -100
  C   -90    100   -100    -25
  G   -25   -100    100    -90
  T  -100    -25    -90     91
Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
  qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr2 Fugu Chain/Net Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Oct. 2004 (JGI 4.0/fr2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Oct. 2004 (JGI 4.0/fr2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the fugu/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
  A    91    -90    -25   -100
  C   -90    100   -100    -25
  G   -25   -100    100    -90
  T  -100    -25    -90     91
Chains scoring below a minimum score of 2000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
  qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
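The loose linear gap table quoted in the chain Methods above lists costs only at eleven gap sizes. The Python sketch below shows one way to read such a table, under two assumptions that the text does not state: costs for sizes between listed positions are linearly interpolated, and a gap present in both sequences is looked up in the bothGap row by the sum of the two gap lengths. The function names are hypothetical, and sizes beyond the last listed position are rejected rather than extrapolated.

```python
from bisect import bisect_right

# Values copied from the -linearGap=loose table quoted in the text.
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = list(Q_GAP)  # the tGap row is identical to the qGap row
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size: int, costs: list) -> float:
    """Cost for a gap of `size` bases, interpolated between table entries.

    Assumption: linear interpolation between neighbouring positions.
    """
    if size < POSITIONS[0] or size > POSITIONS[-1]:
        raise ValueError("gap size outside the range covered by the table")
    i = bisect_right(POSITIONS, size) - 1
    if POSITIONS[i] == size:
        return float(costs[i])
    x0, x1 = POSITIONS[i], POSITIONS[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (size - x0) * (y1 - y0) / (x1 - x0)


def gap_cost(dq: int, dt: int) -> float:
    """Cost of a gap of dq query bases and dt target bases.

    Assumption: one-sided gaps use the qGap or tGap row, and two-sided
    gaps use the bothGap row indexed by dq + dt.
    """
    if dq == 0 and dt == 0:
        return 0.0
    if dt == 0:
        return interpolate(dq, Q_GAP)
    if dq == 0:
        return interpolate(dt, T_GAP)
    return interpolate(dq + dt, BOTH_GAP)


if __name__ == "__main__":
    print(gap_cost(7, 0))   # halfway between the entries for 3 and 11
    print(gap_cost(1, 1))   # double-sided gap, looked up at size 2
```

The table shows why this scheme tolerates long gaps: the cost grows from 325 for a single base to only 56600 for a gap of 252111 bases, far more slowly than a fixed per-base extension penalty would.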
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr2Viewnet Net Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments Comparative Genomics netFr2 Fugu Net Fugu (Oct. 
2004 (JGI 4.0/fr2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Oct. 2004 (JGI 4.0/fr2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Oct. 2004 (JGI 4.0/fr2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the fugu/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
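The four net item types listed above (Top, Syn, Inv, NonSyn) amount to a small decision rule. The Python sketch below encodes that rule exactly as the descriptions word it; it is an illustration only, the function name classify_fill and its parameters are hypothetical, and the netSyntenic program may apply further criteria that the text does not describe.

```python
def classify_fill(level, chrom, strand, parent_chrom=None, parent_strand=None):
    """Classify a net item from its level and its match in the other genome.

    `chrom` and `strand` describe where this item matches in the aligning
    organism; `parent_chrom` and `parent_strand` describe the gap in the
    level above it. All names here are hypothetical.
    """
    if level == 1:
        return "Top"      # best, longest match, displayed on level 1
    if chrom != parent_chrom:
        return "NonSyn"   # matches a different chromosome than the gap above
    if strand != parent_strand:
        return "Inv"      # same chromosome, opposite orientation
    return "Syn"          # same chromosome, same orientation


if __name__ == "__main__":
    print(classify_fill(1, "chr4", "+"))
    print(classify_fill(2, "chr4", "+", "chr4", "+"))
    print(classify_fill(2, "chr4", "-", "chr4", "+"))
    print(classify_fill(3, "chr9", "+", "chr4", "+"))
```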
The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
  A    91    -90    -25   -100
  C   -90    100   -100    -25
  G   -25   -100    100    -90
  T  -100    -25    -90     91
Chains scoring below a minimum score of 2000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
  qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
  bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetFr2Viewchain Chain Fugu (Oct. 2004 (JGI 4.0/fr2)), Chain and Net Alignments Comparative Genomics chainFr2 Fugu Chain Fugu (Oct. 2004 (JGI 4.0/fr2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of fugu (Oct. 2004 (JGI 4.0/fr2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both fugu and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the fugu assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best fugu/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The fugu sequence used in this annotation is from the Oct. 2004 (JGI 4.0/fr2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. 
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the fugu/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single fugu chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track.
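As an illustration of how the substitution matrix above contributes to a chain score, the following minimal Python sketch sums the matrix entries over one ungapped block. The helper name block_score is hypothetical and is not part of axtChain; only the bases A, C, G and T are handled, and gap costs are not included.

```python
# Substitution scores copied from the matrix above (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(target: str, query: str) -> int:
    """Sum substitution scores over one ungapped block (hypothetical helper)."""
    if len(target) != len(query):
        raise ValueError("an ungapped block must pair sequences of equal length")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))
```

A full chain score would combine such block scores with the gap costs between blocks; this sketch covers only the per-base part.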
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position    1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap   625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTetNig1 Tetraodon Chain/Net Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tetraodon and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tetraodon assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tetraodon/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tetraodon sequence used in this annotation is from the Feb. 2004 (Genoscope 7/tetNig1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tetraodon/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tetraodon chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position    1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap   625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
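The placement step just described can be sketched in Python as a greedy procedure. This is a simplified illustration under stated assumptions, not the chainNet implementation: each chain is reduced to a score plus a list of aligned blocks on the human (target) sequence, intervals are half-open, and the level hierarchy and any size thresholds are not modeled. The names subtract and place_chains are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target sequence


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span that no interval in covered overlaps."""
    pieces = []
    start, end = span
    for c_start, c_end in sorted(covered):
        if c_end <= start:
            continue  # covered interval lies entirely before the span
        if c_start >= end:
            break  # covered interval lies entirely after the span
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains: List[Tuple[int, List[Interval]]]):
    """Place chains from highest to lowest score, trimming each to uncovered sequence."""
    covered: List[Interval] = []
    placed = []
    for score, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:
            placed.append((score, kept))
            covered.extend(kept)
    return placed
```

In this sketch a lower-scoring chain survives only where it falls into sequence that higher-scoring chains left uncovered, which includes the gaps between their blocks.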
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTetNig1Viewnet Net Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics netTetNig1 Tetraodon Net Tetraodon (Feb. 
2004 (Genoscope 7/tetNig1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tetraodon and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tetraodon assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tetraodon/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tetraodon sequence used in this annotation is from the Feb. 2004 (Genoscope 7/tetNig1) assembly. 
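The distinction between single-line and double-line gaps can be illustrated with a short Python sketch. It assumes the UCSC chain file format, in which each block line after a chain header reads "size dt dq" (ungapped block size, then the gap to the next block in the target and in the query) and the final block line holds only the size. The function name classify_gaps is hypothetical, and the rule used here (a gap is "double" when both dt and dq are positive) is an assumption about how the display distinguishes the two cases, not a statement of the browser's drawing code.

```python
from typing import Iterable, List


def classify_gaps(block_lines: Iterable[str]) -> List[str]:
    """Label each gap between blocks as 'single' (one-sided) or 'double' (two-sided)."""
    labels = []
    for line in block_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # the last block of a chain has no trailing gap
        _size, dt, dq = (int(f) for f in fields)
        # Sequence skipped in both genomes is treated as a double-sided gap.
        labels.append("double" if dt > 0 and dq > 0 else "single")
    return labels
```

For example, the block lines "9 1 0", "10 0 5", "61 4 6", "16" describe two one-sided gaps followed by one double-sided gap.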
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tetraodon/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single tetraodon chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position    1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap   625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTetNig1Viewchain Chain Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)), Chain and Net Alignments Comparative Genomics chainTetNig1 Tetraodon Chain Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both tetraodon and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the tetraodon assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best tetraodon/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The tetraodon sequence used in this annotation is from the Feb. 2004 (Genoscope 7/tetNig1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. 
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the tetraodon/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single tetraodon chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
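The minimum-score cutoff can be reproduced after the fact on chain-format text with a short Python sketch. It assumes the UCSC chain file format, in which each record starts with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id" followed by block lines. The function name filter_chains is hypothetical; this is an illustration of the filter, not the way the track itself was built.

```python
from typing import Iterable, Iterator, List


def filter_chains(lines: Iterable[str], min_score: int = 5000) -> Iterator[List[str]]:
    """Yield each chain record (header plus block lines) scoring at least min_score."""
    record: List[str] = []
    keep = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("chain "):
            if keep and record:
                yield record  # emit the previous record before starting a new one
            score = int(line.split()[1])  # the score is the second header field
            keep = score >= min_score
            record = [line]
        elif line.strip() and record:
            record.append(line)  # block line belonging to the current record
    if keep and record:
        yield record
```

Passing the lines of a chain file through filter_chains returns only the records that meet the threshold; blank separator lines are dropped.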
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position    1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap      325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap   625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 ecoresTetNig1 Tetraodon Ecores Human(hg18)/Tetraodon (Feb. 2004 (Genoscope 7/tetNig1)) Evolutionary Conserved Regions Comparative Genomics Description This track shows Evolutionary Conserved Regions computed by the Exofish program at Genoscope. Each singleton block corresponds to an "ecore"; two blocks connected by a thin line correspond to an "ecotig", a set of colinear ecores in a syntenic region. Methods Genome-wide sequence comparisons were done at the protein-coding level between the genome sequences of human, Homo sapiens, and Tetraodon (green spotted pufferfish), Tetraodon nigroviridis, to detect evolutionarily conserved regions (ECORES). Credits Thanks to Olivier Jaillon at Genoscope for contributing the data. chainNetPetMar1 Lamprey Chain/Net Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Mar. 2007 (WUGSC 3.0/petMar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and human simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lamprey sequence used in this annotation is from the Mar. 2007 (WUGSC 3.0/petMar1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. 
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
 A     91    -90    -25   -100
 C    -90    100   -100    -25
 G    -25   -100    100    -90
 T   -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track.
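As an illustration of how the substitution matrix and the minimum score above fit together, the following is a minimal Python sketch, not the axtChain implementation. The helper names block_score and keep_chains are hypothetical, the example only handles the bases A, C, G and T, and gap penalties are left out entirely.

```python
# Substitution scores copied from the matrix in the track description
# (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {
    (a, b): MATRIX[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(target: str, query: str) -> int:
    """Sum substitution scores over one ungapped block (hypothetical helper).

    Only A, C, G and T are handled; any other character raises KeyError.
    """
    if len(target) != len(query):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def keep_chains(chain_scores, min_score=5000):
    """Drop chains scoring below min_score (5000 for this lamprey track)."""
    return [s for s in chain_scores if s >= min_score]


if __name__ == "__main__":
    print(block_score("ACGT", "ACGT"))        # 91 + 100 + 100 + 91 = 382
    print(keep_chains([4999, 5000, 12000]))   # [5000, 12000]
```

A real chain score also subtracts gap costs between blocks, so the block sum here is only one component of the score that is compared against the threshold.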
The linear gap matrix used with axtChain:
-linearGap=loose
tablesize   11
smallSize   111
position    1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap     625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPetMar1Viewnet Net Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments Comparative Genomics netPetMar1 Lamprey Net Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Mar. 2007 (WUGSC 3.0/petMar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lamprey sequence used in this annotation is from the Mar. 2007 (WUGSC 3.0/petMar1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
 A     91    -90    -25   -100
 C    -90    100   -100    -25
 G    -25   -100    100    -90
 T   -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize   11
smallSize   111
position    1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap     625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
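The placement step just described (highest-scoring chains first, later chains trimmed to whatever is still uncovered) can be sketched as a greedy interval procedure. This is a simplified illustration, not the chainNet program: each chain is reduced to a single solid interval on the human genome, so gaps inside a chain and the resulting level hierarchy are not modeled, and the function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(interval, covered):
    """Return the parts of interval not covered by the sorted, disjoint list covered."""
    start, end = interval
    pieces = []
    pos = start
    for c_start, c_end in covered:
        if c_end <= pos:
            continue          # covered interval lies entirely before pos
        if c_start >= end:
            break             # covered interval lies entirely after our span
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(chains):
    """Place (score, start, end) chains greedily, best score first.

    Each chain is trimmed to the regions not already taken by a
    higher-scoring chain; chains with nothing left are dropped.
    """
    covered = []
    placed = []
    for score, start, end in sorted(chains, key=lambda c: -c[0]):
        pieces = subtract((start, end), covered)
        if pieces:
            placed.append((score, pieces))
            covered = merge(covered + pieces)
    return placed


if __name__ == "__main__":
    print(place_chains([(9000, 0, 100), (7000, 50, 150), (6000, 20, 80)]))
```

In this toy input the 7000-point chain keeps only positions 100 to 150, and the 6000-point chain disappears because the top chain already covers it. In the real net, a lower-scoring chain can instead fill a gap inside a higher-scoring chain, which is what produces levels 2, 3 and so on.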
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPetMar1Viewchain Chain Lamprey (Mar. 2007 (WUGSC 3.0/petMar1)), Chain and Net Alignments Comparative Genomics chainPetMar1 Lamprey Chain Lamprey (Mar. 
2007 (WUGSC 3.0/petMar1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Mar. 2007 (WUGSC 3.0/petMar1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lamprey sequence used in this annotation is from the Mar. 2007 (WUGSC 3.0/petMar1) assembly. 
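The single-line versus double-line distinction described for the chain track can be expressed in terms of the distance between two consecutive aligned blocks in each genome. The sketch below is an assumption-laden illustration, not the browser's drawing code: the Block type and gap_kind function are hypothetical, coordinates are taken to be half-open and on the same strand, and any nonzero distance counts as a gap (the description itself speaks of "substantial" sequence without giving a threshold).

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One ungapped aligned block; t_* is human (target), q_* is the other species."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int


def gap_kind(prev: Block, nxt: Block) -> str:
    """Classify the gap between two consecutive blocks of one chain."""
    dt = nxt.t_start - prev.t_end   # unaligned bases on the human side
    dq = nxt.q_start - prev.q_end   # unaligned bases on the query side
    if dt < 0 or dq < 0:
        raise ValueError("blocks overlap or are out of order")
    if dt > 0 and dq > 0:
        return "double"      # sequence in both species: drawn as a double line
    if dt > 0:
        return "single"      # human-only sequence: drawn as a single line
    if dq > 0:
        return "query-only"  # zero width on the human axis
    return "none"


if __name__ == "__main__":
    a = Block(0, 100, 0, 100)
    print(gap_kind(a, Block(150, 200, 100, 150)))  # single
    print(gap_kind(a, Block(150, 200, 130, 180)))  # double
```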
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
 A     91    -90    -25   -100
 C    -90    100   -100    -25
 G    -25   -100    100    -90
 T   -100    -25    -90     91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize   11
smallSize   111
position    1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap     625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBraFlo1 Lancelet Chain/Net Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lancelet (Mar. 2006 (JGI 1.0/braFlo1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lancelet and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. 
The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lancelet assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lancelet/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lancelet sequence used in this annotation is from the Mar. 2006 (JGI 1.0/braFlo1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps.
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lancelet/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lancelet chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
 A     91    -90    -25   -100
 C    -90    100   -100    -25
 G    -25   -100    100    -90
 T   -100    -25    -90     91
Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize   11
smallSize   111
position    1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap     625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. 
Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBraFlo1Viewnet Net Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments Comparative Genomics netBraFlo1 Lancelet Net Lancelet (Mar. 2006 (JGI 1.0/braFlo1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lancelet (Mar. 2006 (JGI 1.0/braFlo1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lancelet and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lancelet assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lancelet/human chain for every part of the human genome. 
It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lancelet sequence used in this annotation is from the Mar. 2006 (JGI 1.0/braFlo1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lancelet/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program.
The axt alignments were fed into axtChain, which organizes all alignments between a single lancelet chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A      C      G      T
 A     91    -90    -25   -100
 C    -90    100   -100    -25
 G    -25   -100    100    -90
 T   -100    -25    -90     91
Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
-linearGap=loose
tablesize   11
smallSize   111
position    1    2    3    11   111  2111  12111  32111  72111  152111  252111
qGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
tGap        325  360  400  450  600  1100  3600   7600   15600  31600   56600
bothGap     625  660  700  750  900  1400  4000   8000   16000  32000   57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetBraFlo1Viewchain Chain Lancelet (Mar. 2006 (JGI 1.0/braFlo1)), Chain and Net Alignments Comparative Genomics chainBraFlo1 Lancelet Chain Lancelet (Mar. 2006 (JGI 1.0/braFlo1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lancelet (Mar. 2006 (JGI 1.0/braFlo1)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lancelet and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lancelet assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lancelet/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lancelet sequence used in this annotation is from the Mar. 2006 (JGI 1.0/braFlo1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
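The level hierarchy described above can be illustrated with a small sketch. It is a simplified model, not the chainNet implementation: chains are reduced to lists of aligned blocks on one target sequence, the function name build_net and the tuple layout are hypothetical, and any size or score thresholds the real program may apply are ignored.

```python
def build_net(chains, target_size):
    """Greedy placement of chains into levels (illustrative model only).

    chains: list of (name, score, blocks), where blocks is a sorted list of
    (start, end) target intervals covered by ungapped alignments.
    Returns a list of (name, level, start, end) for each placed piece.
    """
    free = [(0, target_size, 1)]  # open slots: (start, end, level)
    placed = []
    # Highest-scoring chains are placed first.
    for name, score, blocks in sorted(chains, key=lambda c: -c[1]):
        next_free = []
        for s, e, level in free:
            # Trim the chain's blocks so that they fit inside this open slot.
            clipped = [(max(bs, s), min(be, e))
                       for bs, be in blocks if bs < e and be > s]
            if not clipped:
                next_free.append((s, e, level))
                continue
            first, last = clipped[0][0], clipped[-1][1]
            placed.append((name, level, first, last))
            # Space outside the trimmed chain stays open at the same level.
            if s < first:
                next_free.append((s, first, level))
            if last < e:
                next_free.append((last, e, level))
            # Gaps between blocks become open slots one level further down.
            for (_, gap_start), (gap_end, _) in zip(clipped, clipped[1:]):
                if gap_start < gap_end:
                    next_free.append((gap_start, gap_end, level + 1))
        free = next_free
    return placed


example = [
    ("A", 9000, [(0, 100), (200, 300)]),  # top-level chain with one gap
    ("B", 5000, [(120, 180)]),            # falls inside the gap of A
    ("C", 3000, [(50, 150)]),             # partly covered by A, so trimmed
]
net = build_net(example, 400)
```

In this toy input, chain A lands on level 1, while B and the trimmed remainder of C land on level 2 inside the gap of A.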
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.
Methods Chain track Transposons that have been inserted since the lancelet/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lancelet chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "2000" were discarded; the remaining chains are displayed in this track.
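As a rough illustration of how the substitution matrix and the minimum score fit together, the sketch below scores one gapless block column by column and then filters chain scores against the threshold. The helper names block_score and keep_chains are hypothetical, and the sketch ignores gap penalties and ambiguous bases, so it is not a reimplementation of axtChain.

```python
# Substitution scores from the matrix above (rows and columns in A, C, G, T order).
BASES = "ACGT"
MATRIX = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]
SCORE = {(a, b): MATRIX[i][j]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)}

MIN_CHAIN_SCORE = 2000  # threshold quoted in the text for this track


def block_score(target, query):
    """Sum the per-column substitution scores of one gapless block."""
    if len(target) != len(query):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))


def keep_chains(chain_scores, min_score=MIN_CHAIN_SCORE):
    """Keep only the chain scores that reach the minimum score."""
    return [s for s in chain_scores if s >= min_score]


identical = block_score("ACGT", "ACGT")   # four matching columns
mismatch = block_score("AC", "GT")        # A/G and C/T columns
kept = keep_chains([1999, 2000, 5000])
```

Under this matrix, a perfectly matching block gains roughly 91 to 100 per base, so a chain needs on the order of twenty or more net matching bases to reach a score of 2000.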
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position     1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap    625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetStrPur2 S. purpuratus Chain/Net S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both S. purpuratus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the S. purpuratus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best S. purpuratus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The S. purpuratus sequence used in this annotation is from the Sep. 2006 (Baylor 2.1/strPur2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.
Methods Chain track Transposons that have been inserted since the S. purpuratus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single S. purpuratus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position     1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap    625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
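The linear gap table listed in the chain methods above can be read as a piecewise-linear cost function of gap length. The sketch below shows one way to evaluate it. Several points are assumptions rather than statements from this page: that costs are linearly interpolated between the listed positions, that the last segment's slope is extended for longer gaps, that a gap present in both sequences is charged from the bothGap row using the combined length, and the function names themselves.

```python
from bisect import bisect_right

# Values copied from the -linearGap=loose table above.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size, xs, ys):
    """Piecewise-linear lookup; extends the final segment beyond the table."""
    if size <= xs[0]:
        return float(ys[0])
    # Index of the right end of the segment that contains `size`.
    i = min(bisect_right(xs, size), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(q_gap, t_gap):
    """Cost of a gap of q_gap query bases and t_gap target bases (assumed rule)."""
    if q_gap == 0 and t_gap == 0:
        return 0.0
    if t_gap == 0:
        return interpolate(q_gap, POSITION, Q_GAP)
    if q_gap == 0:
        return interpolate(t_gap, POSITION, T_GAP)
    # Double-sided gap: assumed to use the combined length and the bothGap row.
    return interpolate(q_gap + t_gap, POSITION, BOTH_GAP)


single = gap_cost(11, 0)    # exactly on a table position
between = gap_cost(0, 61)   # halfway between positions 11 and 111
double = gap_cost(1, 1)     # double-sided gap of combined length 2
```

The table makes long gaps cheap relative to their length: under these assumptions a 2,111-base single-sided gap costs 1100, about the score of eleven matching bases.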
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetStrPur2Viewnet Net S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)), Chain and Net Alignments Comparative Genomics netStrPur2 S. purpuratus Net S. purpuratus (Sep. 
2006 (Baylor 2.1/strPur2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both S. purpuratus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the S. purpuratus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best S. purpuratus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The S. purpuratus sequence used in this annotation is from the Sep. 2006 (Baylor 2.1/strPur2) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.
Methods Chain track Transposons that have been inserted since the S. purpuratus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single S.
purpuratus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position     1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap    625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetStrPur2Viewchain Chain S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)), Chain and Net Alignments Comparative Genomics chainStrPur2 S. purpuratus Chain S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of S. purpuratus (Sep. 2006 (Baylor 2.1/strPur2)) to the human genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both S. purpuratus and human simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the S. purpuratus assembly or an insertion in the human assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best S. purpuratus/human chain for every part of the human genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The S. purpuratus sequence used in this annotation is from the Sep. 2006 (Baylor 2.1/strPur2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g., chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.
Methods Chain track Transposons that have been inserted since the S. purpuratus/human split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single S. purpuratus chromosome and a single human chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of "1000" were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position     1    2    3   11  111  2111  12111  32111  72111  152111  252111
qGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
tGap       325  360  400  450  600  1100   3600   7600  15600   31600   56600
bothGap    625  660  700  750  900  1400   4000   8000  16000   32000   57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits Lastz (previously known as blastz) was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D.
Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961