dbSnp155Composite dbSNP 155 Short Genetic Variants from dbSNP release 155 Variation Description This track shows short genetic variants (up to approximately 50 base pairs) from dbSNP build 155: single-nucleotide variants (SNVs), small insertions, deletions, and complex deletion/insertions (indels), relative to the reference genome assembly. Most variants in dbSNP are rare, not true polymorphisms, and some variants are known to be pathogenic. For hg38 (GRCh38), approximately 998 million distinct variants (RefSNP clusters with rs# ids) have been mapped to more than 1.06 billion genomic locations including alternate haplotype and fix patch sequences. dbSNP remapped variants from hg38 to hg19 (GRCh37); approximately 981 million distinct variants were mapped to more than 1.02 billion genomic locations including alternate haplotype and fix patch sequences (not all of which are included in UCSC's hg19). This track includes four subtracks of variants: All dbSNP (155): the entire set (1.02 billion for hg19, 1.06 billion for hg38) Common dbSNP (155): approximately 15 million variants with a minor allele frequency (MAF) of at least 1% (0.01) in the 1000 Genomes Phase 3 dataset. Variants in the Mult. subset (below) are excluded. ClinVar dbSNP (155): approximately 820,000 variants mentioned in ClinVar. Note: that includes both benign and pathogenic (as well as uncertain) variants. Variants in the Mult. subset (below) are excluded. Mult. dbSNP (155): variants that have been mapped to multiple chromosomes, for example chr1 and chr2, raising the question of whether the variant is really a variant or just a difference between duplicated sequences. There are some exceptions in which a variant is mapped to more than one reference sequence, but not culled into this set: A variant may appear in both X and Y pseudo-autosomal regions (PARs) without being included in this set. 
A variant may also appear in a main chromosome as well as an alternate haplotype or fix patch sequence assigned to that chromosome. A fifth subtrack highlights coordinate ranges to which dbSNP mapped a variant but with genomic coordinates that are not internally consistent, i.e. different coordinate ranges were provided when describing different alleles. This can occur due to a bug with mapping variants from one assembly sequence to another when there is an indel difference between the assembly sequences: Map Err (155): around 134,000 mappings of 88,000 distinct rsIDs for hg19 and 178,000 mappings of 108,000 distinct rsIDs for hg38. Interpreting and Configuring the Graphical Display SNVs and pure deletions are displayed as boxes covering the affected base(s). Pure insertions are drawn as single-pixel tickmarks between the base before and the base after the insertion. Insertions and/or deletions in repetitive regions may be represented by a half-height box showing uncertainty in placement, followed by a full-height box showing the number of deleted bases, or a full-height tickmark to indicate an insertion. When an insertion or deletion falls in a repetitive region, the placement may be ambiguous. For example, if the reference genome contains "TAAAG" but some individuals have "TAAG" at the same location, then the variant is a deletion of a single A relative to the reference genome. However, which A was deleted? There is no way to tell whether the first, second or third A was removed. Different variant mapping tools may place the deletion at different bases in the reference genome. To reduce errors in merging variant calls made with different left vs. right biases, dbSNP made a major change in its representation of deletion/insertion variants in build 152. 
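The left vs. right placement bias mentioned above can be illustrated with a short sketch. The function name `placement_range` and its 0-based coordinates are hypothetical and chosen only for this example; this is not dbSNP's or UCSC's code. For a deletion of a given length, it finds the leftmost and rightmost start positions that produce the same resulting sequence.

```python
from typing import Tuple


def placement_range(ref: str, start: int, length: int) -> Tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` yields the same sequence as deleting at `start`."""
    left = start
    # Moving the deleted window one base to the left is equivalent when the
    # base entering the window on the left equals the base leaving on the right.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Moving one base to the right is equivalent when the base leaving the
    # window on the left equals the base entering on the right.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def delete(ref: str, start: int, length: int) -> str:
    """Remove `length` bases from `ref` beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


# Deleting the middle A of "TAAAG": a left-biased tool would report index 1,
# a right-biased tool index 3, yet both describe the same variant.
leftmost, rightmost = placement_range("TAAAG", 2, 1)
print(leftmost, rightmost)
```

Two call sets that differ only in this bias would report different coordinates for one and the same deletion, which is why merging them without normalization is error-prone.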
Now, instead of assigning a single-base genomic location at one of the A's, dbSNP expands the coordinates to encompass the whole repetitive region, so the variant is represented as a deletion of 3 A's combined with an insertion of 2 A's. In the track display, there will be a half-height box covering the first two A's, followed by a full-height box covering the third A, to show a net loss of one base but an uncertain placement within the three A's. Variants are colored according to functional effect on genes annotated by dbSNP: Protein-altering variants and splice site variants are red. Synonymous codon variants are green. Non-coding transcript or Untranslated Region (UTR) variants are blue. On the track controls page, several variant properties can be included or excluded from the item labels: rs# identifier assigned by dbSNP, reference/alternate alleles, major/minor alleles (when available) and minor allele frequency (when available). Allele frequencies are reported independently by each project (some of which may have overlapping sets of samples): 1000Genomes: The 1000 Genomes dataset contains data for 2,504 individuals from 26 populations. dbGaP_PopFreq: The new source of dbGaP aggregated frequency data (>1 Million Subjects) provided by dbSNP. TOPMED: The TOPMED dataset contains the freeze 8 panel, which includes about 158,000 individuals. The approximate ethnic breakdown is European (41%), African (31%), Hispanic or Latino (15%), East Asian (9%), and unknown (4%) ancestry. KOREAN: The Korean Reference Genome Database contains data for 1,465 Korean individuals. SGDP_PRJ: The Simons Genome Diversity Project dataset contains 263 C-panel fully public samples and 16 B-panel fully public samples for a total of 279 samples. Qatari: The dataset contains initial mappings of the genomes of more than 1,000 Qatari nationals. NorthernSweden: The dataset contains 300 whole-genome sequenced human samples from the county of Vasterbotten in northern Sweden.
Siberian: The dataset contains paired-end whole-genome sequencing data of 28 modern-day humans from Siberia and Western Russia. TWINSUK: The UK10K - TwinsUK project contains 1854 samples from the Department of Twin Research and Genetic Epidemiology (DTR). The dataset contains data obtained from the 11,000 identical and non-identical twins between the ages of 16 and 85 years old. TOMMO: The Tohoku Medical Megabank Project contains an allele frequency panel of 3552 Japanese individuals, including the X chromosome. ALSPAC: The UK10K - Avon Longitudinal Study of Parents and Children project contains 1927 sample including individuals obtained from the ALSPAC population. This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992. GENOME_DK: The dataset contains the sequencing of Danish parent-offspring trios to determine genomic variation within the Danish population. GnomAD: The gnomAD genome dataset includes a catalog containing 602M SNVs and 105M indels based on the whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome. GoNL: The Genome of the Netherlands (GoNL) Project characterizes DNA sequence variation, common and rare, for SNVs and short insertions and deletions (indels) and large deletions in 769 individuals of Dutch ancestry selected from five biobanks under the auspices of the Dutch hub of the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL). Estonian: The dataset contains genetic variation in the Estonian population: pharmacogenomics study of adverse drug effects using electronic health records. Vietnamese: The Kinh Vietnamese database contains 24.81 million variants (22.47 million single nucleotide polymorphisms (SNPs) and 2.34 million indels), of which 0.71 million variants are novel. Korea1K: The dataset contains 1,094 Korean personal genomes with clinical information. HapMap: (HapMap is being retired.) 
The International HapMap Project contains samples from African, Asian, or European populations. PRJEB36033: The dataset contains ancient Sardinia genome-wide 1240k capture data from 70 ancient Sardinians. HGDP_Stanford: The Stanford HGDP SNP genotyping data consists of ~660,918 tag SNPs in autosomes, chromosome X and Y, the pseudoautosomal region, and mitochondrial DNA, typed across 1043 individuals from all panel populations. Daghestan: The dataset contains genotypes of >550 000 autosomal single-nucleotide polymorphisms (SNPs) in a set of 14 population isolates speaking Nakh-Daghestanian (ND) languages. PAGE_STUDY: The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits. Chileans: The dataset consists of genetic variation on the Chileans using genotype data on ~685,944 SNPs from 313 individuals across the whole-continental country. MGP: MGP contains aggregated information on 267 healthy individuals, representative of the Spanish population that were used as controls in the MGP (Medical Genome Project). PRJEB37584: The dataset contains genome-wide genotype analysis that identified copy number variations in cranial meningiomas in Chinese patients, and demonstrated diverse CNV burdens among individuals with diverse clinical features. GoESP: The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. ExAC: The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. Individuals affected by severe pediatric disease have been removed. GnomAD_exomes: The gnomAD v2.1 exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes in 14 populations. 
FINRISK: The FINRISK cohorts comprise the respondents of representative, cross-sectional population surveys that have been carried out every 5 years since 1972 to assess the risk factors of chronic diseases (e.g. CVD, diabetes, obesity, cancer) and health behavior in the working age population. PharmGKB: The dataset contains aggregated frequency data for all PharmGKB submissions. PRJEB37766: The Mexican Genomic Database for Addiction Research.

The project from which to take allele frequency data defaults to 1000 Genomes but can be set to any of those projects.

Using the track controls, variants can be filtered by: minimum minor allele frequency (MAF); variation class/type (e.g. SNV, insertion, deletion); functional effect on a gene (e.g. synonymous, frameshift, intron, upstream); and assorted features and anomalies noted by UCSC during processing of dbSNP's data.

Interesting and anomalous conditions noted by UCSC

While processing the information downloaded from dbSNP, UCSC annotates some properties of interest. These are noted on the item details page, and may be useful to include or exclude affected variants.

Some are purely informational:

keyword in data file (dbSnp155.bb)    # in hg19    # in hg38    description
clinvar    627817    630503    Variant is in ClinVar.
clinvarBenign    275541    276409    Variant is in ClinVar with clinical significance of benign and/or likely benign.
clinvarConflicting    16925    16834    Variant is in ClinVar with reports of both benign and pathogenic significance.
clinvarPathogenic    56373    56475    Variant is in ClinVar with clinical significance of pathogenic and/or likely pathogenic.
commonAll    14904503    15862783    Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in all projects reporting frequencies.
commonSome    59633864    62095091    Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in some, but not all, projects reporting frequencies.
diffMajor    12748733    13073288    Different frequency sources have different major alleles.
overlapDiffClass    198945442    207101421    This variant overlaps another variant with a different type/class.
overlapSameClass    29281958    30301090    This variant overlaps another with the same type/class but different start/end.
rareAll    906113910    938985356    Variant is "rare", i.e. has a Minor Allele Frequency of less than 1% in all projects reporting frequencies, or has no frequency data.
rareSome    950843271    985217664    Variant is "rare", i.e. has a Minor Allele Frequency of less than 1% in some, but not all, projects reporting frequencies, or has no frequency data.
revStrand    5540864    6770772    Alleles are displayed on the + strand at the current position. dbSNP's alleles are displayed on the + strand of a different assembly sequence, so dbSNP's variant page shows alleles that are reverse-complemented with respect to the alleles displayed above.

while others may indicate that the reference genome contains a rare variant or sequencing issue:

keyword in data file (dbSnp155.bb)    # in hg19    # in hg38    description
refIsAmbiguous    19    41    The reference genome allele contains an IUPAC ambiguous base (e.g. 'R' for 'A or G', or 'N' for 'any base').
refIsMinor    14950212    15386394    The reference genome allele is not the major allele in at least one project.
refIsRare    793081    822757    The reference genome allele is rare (i.e. allele frequency < 1%).
refIsSingleton    694310    712794    The reference genome allele has never been observed in a population sequencing project reporting frequencies.
refMismatch    1    18    The reference genome allele reported by dbSNP differs from the GenBank assembly sequence. This is very rare and in all cases observed so far, the GenBank assembly has an 'N' while the RefSeq assembly used by dbSNP has a less ambiguous character such as 'R'.

and others may indicate an anomaly or problem with the variant data:

keyword in data file (dbSnp155.bb)    # in hg19    # in hg38    description
altIsAmbiguous    5294    5361    At least one alternate allele contains an IUPAC ambiguous base (e.g. 'R' for 'A or G'). For alleles containing more than one ambiguous base, this may create a combinatoric explosion of possible alleles.
classMismatch    13289    18475    Variation class/type is inconsistent with alleles mapped to this genome assembly.
clusterError    373258    459130    This variant has the same start, end and class as another variant; they probably should have been merged into one variant.
freqIncomplete    0    0    At least one project reported counts for only one allele, which implies that at least one allele is missing from the report; that project's frequency data are ignored.
freqIsAmbiguous    4332    4399    At least one allele reported by at least one project that reports frequencies contains an IUPAC ambiguous base.
freqNotMapped    1149972    1141935    At least one project reported allele frequencies relative to a different assembly; however, dbSNP does not include a mapping of this variant to that assembly, which implies a problem with mapping the variant across assemblies. The mapping on this assembly may have an issue; evaluate carefully vs. original submissions, which you can view by clicking through to dbSNP above.
freqNotRefAlt    74139    110646    At least one allele reported by at least one project that reports frequencies does not match any of the reference or alternate alleles listed by dbSNP.
multiMap    799777    286666    This variant has been mapped to more than one distinct genomic location.
otherMapErr    91260    195051    At least one other mapping of this variant has erroneous coordinates. The mapping(s) with erroneous coordinates are excluded from this track and are included in the Map Err subtrack. Sometimes despite this mapping having legal coordinates, there may still be an issue with this mapping's coordinates and alleles; you may want to click through to dbSNP to compare the initial submission's coordinates and alleles. In hg19, 55454 distinct rsIDs are affected; in hg38, 86636.

Data Sources and Methods

dbSNP has collected genetic variant reports from researchers worldwide for more than 20 years.
Since the advent of next-generation sequencing methods and the population sequencing efforts that they enable, dbSNP has grown exponentially, requiring a new data schema, computational pipeline, web infrastructure, and download files (Holmes et al.). The same challenges of exponential growth affected UCSC's presentation of dbSNP variants, so we have taken the opportunity to change our internal representation and import pipeline. Most notably, flanking sequences are no longer provided by dbSNP, because most submissions have been genomic variant calls in VCF format as opposed to independent sequences.

We downloaded JSON files available from dbSNP at http://ftp.ncbi.nlm.nih.gov/snp/archive/b155/JSON/, extracted a subset of the information about each variant, and collated it into a bigBed file using the bigDbSnp.as schema with the information necessary for filtering and displaying the variants, as well as a separate file containing more detailed information to be displayed on each variant's details page (dbSnpDetails.as schema).

Data Access

Note: It is not recommended to use LiftOver to convert SNPs between assemblies; more information about how to convert SNPs between assemblies can be found in the following FAQ entry.

Since dbSNP has grown to include over 1 billion variants, the size of the All dbSNP (155) subtrack can cause the Table Browser and Data Integrator to time out, leading to a blank page or truncated output, unless queries are restricted to a chromosomal region, to particular defined regions, to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser), or to one of the subset tracks such as Common (~15 million variants) or ClinVar (~0.8M variants).

For automated analysis, the track data files can be downloaded from the downloads server for hg19 and hg38.
file    format    subtrack
dbSnp155.bb (hg19, hg38)    bigDbSnp (bigBed4+13)    All dbSNP (155)
dbSnp155ClinVar.bb (hg19, hg38)    bigDbSnp (bigBed4+13)    ClinVar dbSNP (155)
dbSnp155Common.bb (hg19, hg38)    bigDbSnp (bigBed4+13)    Common dbSNP (155)
dbSnp155Mult.bb (hg19, hg38)    bigDbSnp (bigBed4+13)    Mult. dbSNP (155)
dbSnp155BadCoords.bb (hg19, hg38)    bigBed4    Map Err (155)
dbSnp155Details.tab.gz    gzip-compressed tab-separated text    Detailed variant properties, independent of genome assembly version

Several utilities for working with bigBed-formatted binary files can be downloaded here. Run a utility with no arguments to see a brief description of the utility and its options.

bigBedInfo provides summary statistics about a bigBed file including the number of items in the file. With the -as option, the output includes an autoSql definition of data columns, useful for interpreting the column values.
bigBedToBed converts the binary bigBed data to tab-separated text. Output can be restricted to a particular region by using the -chrom, -start and -end options.
bigBedNamedItems extracts rows for one or more rs# IDs.

Example: retrieve all variants in the region chr1:200001-200400
    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb -chrom=chr1 -start=200000 -end=200400 stdout

Example: retrieve variant rs6657048
    bigBedNamedItems dbSnp155.bb rs6657048 stdout

Example: retrieve all variants with rs# IDs in file myIds.txt
    bigBedNamedItems -nameFile dbSnp155.bb myIds.txt dbSnp155.myIds.bed

The columns in the bigDbSnp/bigBed files and dbSnp155Details.tab.gz file are described in bigDbSnp.as and dbSnpDetails.as respectively.
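In the region example above, the browser-style position chr1:200001-200400 becomes -start=200000 -end=200400: the start is shifted down by one because BED-style coordinates are 0-based and half-open, while browser positions are 1-based and inclusive. The following is a minimal sketch of that conversion. The helper names `region_to_bigbed_args` and `build_command` are hypothetical and are not part of any UCSC tool; the sketch only assembles the argument list and does not run anything.

```python
import re
from typing import List

# Matches browser-style positions such as "chr1:200001-200400" or "chr1:200,001-200,400".
_POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def region_to_bigbed_args(position: str) -> List[str]:
    """Convert a 1-based, inclusive position into -chrom/-start/-end options,
    where -start is 0-based and -end is left unchanged (half-open interval)."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {position!r}")
    return [f"-chrom={chrom}", f"-start={start - 1}", f"-end={end}"]


def build_command(big_bed: str, position: str, output: str = "stdout") -> List[str]:
    """Assemble a bigBedToBed argument list for one region (nothing is executed)."""
    return ["bigBedToBed", big_bed, *region_to_bigbed_args(position), output]


if __name__ == "__main__":
    url = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb"
    print(" ".join(build_command(url, "chr1:200001-200400")))
```

Assuming bigBedToBed is installed and on the PATH, the returned list could be passed to `subprocess.run`; that step is left out here so the sketch stays self-contained.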
For columns that contain lists of allele frequency data, the order of projects providing the data listed is as follows: 1000Genomes dbGaP_PopFreq TOPMED KOREAN SGDP_PRJ Qatari NorthernSweden Siberian TWINSUK TOMMO ALSPAC GENOME_DK GnomAD GoNL Estonian Vietnamese Korea1K HapMap PRJEB36033 HGDP_Stanford Daghestan PAGE_STUDY Chileans MGP PRJEB37584 GoESP ExAC GnomAD_exomes FINRISK PharmGKB PRJEB37766 The functional effect (maxFuncImpact) for each variant contains the Sequence Ontology (SO) ID for the greatest functional impact on the gene. This field contains a 0 when no SO terms are annotated on the variant. UCSC also has an API that can be used to retrieve values from a particular chromosome range. A list of rs# IDs can be pasted/uploaded in the Variant Annotation Integrator tool to find out which genes (if any) the variants are located in, as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information. References Holmes JB, Moyer E, Phan L, Maglott D, Kattman B. SPDI: Data Model for Variants and Applications at NCBI. Bioinformatics. 2019 Nov 18;. PMID: 31738401 Sayers EW, Agarwala R, Bolton EE, Brister JR, Canese K, Clark K, Connor R, Fiorini N, Funk K, Hefferon T et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019 Jan 8;47(D1):D23-D28. PMID: 30395293; PMC: PMC6323993 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 dbSnp155ViewVariants Variants Short Genetic Variants from dbSNP release 155 Variation dbSnp155 All dbSNP(155) All Short Genetic Variants from dbSNP Release 155 Variation dbSnp155Mult Mult. 
dbSNP(155) Short Genetic Variants from dbSNP Release 155 that Map to Multiple Genomic Loci Variation dbSnp155ClinVar ClinVar dbSNP(155) Short Genetic Variants from dbSNP Release 155 Included in ClinVar Variation dbSnp155Common Common dbSNP(155) Common (1000 Genomes Phase 3 MAF >= 1%) Short Genetic Variants from dbSNP Release 155 Variation dbSnp155ViewErrs Mapping Errors Short Genetic Variants from dbSNP release 155 Variation dbSnp155BadCoords Map Err dbSnp(155) Mappings with Inconsistent Coordinates from dbSNP 155 Variation dbSnp153Composite dbSNP 153 Short Genetic Variants from dbSNP release 153 Variation Description This track shows short genetic variants (up to approximately 50 base pairs) from dbSNP build 153: single-nucleotide variants (SNVs), small insertions, deletions, and complex deletion/insertions (indels), relative to the reference genome assembly. Most variants in dbSNP are rare, not true polymorphisms, and some variants are known to be pathogenic. For hg38 (GRCh38), approximately 667 million distinct variants (RefSNP clusters with rs# ids) have been mapped to more than 702 million genomic locations including alternate haplotype and fix patch sequences. dbSNP remapped variants from hg38 to hg19 (GRCh37); approximately 658 million distinct variants were mapped to more than 683 million genomic locations including alternate haplotype and fix patch sequences (not all of which are included in UCSC's hg19). This track includes four subtracks of variants: All dbSNP (153): the entire set (683 million for hg19, 702 million for hg38) Common dbSNP (153): approximately 15 million variants with a minor allele frequency (MAF) of at least 1% (0.01) in the 1000 Genomes Phase 3 dataset. Variants in the Mult. subset (below) are excluded. ClinVar dbSNP (153): approximately 455,000 variants mentioned in ClinVar. Note: that includes both benign and pathogenic (as well as uncertain) variants. Variants in the Mult. subset (below) are excluded. Mult. 
dbSNP (153): variants that have been mapped to multiple chromosomes, for example chr1 and chr2, raising the question of whether the variant is really a variant or just a difference between duplicated sequences. There are some exceptions in which a variant is mapped to more than one reference sequence, but not culled into this set: A variant may appear in both X and Y pseudo-autosomal regions (PARs) without being included in this set. A variant may also appear in a main chromosome as well as an alternate haplotype or fix patch sequence assigned to that chromosome. A fifth subtrack highlights coordinate ranges to which dbSNP mapped a variant but with genomic coordinates that are not internally consistent, i.e. different coordinate ranges were provided when describing different alleles. This can occur due to a bug with mapping variants from one assembly sequence to another when there is an indel difference between the assembly sequences: Map Err (153): around 120,000 mappings of 55,000 distinct rsIDs for hg19 and 149,000 mappings of 86,000 distinct rsIDs for hg38. Interpreting and Configuring the Graphical Display SNVs and pure deletions are displayed as boxes covering the affected base(s). Pure insertions are drawn as single-pixel tickmarks between the base before and the base after the insertion. Insertions and/or deletions in repetitive regions may be represented by a half-height box showing uncertainty in placement, followed by a full-height box showing the number of deleted bases, or a full-height tickmark to indicate an insertion. When an insertion or deletion falls in a repetitive region, the placement may be ambiguous. For example, if the reference genome contains "TAAAG" but some individuals have "TAAG" at the same location, then the variant is a deletion of a single A relative to the reference genome. However, which A was deleted? There is no way to tell whether the first, second or third A was removed. 
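The ambiguity in the "TAAAG" example can be checked directly. The sketch below uses a hypothetical helper, `delete_base`, which is not part of any dbSNP or UCSC tool. It removes each of the three A's in turn and collects the resulting sequences.

```python
def delete_base(sequence: str, index: int) -> str:
    """Return `sequence` with the single base at 0-based `index` removed."""
    return sequence[:index] + sequence[index + 1:]


reference = "TAAAG"
# Indices 1, 2 and 3 are the first, second and third A of the reference.
results = {index: delete_base(reference, index) for index in (1, 2, 3)}
distinct_outcomes = set(results.values())

for index, observed in results.items():
    print(f"delete A at index {index}: {observed}")
print("distinct outcomes:", distinct_outcomes)
```

All three deletions give the same observed sequence, so the sequence alone cannot identify which A was removed.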
Different variant mapping tools may place the deletion at different bases in the reference genome. To reduce errors in merging variant calls made with different left vs. right biases, dbSNP made a major change in its representation of deletion/insertion variants in build 152. Now, instead of assigning a single-base genomic location at one of the A's, dbSNP expands the coordinates to encompass the whole repetitive region, so the variant is represented as a deletion of 3 A's combined with an insertion of 2 A's. In the track display, there will be a half-height box covering the first two A's, followed by a full-height box covering the third A, to show a net loss of one base but an uncertain placement within the three A's. Variants are colored according to functional effect on genes annotated by dbSNP: Protein-altering variants and splice site variants are red. Synonymous codon variants are green. Non-coding transcript or Untranslated Region (UTR) variants are blue. On the track controls page, several variant properties can be included or excluded from the item labels: rs# identifier assigned by dbSNP, reference/alternate alleles, major/minor alleles (when available) and minor allele frequency (when available). Allele frequencies are reported independently by twelve projects (some of which may have overlapping sets of samples): 1000Genomes: The 1000 Genomes Phase 3 dataset contains data for 2,504 individuals from 26 populations. GnomAD exomes: The gnomAD v2.1 exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes in 14 populations. TOPMED: The TOPMED dataset contains phase 3 data from freeze 5 panel that include more than 60,000 individuals. The approximate ethnic breakdown is European(52%), African (31%), Hispanic or Latino (10%), and East Asian (7%) ancestry. PAGE STUDY: The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits. 
GnomAD genomes: The gnomAD v2.1 genome dataset includes 229 million SNVs and 33 million indels from 15,708 genomes in 9 populations. GoESP: The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data. Estonian: Genetic variation in the Estonian population: pharmacogenomics study of adverse drug effects using electronic health records. ALSPAC: The UK10K - Avon Longitudinal Study of Parents and Children project contains 1927 sample including individuals obtained from the ALSPAC population. This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992. TWINSUK: The UK10K - TwinsUK project contains 1854 samples from the Department of Twin Research and Genetic Epidemiology (DTR). The DTR dataset contains data obtained from the 11,000 identical and non-identical twins between the ages of 16 and 85 years old. NorthernSweden: Whole-genome sequenced control population in northern Sweden reveals subregional genetic differences. This population consists of 300 whole genome sequenced human samples selected from the county of Vasterbotten in northern Sweden. To be selected for inclusion into the population, the individuals had to have reached at least 80 years of age and have no diagnosed cancer. Vietnamese: The Vietnamese Genetic Variation Database includes about 25 million variants (SNVs and indels) from 406 genomes and 305 exomes of unrelated healthy Kinh Vietnamese (KHV) people. The project from which to take allele frequency data defaults to 1000 Genomes but can be set to any of those projects. Using the track controls, variants can be filtered by minimum minor allele frequency (MAF) variation class/type (e.g. SNV, insertion, deletion) functional effect on a gene (e.g. 
synonymous, frameshift, intron, upstream) assorted features and anomalies noted by UCSC during processing of dbSNP's data Interesting and anomalous conditions noted by UCSC While processing the information downloaded from dbSNP, UCSC annotates some properties of interest. These are noted on the item details page, and may be useful to include or exclude affected variants. Some are purely informational: keyword in data file (dbSnp153.bb) # in hg19# in hg38description clinvar 454678 453996 Variant is in ClinVar. clinvarBenign 143864 143736 Variant is in ClinVar with clinical significance of benign and/or likely benign. clinvarConflicting 7932 7950 Variant is in ClinVar with reports of both benign and pathogenic significance. clinvarPathogenic 96242 95262 Variant is in ClinVar with clinical significance of pathogenic and/or likely pathogenic. commonAll 12184521 12438655 Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in all projects reporting frequencies. commonSome 20541190 20902944 Variant is "common", i.e. has a Minor Allele Frequency of at least 1% in some, but not all, projects reporting frequencies. diffMajor 1377831 1399109 Different frequency sources have different major alleles. overlapDiffClass 107015341 110007682 This variant overlaps another variant with a different type/class. overlapSameClass 16915239 17291289 This variant overlaps another with the same type/class but different start/end. rareAll 662601770 681696398 Variant is "rare", i.e. has a Minor Allele Frequency of less than 1% in all projects reporting frequencies, or has no frequency data. rareSome 670958439 690160687 Variant is "rare", i.e. has a Minor Allele Frequency of less than 1% in some, but not all, projects reporting frequencies, or has no frequency data. revStrand 3813702 4532511 Alleles are displayed on the + strand at the current position. 
dbSNP's alleles are displayed on the + strand of a different assembly sequence, so dbSNP's variant page shows alleles that are reverse-complemented with respect to the alleles displayed above. while others may indicate that the reference genome contains a rare variant or sequencing issue: keyword in data file (dbSnp153.bb) # in hg19# in hg38description refIsAmbiguous 101 111 The reference genome allele contains an IUPAC ambiguous base (e.g. 'R' for 'A or G', or 'N' for 'any base'). refIsMinor 3272116 3360435 The reference genome allele is not the major allele in at least one project. refIsRare 136547 160827 The reference genome allele is rare (i.e. allele frequency refIsSingleton 37832 50927 The reference genome allele has never been observed in a population sequencing project reporting frequencies. refMismatch 4 33 The reference genome allele reported by dbSNP differs from the GenBank assembly sequence. This is very rare and in all cases observed so far, the GenBank assembly has an 'N' while the RefSeq assembly used by dbSNP has a less ambiguous character such as 'R'. and others may indicate an anomaly or problem with the variant data: keyword in data file (dbSnp153.bb) # in hg19# in hg38description altIsAmbiguous 10755 10888 At least one alternate allele contains an IUPAC ambiguous base (e.g. 'R' for 'A or G'). For alleles containing more than one ambiguous base, this may create a combinatoric explosion of possible alleles. classMismatch 5998 6216 Variation class/type is inconsistent with alleles mapped to this genome assembly. clusterError 114826 128306 This variant has the same start, end and class as another variant; they probably should have been merged into one variant. freqIncomplete 3922 4673 At least one project reported counts for only one allele which implies that at least one allele is missing from the report; that project's frequency data are ignored. 
freqIsAmbiguous | 7656 | 7756 | At least one allele reported by at least one project that reports frequencies contains an IUPAC ambiguous base.
freqNotMapped | 2685 | 6590 | At least one project reported allele frequencies relative to a different assembly; however, dbSNP does not include a mapping of this variant to that assembly, which implies a problem with mapping the variant across assemblies. The mapping on this assembly may have an issue; evaluate carefully vs. original submissions, which you can view by clicking through to dbSNP above.
freqNotRefAlt | 17694 | 32170 | At least one allele reported by at least one project that reports frequencies does not match any of the reference or alternate alleles listed by dbSNP.
multiMap | 562180 | 132123 | This variant has been mapped to more than one distinct genomic location.
otherMapErr | 114095 | 204219 | At least one other mapping of this variant has erroneous coordinates. The mapping(s) with erroneous coordinates are excluded from this track and are included in the Map Err subtrack. Even when this mapping has legal coordinates, there may still be an issue with its coordinates and alleles; you may want to click through to dbSNP to compare the initial submission's coordinates and alleles. In hg19, 55454 distinct rsIDs are affected; in hg38, 86636.

Data Sources and Methods

dbSNP has collected genetic variant reports from researchers worldwide for more than 20 years. Since the advent of next-generation sequencing methods and the population sequencing efforts that they enable, dbSNP has grown exponentially, requiring a new data schema, computational pipeline, web infrastructure, and download files (Holmes et al.). The same challenges of exponential growth affected UCSC's presentation of dbSNP variants, so we have taken the opportunity to change our internal representation and import pipeline.
Most notably, flanking sequences are no longer provided by dbSNP, because most submissions have been genomic variant calls in VCF format as opposed to independent sequences.

We downloaded JSON files available from dbSNP at ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/, extracted a subset of the information about each variant, and collated it into a bigBed file using the bigDbSnp.as schema with the information necessary for filtering and displaying the variants, as well as a separate file containing more detailed information to be displayed on each variant's details page (dbSnpDetails.as schema).

Data Access

Note: It is not recommended to use LiftOver to convert SNPs between assemblies; more information about how to convert SNPs between assemblies can be found on the following FAQ entry.

Since dbSNP has grown to include approximately 700 million variants, the size of the All dbSNP (153) subtrack can cause the Table Browser and Data Integrator to time out, leading to a blank page or truncated output, unless queries are restricted to a chromosomal region, to particular defined regions, to a specific set of rs# IDs (which can be pasted/uploaded into the Table Browser), or to one of the subset tracks such as Common (~15 million variants) or ClinVar (~0.5M variants).

For automated analysis, the track data files can be downloaded from the downloads server for hg19 and hg38.

file | format | subtrack
dbSnp153.bb (hg19, hg38) | bigDbSnp (bigBed4+13) | All dbSNP (153)
dbSnp153ClinVar.bb (hg19, hg38) | bigDbSnp (bigBed4+13) | ClinVar dbSNP (153)
dbSnp153Common.bb (hg19, hg38) | bigDbSnp (bigBed4+13) | Common dbSNP (153)
dbSnp153Mult.bb (hg19, hg38) | bigDbSnp (bigBed4+13) | Mult. dbSNP (153)
dbSnp153BadCoords.bb (hg19, hg38) | bigBed4 | Map Err (153)
dbSnp153Details.tab.gz | gzip-compressed tab-separated text | Detailed variant properties, independent of genome assembly version

Several utilities for working with bigBed-formatted binary files can be downloaded here.
Run a utility with no arguments to see a brief description of the utility and its options.

bigBedInfo provides summary statistics about a bigBed file including the number of items in the file. With the -as option, the output includes an autoSql definition of data columns, useful for interpreting the column values.
bigBedToBed converts the binary bigBed data to tab-separated text. Output can be restricted to a particular region by using the -chrom, -start and -end options.
bigBedNamedItems extracts rows for one or more rs# IDs.

Example: retrieve all variants in the region chr1:200001-200400

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout

Example: retrieve variant rs6657048

    bigBedNamedItems dbSnp153.bb rs6657048 stdout

Example: retrieve all variants with rs# IDs in file myIds.txt

    bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed

The columns in the bigDbSnp/bigBed files and dbSnp153Details.tab.gz file are described in bigDbSnp.as and dbSnpDetails.as respectively. For columns that contain lists of allele frequency data, the order of projects providing the data listed is as follows:

1000Genomes
GnomAD exomes
TOPMED
PAGE STUDY
GnomAD genomes
GoESP
Estonian
ALSPAC
TWINSUK
NorthernSweden
Vietnamese

UCSC also has an API that can be used to retrieve values from a particular chromosome range.

A list of rs# IDs can be pasted/uploaded in the Variant Annotation Integrator tool to find out which genes (if any) the variants are located in, as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc.

Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information.

References

Holmes JB, Moyer E, Phan L, Maglott D, Kattman B. SPDI: Data Model for Variants and Applications at NCBI. Bioinformatics. 2019 Nov 18;.
PMID: 31738401 Sayers EW, Agarwala R, Bolton EE, Brister JR, Canese K, Clark K, Connor R, Fiorini N, Funk K, Hefferon T et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019 Jan 8;47(D1):D23-D28. PMID: 30395293; PMC: PMC6323993 Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 dbSnpArchive dbSNP Archive dbSNP Track Archive Variation Description This composite track contains information about single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP, available from ftp.ncbi.nih.gov/snp. You can click into each track for a version/subset-specific description. This collection includes numbered versions of the entire dbSNP datasets (All SNP) as well as three tracks with subsets of the items in that version. Here is information on each of the subsets: dbSNP 153: The dbSNP build 153 is composed of 5 subtracks. Click the track for a description of the subtracks. Common SNPs: SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs: SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs: SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. 
When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - 
cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence 
variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). 
FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
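Two of the checks in the Unusual Conditions (UCSC) list above, AlleleFreqSumNot1 and NonIntegerChromCount, are simple arithmetic tests and can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the tolerance of 0.01 is taken from the "+-0.01" quoted in the text, and the code is not UCSC's actual implementation.

```python
def allele_freq_sum_not_1(freqs, tol=0.01):
    """Return True if the allele frequencies do not sum to 1.0 within +-tol.

    Hypothetical helper mirroring the AlleleFreqSumNot1 description.
    """
    return abs(sum(freqs) - 1.0) > tol


def non_integer_chrom_count(freqs, chrom_count, tol=0.01):
    """Return True if any frequency implies a non-integer chromosome count.

    chrom_count is the reported sample size in chromosomes (2N).
    Hypothetical helper mirroring the NonIntegerChromCount description.
    """
    for freq in freqs:
        implied = freq * chrom_count  # chromosomes carrying this allele
        if abs(implied - round(implied)) > tol:
            return True
    return False


if __name__ == "__main__":
    # Frequencies that sum to 0.9 are flagged as probably incomplete.
    print(allele_freq_sum_not_1([0.7, 0.2]))
    # 0.3 * 5 = 1.5 chromosomes, so the reported sample size is suspect.
    print(non_integer_chrom_count([0.3, 0.7], 5))
```

A variant flagged by the first check most likely has an allele missing from its frequency report; one flagged by the second most likely has an incorrect total sample count, as noted in the list above.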
Several other properties do not have coloring options, but do have some filtering options:

Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.

Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.

Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).

AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.

You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category.
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). 
The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/

Coordinates, orientation, location type and dbSNP reference allele data were obtained from files like b138_SNPContigLoc.bcp.gz and b138_ContigInfo.bcp.gz.
b138_SNPMapInfo.bcp.gz provides the alignment weights.
Functional classification was obtained from files like b138_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

Note: It is not recommended to use LiftOver to convert SNPs between assemblies; more information about how to convert SNPs between assemblies can be found on the following FAQ entry.

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation files can be downloaded in their entirety for hg38, hg19, and mm10 (as snp*.txt.gz). You can also make queries using the UCSC Genome Browser JSON API or public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
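As a rough illustration of the JSON API route mentioned above, the following Python sketch builds a region query. It rests on assumptions that are not stated in this description: the endpoint https://api.genome.ucsc.edu/getData/track, its genome, track, chrom, start and end parameters, and 0-based start coordinates. The function names are hypothetical, and snp151Common is used only as an example track name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the UCSC Genome Browser JSON API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a query URL for items of one track within one region.

    Hypothetical helper; start is assumed to be 0-based, end exclusive.
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "?" + urlencode(params)


def fetch_track_items(genome, track, chrom, start, end):
    """Fetch and decode the JSON response (requires network access)."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the URL here; call fetch_track_items() to run the query.
    print(build_track_url("hg38", "snp151Common", "chr1", 200000, 200400))
```

Restricting the query to a small region keeps the response small, which matters for the larger SNP tables.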
Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:

class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download in the genome's snp*Mask folder. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783

dbSnp153ViewVariants Variants Short Genetic Variants from dbSNP release 153 Variation
dbSnp153 All dbSNP(153) All Short Genetic Variants from dbSNP Release 153 Variation
dbSnp153Mult dbSNP(153) Mult.
Short Genetic Variants from dbSNP Release 153 that Map to Multiple Genomic Loci Variation dbSnp153ClinVar dbSNP(153) in ClinVar Short Genetic Variants from dbSNP Release 153 Included in ClinVar Variation dbSnp153Common Common dbSNP(153) Common (1000 Genomes Phase 3 MAF >= 1%) Short Genetic Variants from dbSNP Release 153 Variation dbSnp153ViewErrs Mapping Errors Short Genetic Variants from dbSNP release 153 Variation dbSnp153BadCoords Map Err dbSnp(153) Mappings with Inconsistent Coordinates from dbSNP 153 Variation snp151Common Common SNPs(151) Simple Nucleotide Polymorphisms (dbSNP 151) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 151, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have a minor allele frequency (MAF) of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. Allele counts from all submissions that include frequency data are combined when determining MAF, so for example the allele counts from the 1000 Genomes Project and an independent submitter may be combined for the same variant. dbSNP provides download files in the Variant Call Format (VCF) that include a "COMMON" flag in the INFO column. That is determined by a different method, and is generally a superset of the UCSC Common set. dbSNP uses frequency data from the 1000 Genomes Project only, and considers a variant COMMON if it has a MAF of at least 0.01 in any of the five super-populations: African (AFR) Admixed American (AMR) East Asian (EAS) European (EUR) South Asian (SAS) In build 151, dbSNP marks approximately 38M variants as COMMON; 23M of those have a global MAF < 0.01. The remainder should be in agreement with UCSC's Common subset. 
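The difference between the two notions of "common" described above can be sketched in Python. This is a simplified illustration, not dbSNP's or UCSC's code: it assumes per-super-population counts are given as hypothetical (allele_count, total_count) pairs for one fixed alternate allele, and it treats that allele's frequency as the MAF.

```python
SUPER_POPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def frequency(allele_count, total_count):
    """Allele frequency, or 0.0 when no chromosomes were assayed."""
    return allele_count / total_count if total_count else 0.0


def common_in_any_population(counts, threshold=0.01):
    """COMMON-flag style: frequency >= threshold in at least one super-population."""
    return any(
        frequency(*counts[pop]) >= threshold
        for pop in SUPER_POPULATIONS
        if pop in counts
    )


def common_globally(counts, threshold=0.01):
    """Pooled style: combine counts across populations, then apply the threshold."""
    allele_total = sum(allele for allele, _ in counts.values())
    chrom_total = sum(total for _, total in counts.values())
    return frequency(allele_total, chrom_total) >= threshold


if __name__ == "__main__":
    # Made-up counts: the allele is at 3% in AFR but absent elsewhere.
    example = {
        "AFR": (30, 1000),
        "AMR": (0, 700),
        "EAS": (0, 1000),
        "EUR": (0, 1000),
        "SAS": (0, 1000),
    }
    print(common_in_any_population(example))
    print(common_globally(example))
```

In the made-up example the allele passes the per-population test but not the pooled one (30 of 4700 chromosomes), which is the kind of variant that carries the COMMON flag while having a global MAF < 0.01.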
The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(151) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(151) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(151) - SNPs mapping in more than one place on reference assembly. All SNPs(151) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b151_SNPContigLoc_N.bcp.gz and b151_ContigInfo_N.bcp.gz. (N = 105 for hg19, 108 for hg38) b151_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b151_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp151*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are filtered to those that meet all of the following criteria:
class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
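The masking idea described above can be illustrated with a minimal sketch. This is not UCSC's actual pipeline: the function name mask_sequence, the input format (a 0-based position mapped to the observed alleles), the choice to always include the reference base in the masked set, and the example data are all assumptions made for illustration. Only the IUPAC table is the standard nucleotide ambiguity code.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mask_sequence(seq, substitutions):
    """Return seq (upper-cased) with each substituted base replaced by an IUPAC code.

    substitutions is a hypothetical input format:
    {0-based position: iterable of observed single-base alleles}.
    Assumption: the reference base is added to the observed set before lookup.
    Positions whose reference base is not A, C, G or T are not handled here.
    """
    bases = list(seq.upper())
    for pos, alleles in substitutions.items():
        observed = frozenset(a.upper() for a in alleles) | {bases[pos]}
        bases[pos] = IUPAC[observed]
    return "".join(bases)


# Made-up example: a C/T substitution at position 1 and an A/C/G site at position 4.
masked = mask_sequence("ACGTAC", {1: {"T"}, 4: {"G", "C"}})
print(masked)  # AYGTVC
```

Keying the table by frozenset keeps the lookup independent of allele order, so "C/T" and "T/C" both resolve to Y.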
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp151 All SNPs(151) Simple Nucleotide Polymorphisms (dbSNP 151) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 151, available from ftp.ncbi.nlm.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(151): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(151): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs(151): SNPs that have been mapped to multiple locations in the reference genome assembly. There are very few SNPs in this category because dbSNP has been filtering out almost all multiple-mapping SNPs since build 149. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks: Common SNPs(151) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. 
Flagged SNPs(151) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!)
Mult. SNPs(151) - SNPs mapping in more than one place on reference assembly.
All SNPs(151) - all SNPs from dbSNP mapping to reference assembly.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes:

Class: Describes the observed alleles
Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
In-del - insertion/deletion
Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats
Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-
No Variation - the submission reports an invariant region in the surveyed sequence
Mixed - the cluster contains submissions from multiple classes
Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1
Insertion - the polymorphism is an insertion relative to the reference assembly
Deletion - the polymorphism is a deletion relative to the reference assembly
Unknown - no classification provided by data contributor

Validation: Method used to validate the variant (each variant may be validated by more than one method)
By
Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, 
resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). 
This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. 
SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. 
Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. 
When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. 
The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b151_SNPContigLoc_N.bcp.gz and b151_ContigInfo_N.bcp.gz (N = 105 for hg19, 108 for hg38). b151_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b151_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp151*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
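As a rough illustration of automated analysis, the sketch below reads rows in the tab-separated layout of a downloaded snp151*.txt.gz file and keeps the variants whose class is 'single'. The column positions (chrom, chromStart, chromEnd, name and class at 0-based indexes 1, 2, 3, 4 and 11) are an assumption about the table layout and should be checked against the table schema before use. The function names and the sample rows are made up for illustration and are not real dbSNP records.

```python
import csv
import gzip
import io

# Assumed 0-based column indexes; verify against the table's schema before use.
CHROM, START, END, NAME, CLASS = 1, 2, 3, 4, 11


def single_base_variants(lines):
    """Yield (chrom, start, end, name) for rows whose class column is 'single'."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) > CLASS and row[CLASS] == "single":
            yield row[CHROM], int(row[START]), int(row[END]), row[NAME]


def read_gzipped(path):
    """Open a local gzipped table dump (a snp151*.txt.gz file) as text lines."""
    return gzip.open(path, mode="rt", newline="")


# Made-up rows standing in for file contents.
sample = io.StringIO(
    "585\tchr1\t100\t101\trsEXAMPLE1\t0\t+\tA\tA\tA/G\tgenomic\tsingle\n"
    "585\tchr1\t200\t200\trsEXAMPLE2\t0\t+\t-\t-\t-/T\tgenomic\tinsertion\n"
)
print(list(single_base_variants(sample)))
# [('chr1', 100, 101, 'rsEXAMPLE1')]
```

For a real file, the same generator could be fed from read_gzipped(path), which streams the file rather than loading it into memory.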
Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp151Flagged Flagged SNPs(151) Simple Nucleotide Polymorphisms (dbSNP 151) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 151, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. 
Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater.

The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional.

The remainder of this page is identical on the following tracks:
Common SNPs(151) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.
Flagged SNPs(151) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!)
Mult. SNPs(151) - SNPs mapping in more than one place on reference assembly.
All SNPs(151) - all SNPs from dbSNP mapping to reference assembly.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
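The width rule in the paragraph above can be summarized in a short sketch. It assumes 0-based, half-open start/end coordinates, so that an insertion has a zero-length span. The function name glyph_for_span and the returned labels are hypothetical and are not part of the browser's code.

```python
def glyph_for_span(chrom_start, chrom_end):
    """Describe how a variant would be drawn at base-level resolution.

    Assumes 0-based, half-open coordinates: span = chrom_end - chrom_start.
    """
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if span == 0:
        # Zero-length span: an insertion between two reference bases.
        return "tick mark between two bases"
    if span == 1:
        # One reference base is covered, e.g. a single nucleotide polymorphism.
        return "box one base wide"
    return f"block spanning {span} bases"


# Made-up coordinates covering the three cases.
for start, end in [(200, 200), (100, 101), (300, 303)]:
    print(start, end, glyph_for_span(start, end))
```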
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, each with a default color. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): nc_transcript_variant (colored blue). Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (±0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b151_SNPContigLoc_N.bcp.gz and b151_ContigInfo_N.bcp.gz. (N = 105 for hg19, 108 for hg38) b151_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b151_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp151*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list in which every SNP meets all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. 
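As an illustration of the masking described above, the sketch below replaces a reference base with the IUPAC ambiguity code that covers every allele observed at a single-base substitution. The function name `mask_sequence`, the input layout (a dict from zero-based position to a set of observed bases) and the decision to include the reference base in the code are assumptions made for this example; it is not the pipeline used to produce the downloadable files.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mask_sequence(seq, snvs):
    """Return seq with each SNV position replaced by an IUPAC code.

    seq  -- reference sequence (string of A/C/G/T/N)
    snvs -- hypothetical input: {zero-based position: set of observed bases}
    """
    bases = list(seq.upper())
    for pos, alleles in snvs.items():
        # Assumption: the code covers the reference base plus observed alleles.
        covered = frozenset(a.upper() for a in alleles) | {bases[pos]}
        # Anything outside the table (e.g. a reference N) falls back to N.
        bases[pos] = IUPAC.get(covered, "N")
    return "".join(bases)
```

With this sketch, a C/T substitution at the second base of `ACGT` yields `AYGT`.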
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp151Mult Mult. SNPs(151) Simple Nucleotide Polymorphisms (dbSNP 151) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 151, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. Since build 149, dbSNP has been filtering out almost all such "SNPs" so there are very few items in this track. The default maximum weight for this track is 3, unlike the other dbSNP build 151 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(151) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(151) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(151) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(151) - SNPs mapping in more than one place on reference assembly. All SNPs(151) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. 
When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - 
submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR 
(dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, each with a default color. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): nc_transcript_variant (colored blue). Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (±0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
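A few of the checks listed above can be sketched in code. This is a minimal illustration under stated assumptions, not UCSC's implementation: the function `unusual_conditions`, its parameters, and the decision to report RefAlleleRevComp instead of (rather than in addition to) RefAlleleMismatch are all choices made for this example.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
_PLAIN = set("ACGT-")  # plain bases plus '-' for an empty allele


def reverse_complement(seq):
    # Bases outside A/C/G/T are mapped to N in this sketch.
    return "".join(_COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def unusual_conditions(snp_class, span, dbsnp_ref, ucsc_ref, observed, freqs):
    """Return the names of a few unusual conditions that apply.

    snp_class -- e.g. 'single' or 'in-del'
    span      -- number of reference bases covered by the annotation
    observed  -- list of observed allele strings
    freqs     -- list of allele frequencies (may be empty)
    """
    flags = []
    # Frequencies should sum to 1.0 within a tolerance of 0.01.
    if freqs and abs(sum(freqs) - 1.0) > 0.01:
        flags.append("AlleleFreqSumNot1")
    # Single-base alleles should be annotated on exactly one base.
    if snp_class == "single" and all(len(a) == 1 for a in observed):
        if span == 0:
            flags.append("SingleClassZeroSpan")
        elif span > 1:
            flags.append("SingleClassLongerSpan")
    # Compare dbSNP's reference allele with the bases at the mapped position.
    if dbsnp_ref.upper() != ucsc_ref.upper():
        if reverse_complement(dbsnp_ref) == ucsc_ref.upper():
            flags.append("RefAlleleRevComp")  # assumption: reported instead of mismatch
        else:
            flags.append("RefAlleleMismatch")
    # Any character other than A/C/G/T/'-' is treated as IUPAC-ambiguous.
    if any(set(a.upper()) - _PLAIN for a in observed):
        flags.append("ObservedContainsIupac")
    return flags
```

For instance, a class-single SNP whose annotation spans 0 bases is reported as `SingleClassZeroSpan`, and one whose dbSNP reference allele is the reverse complement of the genomic base is reported as `RefAlleleRevComp`.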
Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
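The two MAF attributes in the list above differ only in whether the 5% threshold must hold in at least one assayed population or in every one. The sketch below makes that distinction concrete; the function name, the dictionary keys and the population labels are hypothetical, and the code is not how dbSNP computes its bitfield.

```python
def maf_flags(maf_by_population, threshold=0.05):
    """Derive the two MAF attributes from per-population minor allele frequencies.

    maf_by_population -- hypothetical input: {population label: MAF}
    """
    values = list(maf_by_population.values())
    return {
        # True if at least one assayed population reaches the threshold.
        "some_population": any(v >= threshold for v in values),
        # True only if populations were assayed and all reach the threshold.
        "all_populations": bool(values) and all(v >= threshold for v in values),
    }
```

A variant with MAF 0.12 in one population and 0.01 in another would satisfy "Some Population" but not "All Populations".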
Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. 
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/database/data/organism_data/ for hg38. 
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b151_SNPContigLoc_N.bcp.gz and b151_ContigInfo_N.bcp.gz. (N = 105 for hg19, 108 for hg38) b151_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b151_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp151*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. 
The candidate human SNPs are a filtered list in which every SNP meets all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp150Mult Mult. SNPs(150) Simple Nucleotide Polymorphisms (dbSNP 150) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 150, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. Since build 149, dbSNP has been filtering out almost all such "SNPs" so there are very few items in this track. 
The default maximum weight for this track is 3, unlike the other dbSNP build 150 tracks, which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(150) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(150) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(150) - SNPs with < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(150) - SNPs mapping in more than one place on reference assembly. All SNPs(150) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
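The base-level display rule above can be sketched as a small function of the annotated span. This is only an illustration: the function name is hypothetical, and it assumes 0-based, half-open coordinates, so that an insertion between two bases has equal start and end.

```python
def display_shape(chrom_start, chrom_end):
    """Describe how a variant would be drawn at base-level resolution.

    Coordinates are assumed to be 0-based and half-open (start inclusive,
    end exclusive), so a pure insertion has chrom_start == chrom_end.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if width == 0:
        return "tick mark between two bases"
    if width == 1:
        return "box one base wide"
    return f"block spanning {width} bases"


for span in [(100, 100), (100, 101), (100, 103)]:
    print(span, "->", display_shape(*span))
```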
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): nc_transcript_variant (colored blue). Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b150_SNPContigLoc_N.bcp.gz and b150_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38) b150_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b150_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp150*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. 
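The masking idea can be sketched as follows. This is a minimal illustration, not the procedure used to build the downloadable files: the function name and input layout are hypothetical, the observed alleles are assumed to be on the same strand as the sequence, and the reference base is assumed to count as one of the alleles. Only the IUPAC code table itself is standard.

```python
# Standard IUPAC ambiguity codes for sets of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
BASES = {"A", "C", "G", "T"}


def mask_sequence(seq, substitutions):
    """Return seq with an IUPAC code at each single-base substitution.

    `substitutions` maps a 0-based position to observed alleles written
    as a slash-separated string such as "A/G" (hypothetical layout).
    Anything that is not a plain single-base substitution is skipped.
    """
    bases = list(seq)
    for pos, observed in substitutions.items():
        alleles = observed.upper().split("/")
        ref = bases[pos].upper()
        if ref not in BASES or not all(a in BASES for a in alleles):
            continue  # skip indels, named alleles and ambiguous reference bases
        code = IUPAC.get(frozenset(alleles) | {ref})
        if code is not None:
            bases[pos] = code
    return "".join(bases)


print(mask_sequence("ACGTAC", {1: "C/T", 4: "A/G", 5: "-/T"}))
```

In this example the entry at position 5 is an insertion-style allele list, so it is left unmasked, in line with the note that only single-base substitutions are used.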
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp150 All SNPs(150) Simple Nucleotide Polymorphisms (dbSNP 150) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 150, available from ftp.ncbi.nlm.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(150): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(150): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs(150): SNPs that have been mapped to multiple locations in the reference genome assembly. There are very few SNPs in this category because dbSNP has been filtering out almost all multiple-mapping SNPs since build 149. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks: Common SNPs(150) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. 
Flagged SNPs(150) - SNPs with < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(150) - SNPs mapping in more than one place on reference assembly. All SNPs(150) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By 
Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, 
resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): nc_transcript_variant (colored blue). Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). 
This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. 
SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. 
Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. 
When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. 
The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b150_SNPContigLoc_N.bcp.gz and b150_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b150_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b150_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as "clinically associated". See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp150*.txt.gz) or queried on the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
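The 'in-del' reclassification described under Insertions/Deletions above can be sketched as follows. This is a minimal illustration, not UCSC's actual code: the function name refine_indel_class is hypothetical, and the sketch assumes that alleles are given as plain strings with '-' standing for an empty allele.

```python
def refine_indel_class(snp_class, ref_allele, observed_alleles):
    """Refine dbSNP's 'in-del' class by comparing allele lengths.

    Hypothetical helper: the name and the input representation are
    assumptions made for illustration only.
    """
    if snp_class != "in-del":
        return snp_class

    def allele_len(allele):
        # Assumption: '-' denotes an empty allele (zero bases).
        return 0 if allele == "-" else len(allele)

    ref_len = allele_len(ref_allele)
    # Lengths of all observed alleles other than the reference allele.
    other_lens = [allele_len(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        # Reference is shorter than every other allele: insertion.
        return "insertion"
    if all(ref_len > n for n in other_lens):
        # Reference is longer than every other allele: deletion.
        return "deletion"
    # Mixed lengths: leave the class as 'in-del'.
    return snp_class
```

For instance, a reference allele '-' with observed alleles '-' and 'A' would become 'insertion', while a variant whose other alleles are both shorter and longer than the reference would stay 'in-del'.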
Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp150Common Common SNPs(150) Simple Nucleotide Polymorphisms (dbSNP 150) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 150, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have a minor allele frequency (MAF) of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete.
Allele counts from all submissions that include frequency data are combined when determining MAF, so for example the allele counts from the 1000 Genomes Project and an independent submitter may be combined for the same variant. dbSNP provides download files in the Variant Call Format (VCF) that include a "COMMON" flag in the INFO column. That flag is determined by a different method, and the flagged set is generally a superset of the UCSC Common set. dbSNP uses frequency data from the 1000 Genomes Project only, and considers a variant COMMON if it has a MAF of at least 0.01 in any of the five super-populations: African (AFR), Admixed American (AMR), East Asian (EAS), European (EUR) and South Asian (SAS). In build 151 (which has replaced build 150 on the dbSNP web and download site), dbSNP marks approximately 38M variants as COMMON; 23M of those have a global MAF < 0.01. The remainder should be in agreement with UCSC's Common subset. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(150) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(150) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(150) - SNPs mapping in more than one place on reference assembly. All SNPs(150) - all SNPs from dbSNP mapping to reference assembly.
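The difference between the UCSC Common subset and dbSNP's COMMON flag, as described in the Description above, can be illustrated with a small sketch. The function names, the input dictionaries and the definition of MAF as the frequency of the second most frequent allele are assumptions made for this example; they do not reproduce UCSC's or dbSNP's actual code.

```python
from collections import Counter

# The five 1000 Genomes super-populations named in the text.
SUPER_POPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def minor_allele_frequency(allele_counts):
    """Return the frequency of the second most frequent allele.

    Assumption: this definition of MAF is chosen for illustration.
    """
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        return 0.0
    ordered = sorted(allele_counts.values(), reverse=True)
    return ordered[1] / total


def is_ucsc_common(submissions, threshold=0.01):
    """UCSC-style test: combine allele counts from all submissions
    that include frequency data, then apply the MAF threshold."""
    combined = Counter()
    for counts in submissions:
        combined.update(counts)
    return minor_allele_frequency(combined) >= threshold


def is_dbsnp_common(maf_by_population, threshold=0.01):
    """dbSNP-style test: COMMON if the MAF reaches the threshold in
    any one of the five super-populations."""
    return any(
        maf_by_population.get(pop, 0.0) >= threshold
        for pop in SUPER_POPULATIONS
    )
```

Under these assumptions, a variant with a MAF of 0.03 in AFR only would pass is_dbsnp_common, yet fail is_ucsc_common if its combined counts were, say, 9950 reference and 50 alternate alleles (global MAF 0.005). That corresponds to the COMMON variants with a global MAF < 0.01 mentioned above.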
Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed 
in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not 
alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). 
FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
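Two of the unusual conditions listed above, AlleleFreqSumNot1 and NonIntegerChromCount, are simple arithmetic checks and can be sketched as follows. The function name check_allele_frequencies and its inputs are hypothetical; the tolerance of 0.01 is taken from the descriptions above, and UCSC's actual implementation may differ.

```python
def check_allele_frequencies(freqs, total_chrom_count, tolerance=0.01):
    """Return the names of frequency-related unusual conditions.

    freqs: allele frequencies for one variant (hypothetical input).
    total_chrom_count: reported number of chromosomes sampled (2N).
    """
    exceptions = []

    # AlleleFreqSumNot1: frequencies should sum to 1.0 within tolerance.
    if abs(sum(freqs) - 1.0) > tolerance:
        exceptions.append("AlleleFreqSumNot1")

    # NonIntegerChromCount: frequency * 2N should be (nearly) a whole
    # number of chromosomes for every allele.
    for freq in freqs:
        count = freq * total_chrom_count
        if abs(count - round(count)) > tolerance:
            exceptions.append("NonIntegerChromCount")
            break

    return exceptions
```

For example, frequencies of 0.5 and 0.5 with a reported sample of 3 chromosomes imply 1.5 chromosomes per allele, so the sketch reports NonIntegerChromCount, which suggests that the reported total sample count is wrong.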
Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. 
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/database/data/organism_data/ for hg38. 
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b150_SNPContigLoc_N.bcp.gz and b150_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b150_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b150_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as "clinically associated". See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp150*.txt.gz) or queried on the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles.
The candidate human SNPs are a filtered list that meets the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp150Flagged Flagged SNPs(150) Simple Nucleotide Polymorphisms (dbSNP 150) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 150, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information.
For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(150) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(150) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(150) - SNPs mapping in more than one place on reference assembly. All SNPs(150) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black. Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b150_SNPContigLoc_N.bcp.gz and b150_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38) b150_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b150_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as "clinically associated". See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp150*.txt.gz) or queried on the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list in which every SNP meets all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
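The masking described above can be illustrated with a minimal sketch. This is not UCSC's pipeline: the function name mask_with_iupac, the 0-based position convention, the input dictionary, and the choice to include the reference base in the ambiguity set are all assumptions made for this example. Only the IUPAC code table itself is standard.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases each one represents.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_with_iupac(sequence, substitutions):
    """Return `sequence` with each substituted base replaced by an IUPAC code.

    `substitutions` (hypothetical input format) maps a 0-based position to the
    observed single-base alleles at that position, e.g. {1: ["C", "T"]}.
    Assumption: the reference base is always added to the observed set.
    """
    bases = list(sequence.upper())
    for pos, alleles in substitutions.items():
        observed = {allele.upper() for allele in alleles}
        observed.add(bases[pos])  # keep the reference base in the ambiguity set
        bases[pos] = IUPAC_CODES[frozenset(observed)]
    return "".join(bases)


masked = mask_with_iupac("ACGTA", {1: ["C", "T"], 4: ["A", "C", "G"]})
print(masked)
```

Insertions and deletions are deliberately not handled, matching the note above that only single-base substitutions were used for masking.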
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp147Mult Mult. SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 147, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, this calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 147 tracks, which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(147) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(147) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to the reference assembly. Flagged SNPs(147) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to the reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!). Mult. SNPs(147) - SNPs mapping to more than one place on the reference assembly. All SNPs(147) - all SNPs from dbSNP mapping to the reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence.
Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq 
transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that 
changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are always colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc_N.bcp.gz and b147_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38) b147_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as "clinically associated". See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp147*.txt.gz) or queried on the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list in which every SNP meets all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
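The candidate criteria listed under Orthologous Alleles above can be expressed as a simple filter. The sketch below is illustrative only: the SnpRecord class, its field names, and the mapping_count field are hypothetical stand-ins for however the data are actually stored, and the "_random" suffix test is an assumed way of recognizing chrN_random sequences.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """Hypothetical record; field names are assumptions for this example."""
    chrom: str
    chrom_start: int    # 0-based start
    chrom_end: int      # end, exclusive
    snp_class: str      # e.g. "single", "in-del"
    observed: str       # slash-separated alleles, e.g. "A/G"
    mapping_count: int  # number of locations the SNP aligns to


def is_ortholog_candidate(snp):
    """Return True if the SNP satisfies all five candidate criteria."""
    return (
        snp.snp_class == "single"                    # class = 'single'
        and snp.chrom_end - snp.chrom_start == 1     # one base long
        and snp.mapping_count == 1                   # only one location
        and not snp.chrom.endswith("_random")        # not on chrN_random
        and len(snp.observed.split("/")) == 2        # biallelic
    )


example = SnpRecord("chr1", 100, 101, "single", "A/G", 1)
print(is_ortholog_candidate(example))
```

A SNP that fails any one criterion is simply excluded from the candidate list; the 'N' and '?' values described above apply only to candidates that pass the filter.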
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp147Flagged Flagged SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 147, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(147) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to the reference assembly. Flagged SNPs(147) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to the reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!). Mult. SNPs(147) - SNPs mapping to more than one place on the reference assembly. All SNPs(147) - all SNPs from dbSNP mapping to the reference assembly.
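The four subset definitions above can be summarized as a small decision function. This is a sketch of the stated rules only, not of how the tracks were actually built: the function name subtracks_for and its three parameters are hypothetical, and an unknown MAF is represented here as None.

```python
def subtracks_for(maf, mapping_count, clinically_flagged):
    """Return the build 147 subtracks a SNP would appear in, per the rules above.

    maf: minor allele frequency as a fraction (0.01 == 1%), or None if unknown.
    mapping_count: number of places the SNP maps on the reference assembly.
    clinically_flagged: True if dbSNP flags the SNP as "clinically associated".
    """
    tracks = ["All SNPs(147)"]  # every mapped SNP is in the All subset
    if mapping_count > 1:
        tracks.append("Mult. SNPs(147)")
    elif maf is not None and maf >= 0.01:
        tracks.append("Common SNPs(147)")
    elif clinically_flagged:
        # MAF is below 1% or unknown, and the SNP maps only once
        tracks.append("Flagged SNPs(147)")
    return tracks


print(subtracks_for(maf=None, mapping_count=1, clinically_flagged=True))
```

Note that a flagged SNP with a known MAF of 1% or more lands in Common rather than Flagged, which follows from the "< 1% (or unknown)" condition in the Flagged definition.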
Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed 
in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not 
alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are always colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
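A few of the checks listed above can be sketched in code. The following is a minimal illustration only, not UCSC's actual implementation: the function name `unusual_conditions`, its parameters, the choice to report RefAlleleRevComp in place of RefAlleleMismatch, and the use of 0-based half-open coordinates are all assumptions made for the example.

```python
# Hypothetical sketch of a few of the anomaly checks listed above.
# Not UCSC's implementation; names and record layout are assumptions.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(snp_class, start, end, ucsc_ref, dbsnp_ref,
                       observed, allele_freqs=()):
    """Return the names of the anomalies found for one mapping.

    start/end are assumed to be 0-based, half-open coordinates, so
    end - start is the number of reference bases the annotation spans.
    """
    found = []
    span = end - start

    # Allele frequencies, when present, should sum to 1.0 (+-0.01).
    if allele_freqs and abs(sum(allele_freqs) - 1.0) > 0.01:
        found.append("AlleleFreqSumNot1")

    # dbSNP's reference allele should equal the genome bases at the mapping.
    if dbsnp_ref.upper() != ucsc_ref.upper():
        if reverse_complement(dbsnp_ref).upper() == ucsc_ref.upper():
            found.append("RefAlleleRevComp")
        else:
            found.append("RefAlleleMismatch")

    if snp_class == "single":
        # Single-base alleles should be annotated on exactly one base.
        if span == 0:
            found.append("SingleClassZeroSpan")
        elif span > 1:
            found.append("SingleClassLongerSpan")
        # Three or four observed bases is unusual but not an error.
        distinct = {allele.upper() for allele in observed}
        if len(distinct) == 3:
            found.append("SingleClassTriAllelic")
        elif len(distinct) == 4:
            found.append("SingleClassQuadAllelic")

    return found
```

For example, `unusual_conditions("single", 100, 101, "A", "T", ["A", "G"])` returns `["RefAlleleRevComp"]`, because the dbSNP reference allele T is the reverse complement of the genome base A.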
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions: with allele frequencies p and q = 1 - p, the expected heterozygosity 2pq peaks at 0.5 when p = q = 0.5. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of build 147, dbSNP releases only variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. 
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc_N.bcp.gz and b147_ContigInfo_N.bcp.gz. 
(N = 105 for hg19, 107 for hg38) b147_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp147*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are filtered to those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chromosome; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' 
and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp147Common Common SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 147, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(147) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. 
Flagged SNPs(147) - SNPs with < 1% minor allele frequency (MAF) (or unknown MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(147) - SNPs mapping in more than one place on reference assembly. All SNPs(147) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By 
Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, 
resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): nc_transcript_variant; variants in this category are always colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. 
DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. 
Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions: with allele frequencies p and q = 1 - p, the expected heterozygosity 2pq peaks at 0.5 when p = q = 0.5. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of build 147, dbSNP releases only variants with weight 1. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. 
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/database/organism_data/ for hg38. 
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc_N.bcp.gz and b147_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38) b147_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp147*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. 
The candidate human SNPs are filtered to those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chromosome; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp147 All SNPs(147) Simple Nucleotide Polymorphisms (dbSNP 147) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 147, available from ftp.ncbi.nlm.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(147): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(147): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. 
SNPs(147): SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks: Common SNPs(147) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(147) - SNPs with < 1% minor allele frequency (MAF) (or unknown MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(147) - SNPs mapping in more than one place on reference assembly. All SNPs(147) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
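The base-level drawing rule above follows from the width of the mapping on the reference sequence. A minimal sketch, assuming 0-based half-open coordinates in which an insertion has equal start and end positions; the function name `display_shape` is hypothetical and is not part of the Genome Browser:

```python
def display_shape(chrom_start, chrom_end):
    """Describe how a variant is drawn at base-level resolution.

    Coordinates are assumed to be 0-based and half-open, so the width
    on the reference is chrom_end - chrom_start.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if width == 0:
        # Nothing in the reference is covered: drawn between two bases.
        return "tick mark between two bases (insertion)"
    if width == 1:
        return "box one base wide"
    return f"block spanning {width} bases"
```

For instance, `display_shape(10, 13)` returns `"block spanning 3 bases"`, while a zero-width mapping such as `display_shape(10, 10)` is reported as a tick mark.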
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Non-coding (ncRNA): (nc_transcript_variant) are always colored blue. Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP now releases only the variants with weight 1. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/database/organism_data/ for hg38. The FASTA files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b147_SNPContigLoc_N.bcp.gz and b147_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b147_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b147_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies.
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp147*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
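The IUPAC masking described above can be illustrated with a minimal Python sketch. This is not UCSC's actual pipeline: the function name mask_sequence, the 0-based position dictionary, the choice to fold the reference base into the allele set, and the decision to skip sites with non-ACGT alleles are all assumptions made for illustration. Only the IUPAC ambiguity codes themselves are standard.

```python
# Standard IUPAC ambiguity codes for sets of two or more nucleotides.
IUPAC_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq, substitutions):
    """Replace bases covered by single-base substitutions with IUPAC codes.

    seq           -- reference sequence as a string
    substitutions -- hypothetical input: dict mapping a 0-based position
                     to an iterable of observed single-base alleles
    """
    masked = list(seq)
    for pos, alleles in substitutions.items():
        # Assumption: the reference base counts as one of the alleles.
        bases = {a.upper() for a in alleles} | {seq[pos].upper()}
        # Assumption: sites with anything other than A, C, G, T are skipped.
        if not bases <= set("ACGT"):
            continue
        if len(bases) > 1:
            masked[pos] = IUPAC_CODES[frozenset(bases)]
    return "".join(masked)


if __name__ == "__main__":
    # Position 1 (C) has alleles C/T; position 3 (T) has alleles A/G.
    print(mask_sequence("ACGT", {1: "CT", 3: "AG"}))
```

In this sketch a site with alleles C/T becomes Y, and a site whose reference base T is joined by observed alleles A and G becomes D (A, G or T).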
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp146Mult Mult. SNPs(146) Simple Nucleotide Polymorphisms (dbSNP 146) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 146, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 146 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(146) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(146) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(146) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(146) - SNPs mapping in more than one place on reference assembly. All SNPs(146) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. 
Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq 
transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that 
changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set.
A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. 
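The 'in-del' refinement described above can be sketched in a few lines of Python. This is an illustrative sketch, not UCSC's code: the function name refine_indel_class and its argument layout are hypothetical, and treating a '-' allele as zero-length is an assumption about how empty alleles are written.

```python
def refine_indel_class(snp_class, ref_allele, observed_alleles):
    """Return 'insertion', 'deletion', or the class unchanged.

    snp_class        -- class string from dbSNP, e.g. 'in-del' or 'single'
    ref_allele       -- reference allele ('-' assumed to mean zero length)
    observed_alleles -- list of observed alleles, possibly including
                        the reference allele itself
    """
    if snp_class != "in-del":
        return snp_class

    def length(allele):
        # Assumption: '-' denotes an empty (zero-length) allele.
        return 0 if allele == "-" else len(allele)

    ref_len = length(ref_allele)
    other_lens = [length(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        return "insertion"  # reference shorter than every other allele
    if all(ref_len > n for n in other_lens):
        return "deletion"   # reference longer than every other allele
    return snp_class        # mixed lengths: leave as 'in-del'


if __name__ == "__main__":
    print(refine_indel_class("in-del", "-", ["-", "AT"]))
    print(refine_indel_class("in-del", "AT", ["-", "AT"]))
    print(refine_indel_class("in-del", "A", ["-", "A", "AT"]))
```

When the other observed alleles are both shorter and longer than the reference, neither condition holds and the class stays 'in-del', which matches the wording "shorter than all other observed alleles" and "longer than all other observed alleles".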
UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+". Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/database/organism_data/ for hg38. The FASTA files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b146_SNPContigLoc_N.bcp.gz and b146_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b146_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b146_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp146*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38.
Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp146Flagged Flagged SNPs(146) Simple Nucleotide Polymorphisms (dbSNP 146) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 146, available from ftp.ncbi.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(146) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(146) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. 
SNPs(146) - SNPs mapping in more than one place on reference assembly. All SNPs(146) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method 
By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides 
inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.)
MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly.
NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
ObservedTooLong - Observed allele not given (length too long).
ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
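Two of the allele-frequency checks listed under Unusual Conditions, AlleleFreqSumNot1 and NonIntegerChromCount, can be sketched as follows. This is a minimal illustration rather than UCSC's code: the function name frequency_flags, its arguments and the way the 0.01 tolerance is applied are assumptions.

```python
def frequency_flags(allele_freqs, total_chrom_count, tol=0.01):
    """Return the names of frequency-related conditions that apply.

    allele_freqs: list of allele frequencies for one SNP.
    total_chrom_count: reported sample size in chromosomes (2N).
    tol: tolerance, assumed here to be 0.01 for both checks.
    """
    flags = []
    # Frequencies that do not sum to 1.0 suggest incomplete data.
    if allele_freqs and abs(sum(allele_freqs) - 1.0) > tol:
        flags.append("AlleleFreqSumNot1")
    # frequency * 2N should be (close to) a whole number of chromosomes.
    for freq in allele_freqs:
        count = freq * total_chrom_count
        if abs(count - round(count)) > tol:
            flags.append("NonIntegerChromCount")
            break
    return flags
```

With frequencies of 0.5 and 0.5 reported for a total of 3 chromosomes, each allele would have been seen on 1.5 chromosomes, so the sketch reports NonIntegerChromCount; this matches the note above that the reported sample count is probably wrong in such cases.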
Several other properties do not have coloring options, but do have some filtering options:
Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.
You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category.
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/database/organism_data/ for hg38. 
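The in-del reclassification described under Insertions/Deletions can be sketched as follows. The function name refine_class and the convention that an empty allele is written as "-" are assumptions made for illustration; the sketch only restates the length comparison given in the text.

```python
def allele_length(allele):
    """Length of an allele; '-' is assumed to denote an empty allele."""
    return 0 if allele == "-" else len(allele)


def refine_class(snp_class, ref_allele, observed_alleles):
    """Refine dbSNP's 'in-del' class into 'insertion' or 'deletion'.

    observed_alleles may include the reference allele; it is skipped
    when comparing lengths. Other classes are returned unchanged.
    """
    if snp_class != "in-del":
        return snp_class
    ref_len = allele_length(ref_allele)
    others = [allele_length(a) for a in observed_alleles if a != ref_allele]
    # Reference shorter than every other allele: bases were inserted.
    if others and all(ref_len < n for n in others):
        return "insertion"
    # Reference longer than every other allele: bases were deleted.
    if others and all(ref_len > n for n in others):
        return "deletion"
    # Mixed lengths: leave the class as dbSNP reported it.
    return snp_class
```

A variant whose reference allele is both longer than one observed allele and shorter than another keeps the class 'in-del', since neither condition holds for all alleles.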
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b146_SNPContigLoc_N.bcp.gz and b146_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38) b146_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b146_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz . Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp146*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. 
The candidate human SNPs are a filtered list that meets the following criteria:
class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)
In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).
Masked FASTA Files (human assemblies only)
FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
References
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783
snp146Common Common SNPs(146) Simple Nucleotide Polymorphisms (dbSNP 146) Found in >= 1% of Samples
Variation
Description
This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 146, available from ftp.ncbi.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population.
Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information.
The remainder of this page is identical on the following tracks:
Common SNPs(146) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.
Flagged SNPs(146) - SNPs with < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!).
Mult. SNPs(146) - SNPs mapping in more than one place on reference assembly.
All SNPs(146) - all SNPs from dbSNP mapping to reference assembly.
Interpreting and Configuring the Graphical Display
Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
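To illustrate the display rules above, the sketch below classifies a variant by the width of its reference span. It assumes 0-based, half-open start and end coordinates as used in UCSC tables, so an insertion has zero width; the function name display_shape and its return strings are hypothetical and are not part of the browser.

```python
def display_shape(chrom_start, chrom_end):
    """Describe how a variant is drawn at base-level resolution.

    Coordinates are assumed to be 0-based and half-open, so the
    reference span is chrom_end - chrom_start bases wide.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not precede chrom_start")
    if width == 0:
        return "tick mark between two bases"   # insertion
    if width == 1:
        return "single-base box"               # single nucleotide variant
    return f"block spanning {width} bases"     # multi-base variant
```

For instance, a variant with start 100 and end 100 is drawn as a tick mark between two bases, while one with start 100 and end 103 is drawn as a block spanning 3 bases.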
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or by the Tandem Repeats Finder, subject to a limit on repeat period) is shown in lower case.

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b146_SNPContigLoc_N.bcp.gz and b146_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b146_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b146_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp146*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:

    class = 'single'
    mapped position in the human reference genome is one base long
    aligned to only one location in the human reference genome
    not aligned to a chrN_random chrom
    biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K.
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp146 All SNPs(146) Simple Nucleotide Polymorphisms (dbSNP 146) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 146, available from ftp.ncbi.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(146): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(146): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs(146): SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks: Common SNPs(146) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(146) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(146) - SNPs mapping in more than one place on reference assembly. 
All SNPs(146) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by 
independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: 
frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
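As a rough illustration, a few of the checks above can be sketched in Python. This is not UCSC's implementation: the function name unusual_conditions, its parameters, and the choice to report RefAlleleRevComp in place of RefAlleleMismatch when the reverse complement matches are assumptions made for this example. Only the condition names and the 0.01 tolerances come from the descriptions above.

```python
# Hypothetical sketch of a few "Unusual Conditions" checks; names and
# signature are illustrative assumptions, not UCSC's actual code.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(freqs, chrom_count, ucsc_ref, dbsnp_ref,
                       observed, snp_class="single", tol=0.01):
    """Return a list of condition names that apply to one SNP.

    freqs       -- allele frequencies reported for the SNP
    chrom_count -- total number of chromosomes sampled (2N)
    ucsc_ref    -- bases of the reference genome at the mapped position
    dbsnp_ref   -- reference allele as reported by dbSNP
    observed    -- list of observed alleles
    """
    flags = []

    # Frequencies should sum to 1.0 within the stated tolerance.
    if freqs and abs(sum(freqs) - 1.0) > tol:
        flags.append("AlleleFreqSumNot1")

    # Each frequency times 2N should be (nearly) a whole chromosome count.
    if any(abs(f * chrom_count - round(f * chrom_count)) > tol for f in freqs):
        flags.append("NonIntegerChromCount")

    # Compare dbSNP's reference allele with the genome, on either strand.
    if dbsnp_ref.upper() != ucsc_ref.upper():
        if dbsnp_ref.upper() == reverse_complement(ucsc_ref).upper():
            flags.append("RefAlleleRevComp")
        else:
            flags.append("RefAlleleMismatch")

    # Informational only: more than two bases observed for class 'single'.
    if snp_class == "single":
        distinct = len(set(observed))
        if distinct == 3:
            flags.append("SingleClassTriAllelic")
        elif distinct == 4:
            flags.append("SingleClassQuadAllelic")

    return flags
```

For example, calling unusual_conditions([0.5, 0.5], 3, "G", "G", ["G", "T"]) flags only NonIntegerChromCount, because 0.5 of 3 chromosomes is not a whole number.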
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.

Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3.

Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).

AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.

You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'.
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or by the Tandem Repeats Finder, subject to a limit on repeat period) is shown in lower case.

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b146_SNPContigLoc_N.bcp.gz and b146_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38). b146_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b146_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp146*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:

    class = 'single'
    mapped position in the human reference genome is one base long
    aligned to only one location in the human reference genome
    not aligned to a chrN_random chrom
    biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).
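The candidate criteria listed above can be sketched as a simple filter. The Snp record and its field names below are hypothetical stand-ins for rows of the SNP table, and the sketch assumes 0-based, half-open coordinates (so a one-base position has end - start == 1) and a slash-separated observed allele string such as A/G. It illustrates the stated criteria only and is not the pipeline UCSC runs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical, minimal view of one SNP mapping."""
    name: str         # rs identifier
    chrom: str        # sequence name, e.g. "chr1"
    chrom_start: int  # 0-based start (assumed)
    chrom_end: int    # exclusive end (assumed)
    snp_class: str    # e.g. "single", "in-del"
    observed: str     # slash-separated alleles, e.g. "A/G"


def ortho_candidates(snps):
    """Keep only the mappings that satisfy every listed criterion."""
    # Count how many placements each rs identifier has in the input.
    placements = Counter(s.name for s in snps)
    keep = []
    for s in snps:
        if (s.snp_class == "single"                      # class = 'single'
                and s.chrom_end - s.chrom_start == 1     # one base long
                and placements[s.name] == 1              # single location
                and not s.chrom.endswith("_random")      # no *_random sequence
                and len(s.observed.split("/")) == 2):    # biallelic
            keep.append(s)
    return keep
```

SNPs that pass the filter would then be lifted to each other assembly; as described above, an unknown allele is recorded as 'N', and a failed lift as '?' with start and end set to 0.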
Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp144Mult Mult. SNPs(144) Simple Nucleotide Polymorphisms (dbSNP 144) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 144, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 144 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(144) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(144) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(144) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. 
SNPs(144) - SNPs mapping in more than one place on reference assembly. All SNPs(144) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method 
By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides 
inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
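The two MAF attributes above differ only in whether the 5% threshold must hold in at least one assayed population or in every one. A minimal sketch of that distinction follows; the function name maf_flags and the population labels in the usage note are made up for illustration. In the track itself these attributes are read from dbSNP's SNP_bitfield table rather than recomputed.

```python
def maf_flags(maf_by_population, threshold=0.05):
    """Derive the two MAF attributes from per-population minor allele frequencies.

    maf_by_population -- dict mapping a population label to its MAF (0.0 to 0.5)
    """
    values = list(maf_by_population.values())
    return {
        # True if any assayed population reaches the threshold.
        "MAF >= 5% in Some Population": any(v >= threshold for v in values),
        # True only if at least one population was assayed and all reach it.
        "MAF >= 5% in All Populations": bool(values) and all(
            v >= threshold for v in values),
    }
```

For instance, maf_flags({"popA": 0.12, "popB": 0.02}) sets the "Some Population" attribute but not the "All Populations" attribute.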
Several other properties do not have coloring options, but do have some filtering options:

Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.

Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3.

Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).

AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.

You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category.
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or by the Tandem Repeats Finder, subject to a limit on repeat period) is shown in lower case.

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/database/organism_data/ for hg38.
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b144_SNPContigLoc_N.bcp.gz and b144_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38).
b144_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b144_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
Data Access
The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp144*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
Orthologous Alleles (human assemblies only)
For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles.
The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp144Flagged Flagged SNPs(144) Simple Nucleotide Polymorphisms (dbSNP 144) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 144, available from ftp.ncbi.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. 
For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(144) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(144) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(144) - SNPs mapping in more than one place on reference assembly. All SNPs(144) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
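The three display cases described above can be told apart from the width of the variant's span in the reference sequence. The sketch below is illustrative only: it assumes 0-based, half-open start/end coordinates (so an insertion has start equal to end), and the function name and returned labels are hypothetical, not part of the Genome Browser.

```python
def display_shape(chrom_start, chrom_end):
    """Describe how a variant is drawn at base-level zoom, given its reference span.

    Assumes 0-based, half-open coordinates: width = chrom_end - chrom_start.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if width == 0:
        # No reference base is covered: drawn as a tick between two nucleotides.
        return "tick between two bases"
    if width == 1:
        # Exactly one reference base is covered, e.g. a single nucleotide polymorphism.
        return "single-base box"
    # Two or more reference bases are covered, e.g. a multiple nucleotide variant.
    return "block of %d bases" % width
```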
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information.
Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative.
MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
Several other properties do not have coloring options, but do have some filtering options:
Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period
Data Sources and Methods
The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI.
The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/database/organism_data/ for hg38.
The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b144_SNPContigLoc_N.bcp.gz and b144_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38).
b144_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b144_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp144*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp144Common Common SNPs(144) Simple Nucleotide Polymorphisms (dbSNP 144) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 144, available from ftp.ncbi.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(144) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(144) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(144) - SNPs mapping in more than one place on reference assembly. All SNPs(144) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. 
When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - 
submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR 
(dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.)
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
ObservedTooLong - Observed allele not given (length too long).
ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
Another condition, which does not necessarily imply any problem, is noted:
SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two).
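Three of the conditions in this list are simple arithmetic or counting checks, and can be sketched in a few lines. This is an illustrative reconstruction from the descriptions alone, not UCSC's actual code: the function names and input layouts (a list of allele frequencies, the reported chromosome count 2N, and a slash-separated observed-allele string) are assumptions, and the tolerance of 0.01 is the value quoted in the text.

```python
def allele_freq_sum_not_1(freqs, tol=0.01):
    """AlleleFreqSumNot1: frequencies do not sum to 1.0 within +/- tol."""
    return abs(sum(freqs) - 1.0) > tol


def non_integer_chrom_count(freqs, chrom_count, tol=0.01):
    """NonIntegerChromCount: some frequency implies a non-integer chromosome count.

    chrom_count is the reported total number of chromosomes (2N).
    """
    for f in freqs:
        implied = f * chrom_count
        if abs(implied - round(implied)) > tol:
            return True
    return False


def single_class_multi_allelic(observed):
    """Return the tri-/quad-allelic label for a class 'single' variant, else None.

    observed is assumed to be a slash-separated allele string such as 'A/C/G'.
    """
    n_alleles = len(set(observed.split("/")))
    labels = {3: "SingleClassTriAllelic", 4: "SingleClassQuadAllelic"}
    return labels.get(n_alleles)
```

For instance, frequencies of 0.33 and 0.67 reported for a sample of 10 chromosomes imply 3.3 and 6.7 chromosomes, so the second check would flag the reported sample count as suspect.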
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.
You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.
Insertions/Deletions
dbSNP uses a class called 'in-del'.
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
UCSC Re-alignment of flanking sequences
dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition.
Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period of 12 or less) is shown in lower case, and matching bases are indicated by a "+".
Data Sources and Methods
The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b144_SNPContigLoc_N.bcp.gz and b144_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38).
b144_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b144_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
Data Access
The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp144*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
Orthologous Alleles (human assemblies only)
For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:
class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)
In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).
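The candidate criteria for orthologous alleles can be read as a simple predicate over one SNP record. The sketch below is illustrative only: the `Snp` record, its field names, and the `mapping_count` field are assumptions made for this example rather than the actual table schema, and the coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Minimal SNP record; field names are illustrative, not a real schema."""
    chrom: str
    chrom_start: int     # 0-based start (assumed convention)
    chrom_end: int       # exclusive end
    snp_class: str       # e.g. "single", "in-del"
    observed: str        # slash-separated alleles, e.g. "A/G"
    mapping_count: int   # hypothetical: number of genomic locations for this rsID


def is_ortholog_candidate(snp: Snp) -> bool:
    """Apply the five candidate criteria to one record."""
    alleles = snp.observed.split("/")
    return (
        snp.snp_class == "single"                   # class = 'single'
        and snp.chrom_end - snp.chrom_start == 1    # one base long
        and snp.mapping_count == 1                  # only one location
        and not snp.chrom.endswith("_random")       # not on a chrN_random chrom
        and len(alleles) == 2                       # biallelic
    )


# Placeholder values described in the text for the orthologous allele columns.
UNKNOWN_ALLELE = "N"
FAILED_LIFT = ("?", 0, 0)  # (allele, start, end) when a lift was not possible
```

A record such as `Snp("chr1", 100, 101, "single", "A/G", 1)` passes the filter, while the same record on a `_random` sequence or with three observed alleles does not.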
Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp144 All SNPs(144) Simple Nucleotide Polymorphisms (dbSNP 144) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 144, available from ftp.ncbi.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(144): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(144): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs(144): SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. 
When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical.
The remainder of this page is identical on the following tracks:
Common SNPs(144) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.
Flagged SNPs(144) - SNPs with minor allele frequency (MAF) below 1% (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele).
Mult. SNPs(144) - SNPs mapping in more than one place on reference assembly.
All SNPs(144) - all SNPs from dbSNP mapping to reference assembly.
Interpreting and Configuring the Graphical Display
Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
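The relationship between a variant's reference span and what is drawn at base-level resolution can be summarized in a few lines. This is a minimal sketch with a hypothetical function name, assuming 0-based, half-open coordinates so that an insertion between two bases has equal start and end positions; it describes the display rules above and is not browser code.

```python
def glyph_for_span(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant is drawn at base-level resolution.

    Assumes 0-based, half-open coordinates, so an insertion between two
    bases has chrom_start == chrom_end.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if width == 0:
        return "tick mark between two bases (insertion)"
    if width == 1:
        return "box one base wide (single nucleotide variant)"
    return f"block spanning {width} bases"
```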
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information.
Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative.
MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
Several other properties do not have coloring options, but do have some filtering options:
Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition.
Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period of 12 or less) is shown in lower case, and matching bases are indicated by a "+".
Data Sources and Methods
The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b144_GRCh38p2/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b144_SNPContigLoc_N.bcp.gz and b144_ContigInfo_N.bcp.gz (N = 105 for hg19, 107 for hg38).
b144_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b144_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
Data Access
The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp144*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
Orthologous Alleles (human assemblies only)
For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:
class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)
In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).
Masked FASTA Files (human assemblies only)
FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
References
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K.
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp142Mult Mult. SNPs(142) Simple Nucleotide Polymorphisms (dbSNP 142) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 142, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 142 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(142) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(142) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(142) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(142) - SNPs mapping in more than one place on reference assembly. All SNPs(142) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. 
Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq 
transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that 
changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common; others are known to cause disease. See OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. 
A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. 
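The in-del reclassification rule described above can be sketched in a few lines. This is only an illustration of the stated length comparison, not UCSC's actual code: the function name, the argument layout, and the convention that "-" stands for an empty allele are assumptions made for the example.

```python
def refine_indel_class(snp_class, ref_allele, observed_alleles):
    """Return 'insertion', 'deletion', or the unchanged class.

    Hypothetical helper: applies the length comparison described in the
    text. '-' is assumed to denote an empty (zero-length) allele.
    """
    if snp_class != "in-del":
        return snp_class

    def allele_len(allele):
        return 0 if allele == "-" else len(allele)

    ref_len = allele_len(ref_allele)
    # Compare the reference against every *other* observed allele.
    other_lens = [allele_len(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        return "insertion"
    if all(ref_len > n for n in other_lens):
        return "deletion"
    # Mixed lengths: leave the dbSNP class as it was.
    return snp_class
```

A cluster whose non-reference alleles are both longer and shorter than the reference keeps the 'in-del' class under this sketch, which matches the "all other observed alleles" wording above.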
UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and a "+" indicates each match between genomic and flanking bases. Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b142_SNPContigLoc_N.bcp.gz and b142_ContigInfo_N.bcp.gz (N = 105 for hg19, 106 for hg38). b142_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b142_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. 
For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp142*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. 
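To make the IUPAC masking concrete, here is a minimal sketch of how a single-base substitution could be folded into a sequence as an ambiguity code. It is not the pipeline used to build the downloadable files: the function name, the 0-based position convention, and the choice to always include the reference base in the allele set are assumptions, and the sketch returns upper-case codes without trying to preserve soft-masking.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mask_sequence(seq, substitutions):
    """Replace bases covered by single-base substitutions with IUPAC codes.

    substitutions: iterable of (0-based position, observed alleles).
    Entries that are not plain A/C/G/T substitutions are skipped.
    """
    masked = list(seq)
    for pos, alleles in substitutions:
        # Assumption: the reference base always counts as one of the alleles.
        bases = frozenset(a.upper() for a in alleles) | {seq[pos].upper()}
        code = IUPAC_CODE.get(bases)
        if code is not None:
            masked[pos] = code
    return "".join(masked)
```

For example, a C/T variant at position 1 and an A/G/T variant at position 3 of "ACGT" would yield "AYGD", while an entry that includes a multi-base allele is left unmasked, in line with the note that insertions and deletions are not used.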
References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp142Flagged Flagged SNPs(142) Simple Nucleotide Polymorphisms (dbSNP 142) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 142, available from ftp.ncbi.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(142) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(142) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(142) - SNPs mapping in more than one place on reference assembly. All SNPs(142) - all SNPs from dbSNP mapping to reference assembly. 
Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed 
in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not 
alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. 
NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
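A few of the conditions listed above are simple arithmetic checks, and a rough sketch may help when reproducing them on downloaded data. The code below is an approximation written from the descriptions only: the function name, the argument layout, and the treatment of the 0.01 tolerance as a strict bound are assumptions, not UCSC's implementation.

```python
def flag_conditions(snp_class, observed, freqs, total_chroms, tol=0.01):
    """Return a list of condition names suggested by the descriptions above.

    observed:      list of observed alleles (e.g. ["A", "G"])
    freqs:         list of allele frequencies (may be empty)
    total_chroms:  reported sample size in chromosomes (2N)
    """
    flags = []
    if freqs:
        # Frequencies should sum to 1.0 within the tolerance.
        if abs(sum(freqs) - 1.0) > tol:
            flags.append("AlleleFreqSumNot1")
        # Each frequency should correspond to a whole number of chromosomes.
        for freq in freqs:
            count = freq * total_chroms
            if abs(count - round(count)) > tol:
                flags.append("NonIntegerChromCount")
                break
    if snp_class == "single":
        distinct = len(set(observed))
        if distinct == 3:
            flags.append("SingleClassTriAllelic")
        elif distinct == 4:
            flags.append("SingleClassQuadAllelic")
    return flags
```

Under these assumptions, frequencies of 0.5 and 0.5 with a reported sample of 3 chromosomes would be flagged as NonIntegerChromCount, since 1.5 chromosomes cannot have been observed.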
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. 
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and a "+" indicates each match between genomic and flanking bases. Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/rs_fasta/ for hg38. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b142_SNPContigLoc_N.bcp.gz and b142_ContigInfo_N.bcp.gz (N = 105 for hg19, 106 for hg38). b142_SNPMapInfo_N.bcp.gz provided the alignment weights. Functional classification was obtained from b142_SNPContigLocusId_N.bcp.gz. 
The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp142*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of those that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). 
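The candidate criteria for the orthologous-allele table amount to a conjunction of simple tests, which can be written out as a small predicate. This is a sketch under stated assumptions: the `Snp` record and its field names (including `alignment_count`) are hypothetical stand-ins for whatever representation is at hand, coordinates are assumed to be 0-based half-open, and the liftOver step itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snp:
    # Hypothetical record; field names are illustrative only.
    name: str
    chrom: str
    chrom_start: int      # 0-based, half-open coordinates assumed
    chrom_end: int
    snp_class: str
    observed: List[str]
    alignment_count: int  # number of genomic locations the SNP maps to


def is_ortholog_candidate(snp: Snp) -> bool:
    """True if the SNP meets all criteria listed in the text above."""
    return (
        snp.snp_class == "single"
        and snp.chrom_end - snp.chrom_start == 1   # one base long
        and snp.alignment_count == 1               # unique mapping
        and not snp.chrom.endswith("_random")      # not on a *_random sequence
        and len(set(snp.observed)) == 2            # biallelic
    )
```

A record such as `Snp("rsExample", "chr1", 100, 101, "single", ["A", "G"], 1)` (a made-up identifier) would pass, while the same record with three observed bases or a `_random` sequence name would be excluded.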
Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp142Common Common SNPs(142) Simple Nucleotide Polymorphisms (dbSNP 142) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 142, available from ftp.ncbi.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(142) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. 
Flagged SNPs(142) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele! Mult. SNPs(142) - SNPs mapping in more than one place on reference assembly. All SNPs(142) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By 
Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, 
resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). 
FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
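A few of the UCSC checks listed above can be illustrated with a short Python sketch. This is not UCSC's implementation: the function name `unusual_conditions`, its parameters, and the choice to report `RefAlleleRevComp` in place of `RefAlleleMismatch` (rather than alongside it) are assumptions made for illustration. Only the condition names and the ±0.01 tolerance come from the text.

```python
# Hypothetical sketch of a few of the anomaly checks described above.
# Names and structure are illustrative assumptions, not UCSC's code.

IUPAC_AMBIGUOUS = set("RYKMSWBDHVN")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(ref_ucsc, ref_dbsnp, observed,
                       allele_freqs=None, snp_class="single"):
    """Return the names of the conditions that apply to one variant.

    ref_ucsc     -- bases of the reference assembly at the mapped position
    ref_dbsnp    -- reference allele as reported by dbSNP
    observed     -- list of observed allele strings
    allele_freqs -- optional list of allele frequencies
    """
    flags = []

    # Frequencies should sum to 1.0 within a tolerance of 0.01.
    if allele_freqs and abs(sum(allele_freqs) - 1.0) > 0.01:
        flags.append("AlleleFreqSumNot1")

    # Any IUPAC ambiguity code in any observed allele.
    if any(base in IUPAC_AMBIGUOUS
           for allele in observed for base in allele.upper()):
        flags.append("ObservedContainsIupac")

    # Compare dbSNP's reference allele with the assembly sequence.
    if ref_dbsnp.upper() != ref_ucsc.upper():
        if reverse_complement(ref_dbsnp).upper() == ref_ucsc.upper():
            flags.append("RefAlleleRevComp")
        else:
            flags.append("RefAlleleMismatch")

    # Three or four distinct bases for class 'single' is noted, not an error.
    if snp_class == "single":
        distinct = len({allele.upper() for allele in observed})
        if distinct == 3:
            flags.append("SingleClassTriAllelic")
        elif distinct == 4:
            flags.append("SingleClassQuadAllelic")

    return flags


print(unusual_conditions("A", "T", ["A", "G"]))  # ['RefAlleleRevComp']
```

The flanking-sequence and span checks (for example FlankMismatchGenomeLonger or NamedDeletionZeroSpan) need alignment and coordinate data and are left out of this sketch.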
Several other properties do not have coloring options, but do have some filtering options:
Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category.
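As a rough illustration of the weight filter and the heterozygosity bound described above, here is a minimal Python sketch. The function names and the `track` parameter are hypothetical. The heterozygosity function is the textbook expected heterozygosity (one minus the sum of squared allele frequencies), used only to show why 0.5 is the upper bound for two alleles; it is not dbSNP's averaging procedure, which is described in the dbSNP document cited above.

```python
# Hypothetical helpers; names and signatures are illustrative assumptions.

def passes_weight_filter(weight, track="All SNPs", max_weight=None):
    """Apply a maximum-weight filter with the defaults described in the text.

    The default maximum is 1, except on the Mult. SNPs track where it is 3.
    Weights 0 and 10 are never in the data set, so they never pass.
    """
    if max_weight is None:
        max_weight = 3 if track.startswith("Mult.") else 1
    return 1 <= weight <= max_weight


def expected_heterozygosity(freqs):
    """Textbook expected heterozygosity: 1 - sum(p_i ** 2).

    For two alleles with frequencies p and 1 - p this equals 2p(1 - p),
    which peaks at 0.5 when p = 0.5.
    """
    return 1.0 - sum(p * p for p in freqs)


print(passes_weight_filter(3))                       # False
print(passes_weight_filter(3, track="Mult. SNPs"))   # True
print(expected_heterozygosity([0.5, 0.5]))           # 0.5
```

With three or more alleles the same formula can exceed 0.5, which is why the bound in the text is stated only for bi-allelic single-base substitutions.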
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/database/organism_data/ for hg38.
The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b142_SNPContigLoc_N.bcp.gz and b142_ContigInfo_N.bcp.gz (N = 105 for hg19, 106 for hg38).
b142_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b142_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp142*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles.
The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp142 All SNPs(142) Simple Nucleotide Polymorphisms (dbSNP 142) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 142, available from ftp.ncbi.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(142): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(142): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. 
SNPs(142): SNPs that have been mapped to multiple locations in the reference genome assembly.

The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical.

The remainder of this page is identical on the following tracks:
Common SNPs(142) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.
Flagged SNPs(142) - SNPs with < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!).
Mult. SNPs(142) - SNPs mapping in more than one place on reference assembly.
All SNPs(142) - all SNPs from dbSNP mapping to reference assembly.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
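The base-level drawing rules above can be summarized in a small Python sketch. It assumes the variant is given as 0-based, half-open `chrom_start`/`chrom_end` coordinates, so that an insertion has zero width; that convention and the function name `display_glyph` are assumptions made for illustration and are not stated in the text.

```python
# Hypothetical helper; assumes 0-based, half-open coordinates so that an
# insertion between two bases has chrom_start == chrom_end.

def display_glyph(chrom_start, chrom_end):
    """Describe how a variant is drawn at base-level resolution."""
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chrom_end must not be less than chrom_start")
    if span == 0:
        # Zero-width item: drawn between two nucleotides.
        return "tick mark between two bases"
    if span == 1:
        # Single-base item such as a single nucleotide polymorphism.
        return "box one base wide"
    # Multi-base item such as a multiple nucleotide variant or deletion.
    return f"block spanning {span} bases"


print(display_glyph(100, 100))  # tick mark between two bases
print(display_glyph(100, 101))  # box one base wide
print(display_glyph(100, 103))  # block spanning 3 bases
```

At wider zoom levels this distinction disappears, since every variant is drawn as a single tick mark.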
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/database/organism_data/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/database/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b142_GRCh38/rs_fasta/ for hg38.
Coordinates, orientation, location type and dbSNP reference allele data were obtained from b142_SNPContigLoc_N.bcp.gz and b142_ContigInfo_N.bcp.gz (N = 105 for hg19, 106 for hg38).
b142_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b142_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp142*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria: class = 'single' mapped position in the human reference genome is one base long aligned to only one location in the human reference genome not aligned to a chrN_random chrom biallelic (not tri- or quad-allelic) In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 
dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp141Mult Mult. SNPs(141) Simple Nucleotide Polymorphisms (dbSNP 141) That Map to Multiple Genomic Loci Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 141, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The default maximum weight for this track is 3, unlike the other dbSNP build 141 tracks, which have a default maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(141) track because of its maximum weight filter. The remainder of this page is identical on the following tracks: Common SNPs(141) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(141) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(141) - SNPs mapping in more than one place on reference assembly. All SNPs(141) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence.
Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq 
transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that 
changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. 
This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions, because 2pq reaches its maximum of 0.5 when both allele frequencies are 0.5. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set.
A maximum-weight filter is supported; it defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
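The reclassification rule described under Insertions/Deletions can be sketched in Python as follows. This is only an illustration, not UCSC's actual code: the function name refine_indel_class, its parameters, and the convention that "-" stands for an empty (zero-length) allele are assumptions made for this example.

```python
def refine_indel_class(snp_class, ref_allele, observed_alleles):
    """Refine dbSNP's 'in-del' class into 'insertion' or 'deletion'.

    Assumption (not stated in the text): "-" denotes a zero-length allele.
    Any class other than 'in-del' is returned unchanged.
    """
    if snp_class != "in-del":
        return snp_class

    def allele_len(allele):
        # Treat the placeholder "-" as an empty allele.
        return 0 if allele == "-" else len(allele)

    ref_len = allele_len(ref_allele)
    # Lengths of every observed allele other than the reference allele.
    other_lens = [allele_len(a) for a in observed_alleles if a != ref_allele]
    if not other_lens:
        return snp_class  # nothing to compare against

    if all(ref_len < n for n in other_lens):
        return "insertion"  # reference is shorter than all other alleles
    if all(ref_len > n for n in other_lens):
        return "deletion"   # reference is longer than all other alleles
    return snp_class        # mixed lengths: leave as 'in-del'
```

For example, refine_indel_class("in-del", "-", ["-", "A"]) yields 'insertion', while a cluster whose other alleles are both shorter and longer than the reference stays 'in-del'.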
UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b141_SNPContigLoc.bcp.gz and b141_ContigInfo.bcp.gz. b141_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b141_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as the "clinically associated" flag. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp141*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.
PMID: 11125122; PMC: PMC29783 snp141Flagged Flagged SNPs(141) Simple Nucleotide Polymorphisms (dbSNP 141) Flagged by dbSNP as Clinically Assoc Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively Simple Nucleotide Polymorphisms, from dbSNP build 141, available from ftp.ncbi.nih.gov/snp. Only SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%, are included in this subset. Frequency data are not available for all SNPs, so this subset probably includes some SNPs whose true minor allele frequency is 1% or greater. The significance of any particular variant in this track should be interpreted only by a trained medical geneticist using all available information. For example, some variants are included in this track because of their inclusion in a Locus-Specific Database (LSDB) or mention in OMIM, but are not thought to be disease-causing, so inclusion of a variant in this track is not necessarily an indicator of risk. Again, all available information must be carefully considered by a qualified professional. The remainder of this page is identical on the following tracks: Common SNPs(141) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(141) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(141) - SNPs mapping in more than one place on reference assembly. All SNPs(141) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels.
When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - 
submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR 
(dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
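A few of the unusual-condition checks listed above can be sketched in Python. This is a hedged illustration only: the function unusual_conditions, its parameters, and the order in which flags are reported are hypothetical, and the text does not say whether UCSC reports RefAlleleMismatch together with RefAlleleRevComp, so this sketch reports only one of the two.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(snp_class, span, ucsc_ref, dbsnp_ref, observed, freqs):
    """Return flag names for a subset of the conditions described above.

    span      - number of reference bases covered by the annotation
    ucsc_ref  - bases found in the mapped position range
    dbsnp_ref - reference allele reported by dbSNP
    observed  - list of observed alleles
    freqs     - list of allele frequencies (may be empty)
    """
    flags = []

    # AlleleFreqSumNot1: frequencies should sum to 1.0 within 0.01.
    if freqs and abs(sum(freqs) - 1.0) > 0.01:
        flags.append("AlleleFreqSumNot1")

    # Reference allele checks (assumption: report one flag, not both).
    if dbsnp_ref != ucsc_ref:
        if dbsnp_ref == reverse_complement(ucsc_ref):
            flags.append("RefAlleleRevComp")
        else:
            flags.append("RefAlleleMismatch")

    # Checks that apply when class is single and all alleles are one base.
    if snp_class == "single" and all(len(a) == 1 for a in observed):
        if span == 0:
            flags.append("SingleClassZeroSpan")
        elif span > 1:
            flags.append("SingleClassLongerSpan")
        distinct = len(set(observed))
        if distinct == 3:
            flags.append("SingleClassTriAllelic")
        elif distinct == 4:
            flags.append("SingleClassQuadAllelic")

    return flags
```

A well-behaved biallelic SNV whose frequencies sum to 1 and whose reference alleles agree produces an empty list; each anomaly adds one flag name.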
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions, because 2pq reaches its maximum of 0.5 when both allele frequencies are 0.5. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A maximum-weight filter is supported; it defaults to 1 on all tracks except the Mult. SNPs track, where it defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'.
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of FASTA files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The FASTA files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b141_SNPContigLoc.bcp.gz and b141_ContigInfo.bcp.gz. b141_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b141_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp141*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38.
Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp141Common Common SNPs(141) Simple Nucleotide Polymorphisms (dbSNP 141) Found in >= 1% of Samples Variation Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively called Simple Nucleotide Polymorphisms, from dbSNP build 141, available from ftp.ncbi.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks: Common SNPs(141) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Flagged SNPs(141) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(141) - SNPs mapping in more than one place on reference assembly.
All SNPs(141) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by 
independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: 
frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
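A few of the checks listed above can be illustrated with a minimal Python sketch. This is not UCSC's implementation: the function name, its arguments, and the choice to report RefAlleleRevComp instead of RefAlleleMismatch when the reverse complement matches are assumptions made for illustration, and observed alleles are assumed to be given on the reference strand already. The 0.01 tolerance is taken from the descriptions of AlleleFreqSumNot1 and NonIntegerChromCount.

```python
from typing import Dict, List, Sequence

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Classes for which ObservedMismatch is tested, per the description above.
CHECKED_CLASSES = {"single", "in-del", "insertion", "deletion", "mnp", "mixed"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(
    ucsc_ref: str,                  # bases in the mapped position range
    dbsnp_ref: str,                 # reference allele reported by dbSNP
    observed: Sequence[str],        # observed alleles (assumed reference strand)
    allele_freqs: Dict[str, float], # allele -> frequency; may be empty
    chrom_count: int,               # reported sample size (2N)
    snp_class: str,
    tol: float = 0.01,
) -> List[str]:
    """Hypothetical helper: return the names of conditions that apply."""
    flags: List[str] = []

    if allele_freqs:
        # AlleleFreqSumNot1: frequencies should sum to 1.0 within the tolerance.
        if abs(sum(allele_freqs.values()) - 1.0) > tol:
            flags.append("AlleleFreqSumNot1")
        # NonIntegerChromCount: frequency * 2N should be close to a whole number.
        counts = [freq * chrom_count for freq in allele_freqs.values()]
        if any(abs(c - round(c)) > tol for c in counts):
            flags.append("NonIntegerChromCount")

    # RefAlleleRevComp / RefAlleleMismatch (treated as exclusive here: an assumption).
    if dbsnp_ref != ucsc_ref:
        if reverse_complement(dbsnp_ref) == ucsc_ref:
            flags.append("RefAlleleRevComp")
        else:
            flags.append("RefAlleleMismatch")

    # ObservedMismatch: the reference allele is not among the observed alleles.
    if snp_class in CHECKED_CLASSES and ucsc_ref not in observed:
        flags.append("ObservedMismatch")

    # SingleClassTriAllelic / SingleClassQuadAllelic: informational only.
    if snp_class == "single":
        n_alleles = len(set(observed))
        if n_alleles == 3:
            flags.append("SingleClassTriAllelic")
        elif n_alleles == 4:
            flags.append("SingleClassQuadAllelic")

    return flags
```

For example, a class 'single' SNP with observed alleles A, G and C, frequencies of 0.6 and 0.3, and a dbSNP reference allele of T at a position where the genome has A would be reported with AlleleFreqSumNot1, RefAlleleRevComp and SingleClassTriAllelic under these assumptions.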
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'.
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b141_SNPContigLoc.bcp.gz and b141_ContigInfo.bcp.gz. b141_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b141_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. 
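The translation from dbSNP function terms to Sequence Ontology terms can be sketched as a simple lookup. The term pairs below are taken from the Function list earlier on this page; the function name and the assumption that a SNP's function terms arrive as one comma-separated string are illustrative only, and terms not in the table (such as an unknown classification) are passed through unchanged.

```python
from typing import List

# dbSNP function term -> Sequence Ontology term, as listed under "Function".
DBSNP_TO_SO = {
    "coding-synon": "synonymous_variant",
    "intron": "intron_variant",
    "near-gene-3": "downstream_gene_variant",
    "near-gene-5": "upstream_gene_variant",
    "ncRNA": "nc_transcript_variant",
    "nonsense": "stop_gained",
    "missense": "missense_variant",
    "stop-loss": "stop_lost",
    "frameshift": "frameshift_variant",
    "cds-indel": "inframe_indel",
    "untranslated-3": "3_prime_UTR_variant",
    "untranslated-5": "5_prime_UTR_variant",
    "splice-3": "splice_acceptor_variant",
    "splice-5": "splice_donor_variant",
}


def to_sequence_ontology(func_field: str) -> List[str]:
    """Hypothetical helper: translate a comma-separated list of dbSNP
    function terms into SO terms, leaving unrecognized terms as they are."""
    terms = [term for term in func_field.split(",") if term]
    return [DBSNP_TO_SO.get(term, term) for term in terms]
```

A SNP that overlaps several transcripts may carry more than one term, so `to_sequence_ontology("missense,intron")` would yield both `missense_variant` and `intron_variant`.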
Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp141*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list of SNPs that meet all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end positions to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38.
Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 snp141 All SNPs(141) Simple Nucleotide Polymorphisms (dbSNP 141) Variation Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 141, available from ftp.ncbi.nih.gov/snp. Three tracks contain subsets of the items in this track: Common SNPs(141): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Flagged SNPs(141): SNPs flagged as clinically associated by dbSNP, mapped to a single location in the reference genome assembly, and not known to have a minor allele frequency of at least 1%. Frequency data are not available for all SNPs, so this subset may include some SNPs whose true minor allele frequency is 1% or greater. Mult. SNPs(141): SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks: Common SNPs(141) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. 
Flagged SNPs(141) - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele). Mult. SNPs(141) - SNPs mapping in more than one place on reference assembly. All SNPs(141) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By
Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, 
resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). 
FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. 
(UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. 
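The difference between the two MAF attributes above ("in Some Population" versus "in All Populations") can be shown with a small Python sketch. How dbSNP actually derives these bits is not described on this page; the function name, the dictionary of per-population minor allele frequencies, and the population labels in the usage note are hypothetical.

```python
from typing import Dict


def maf_flags(maf_by_population: Dict[str, float], threshold: float = 0.05) -> Dict[str, bool]:
    """Hypothetical helper: evaluate the two MAF >= 5% attributes from
    per-population minor allele frequencies."""
    values = list(maf_by_population.values())
    if not values:
        # With no populations assayed, neither attribute is asserted here.
        return {"some_population": False, "all_populations": False}
    return {
        # True if at least one assayed population reaches the threshold.
        "some_population": any(v >= threshold for v in values),
        # True only if every assayed population reaches the threshold.
        "all_populations": all(v >= threshold for v in values),
    }
```

With frequencies of 0.12 in one population and 0.01 in another (for instance, keys "popA" and "popB"), the first attribute would be set and the second would not.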
Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP. Weight can be 0, 1, 2, 3 or 10. Alignments with weight = 1 are of the highest quality. Alignments with weight = 0 or weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category.
The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). 
The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b141_SNPContigLoc.bcp.gz and b141_ContigInfo.bcp.gz. b141_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b141_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz . Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp141*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. 
The candidate human SNPs are a filtered list that meets the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 cons100way Conservation Vertebrate Multiz Alignment & Conservation (100 Species) Comparative Genomics Downloads for data in this track are available: Multiz alignments (MAF format), and phylogenetic trees; PhyloP conservation (WIG format); PhastCons conservation (WIG format). Description This track shows multiple alignments of 100 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species. The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. PHAST/Multiz are built from chains ("alignable") and nets ("syntenic"); see the documentation of the Chain/Net tracks for a description of the complete alignment process.
PhastCons is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites). Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1. Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data, and both were run with the same parameters. See also: lastz parameters and other details and chain minimum score and gap parameters used in these alignments. UCSC has repeatmasked and aligned all genome assemblies, and provides all the sequences for download. For genome assemblies not available in the genome browser, there are alternative assembly hub genome browsers. 
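To make the phyloP score convention described above concrete, the following Python sketch converts a score into the p-value it encodes and labels the direction of the signal. It assumes the logarithm is base 10, which the text above does not state explicitly; the function name and the example scores are hypothetical.

```python
def interpret_phylop(score):
    """Return (p_value, label) for a single phyloP score.

    Assumption: |score| = -log10(p) under the neutral null model.
    Positive scores indicate conservation, negative scores acceleration.
    """
    p_value = 10.0 ** (-abs(score))
    if score > 0:
        label = "conserved"
    elif score < 0:
        label = "accelerated"
    else:
        label = "neutral"
    return p_value, label


# Illustrative scores only; these are not taken from the track.
for s in (3.0, -2.0, 0.0):
    p, label = interpret_phylop(s)
    print(f"score={s:+.1f}  p={p:.3g}  {label}")
```

Because phastCons scores are probabilities between 0 and 1 rather than transformed p-values, no such conversion applies to them.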
Missing sequence in any assembly is highlighted in the track display by regions of yellow when zoomed out and by Ns when displayed at base level (see Gap Annotation, below).

Organism | Species | Release date | UCSC version | Alignment type

Primate subset
Baboon | Papio hamadryas | Mar 2012 | Baylor Panu_2.0/papAnu2 | Reciprocal best net
Bushbaby | Otolemur garnettii | Mar 2011 | Broad/otoGar3 | Syntenic net
Chimp | Pan troglodytes | Feb 2011 | CSAC 2.1.4/panTro4 | Syntenic net
Crab-eating macaque | Macaca fascicularis | Jun 2013 | Macaca_fascicularis_5.0/macFas5 | Syntenic net
Gibbon | Nomascus leucogenys | Oct 2012 | GGSC Nleu3.0/nomLeu3 | Syntenic net
Gorilla | Gorilla gorilla gorilla | May 2011 | gorGor3.1/gorGor3 | Reciprocal best net
Green monkey | Chlorocebus sabaeus | Mar 2014 | Chlorocebus_sabeus 1.1/chlSab2 | Syntenic net
Human | Homo sapiens | Dec 2013 | GRCh38/hg38 | reference species
Marmoset | Callithrix jacchus | Mar 2009 | WUGSC 3.2/calJac3 | Syntenic net
Orangutan | Pongo pygmaeus abelii | July 2007 | WUGSC 2.0.2/ponAbe2 | Reciprocal best net
Rhesus | Macaca mulatta | Oct 2010 | BGI CR_1.0/rheMac3 | Syntenic net
Squirrel monkey | Saimiri boliviensis | Oct 2011 | Broad/saiBol1 | Syntenic net

Euarchontoglires subset
Brush-tailed rat | Octodon degus | Apr 2012 | OctDeg1.0/octDeg1 | Syntenic net
Chinchilla | Chinchilla lanigera | May 2012 | ChiLan1.0/chiLan1 | Syntenic net
Chinese hamster | Cricetulus griseus | Jul 2013 | C_griseus_v1.0/criGri1 | Syntenic net
Chinese tree shrew | Tupaia chinensis | Jan 2013 | TupChi_1.0/tupChi1 | Syntenic net
Golden hamster | Mesocricetus auratus | Mar 2013 | MesAur1.0/mesAur1 | Syntenic net
Guinea pig | Cavia porcellus | Feb 2008 | Broad/cavPor3 | Syntenic net
Lesser Egyptian jerboa | Jaculus jaculus | May 2012 | JacJac1.0/jacJac1 | Syntenic net
Mouse | Mus musculus | Dec 2011 | GRCm38/mm10 | Syntenic net
Naked mole-rat | Heterocephalus glaber | Jan 2012 | Broad HetGla_female_1.0/hetGla2 | Syntenic net
Pika | Ochotona princeps | May 2012 | OchPri3.0/ochPri3 | Syntenic net
Prairie vole | Microtus ochrogaster | Oct 2012 | MicOch1.0/micOch1 | Syntenic net
Rabbit | Oryctolagus cuniculus | Apr 2009 | Broad/oryCun2 | Syntenic net
Rat | Rattus norvegicus | Jul 2014 | RGSC 6.0/rn6 | Syntenic net
Squirrel | Spermophilus tridecemlineatus | Nov 2011 | Broad/speTri2 | Syntenic net

Laurasiatheria subset
Alpaca | Vicugna pacos | Mar 2013 | Vicugna_pacos-2.0.1/vicPac2 | Syntenic net
Bactrian camel | Camelus ferus | Dec 2011 | CB1/camFer1 | Syntenic net
Big brown bat | Eptesicus fuscus | Jul 2012 | EptFus1.0/eptFus1 | Syntenic net
Black flying-fox | Pteropus alecto | Aug 2012 | ASM32557v1/pteAle1 | Syntenic net
Cat | Felis catus | Nov 2014 | ICGSC Felis_catus 8.0/felCat8 | Syntenic net
Cow | Bos taurus | Jun 2014 | Bos_taurus_UMD_3.1.1/bosTau8 | Syntenic net
David's myotis bat | Myotis davidii | Aug 2012 | ASM32734v1/myoDav1 | Syntenic net
Dog | Canis lupus familiaris | Sep 2011 | Broad CanFam3.1/canFam3 | Syntenic net
Dolphin | Tursiops truncatus | Oct 2011 | Baylor Ttru_1.4/turTru2 | Reciprocal best net
Domestic goat | Capra hircus | May 2012 | CHIR_1.0/capHir1 | Syntenic net
Ferret | Mustela putorius furo | Apr 2011 | MusPutFur1.0/musFur1 | Syntenic net
Hedgehog | Erinaceus europaeus | May 2012 | EriEur2.0/eriEur2 | Syntenic net
Horse | Equus caballus | Sep 2007 | Broad/equCab2 | Syntenic net
Killer whale | Orcinus orca | Jan 2013 | Oorc_1.1/orcOrc1 | Syntenic net
Megabat | Pteropus vampyrus | Jul 2008 | Broad/pteVam1 | Reciprocal best net
Little brown bat | Myotis lucifugus | Jul 2010 | Broad Institute Myoluc2.0/myoLuc2 | Syntenic net
Pacific walrus | Odobenus rosmarus divergens | Jan 2013 | Oros_1.0/odoRosDiv1 | Syntenic net
Panda | Ailuropoda melanoleuca | Dec 2009 | BGI-Shenzhen 1.0/ailMel1 | Syntenic net
Pig | Sus scrofa | Aug 2011 | SGSC Sscrofa10.2/susScr3 | Syntenic net
Sheep | Ovis aries | Aug 2012 | ISGC Oar_v3.1/oviAri3 | Syntenic net
Shrew | Sorex araneus | Aug 2008 | Broad/sorAra2 | Syntenic net
Star-nosed mole | Condylura cristata | Mar 2012 | ConCri1.0/conCri1 | Syntenic net
Tibetan antelope | Pantholops hodgsonii | May 2013 | PHO1.0/panHod1 | Syntenic net
Weddell seal | Leptonychotes weddellii | Mar 2013 | LepWed1.0/lepWed1 | Reciprocal best net
White rhinoceros | Ceratotherium simum | May 2012 | CerSimSim1.0/cerSim1 | Syntenic net

Afrotheria subset
Aardvark | Orycteropus afer afer | May 2012 | OryAfe1.0/oryAfe1 | Syntenic net
Cape elephant shrew | Elephantulus edwardii | Aug 2012 | EleEdw1.0/eleEdw1 | Syntenic net
Cape golden mole | Chrysochloris asiatica | Aug 2012 | ChrAsi1.0/chrAsi1 | Syntenic net
Elephant | Loxodonta africana | Jul 2009 | Broad/loxAfr3 | Syntenic net
Manatee | Trichechus manatus latirostris | Oct 2011 | Broad v1.0/triMan1 | Syntenic net
Tenrec | Echinops telfairi | Nov 2012 | Broad/echTel2 | Syntenic net

Mammal subset
Armadillo | Dasypus novemcinctus | Dec 2011 | Baylor/dasNov3 | Syntenic net
Opossum | Monodelphis domestica | Oct 2006 | Broad/monDom5 | Net
Platypus | Ornithorhynchus anatinus | Mar 2007 | WUGSC 5.0.1/ornAna1 | Reciprocal best net
Tasmanian devil | Sarcophilus harrisii | Feb 2011 | WTSI Devil_ref v7.0/sarHar1 | Net
Wallaby | Macropus eugenii | Sep 2009 | TWGS Meug_1.1/macEug2 | Reciprocal best net

Aves subset
Budgerigar | Melopsittacus undulatus | Sep 2011 | WUSTL v6.3/melUnd1 | Net
Chicken | Gallus gallus | Nov 2011 | ICGSC Gallus_gallus-4.0/galGal4 | Net
Collared flycatcher | Ficedula albicollis | Jun 2013 | FicAlb1.5/ficAlb2 | Net
Mallard duck | Anas platyrhynchos | Apr 2013 | BGI_duck_1.0/anaPla1 | Net
Medium ground finch | Geospiza fortis | Apr 2012 | GeoFor_1.0/geoFor1 | Net
Parrot | Amazona vittata | Jan 2013 | AV1/amaVit1 | Net
Peregrine falcon | Falco peregrinus | Feb 2013 | F_peregrinus_v1.0/falPer1 | Net
Rock pigeon | Columba livia | Feb 2013 | Cliv_1.0/colLiv1 | Net
Saker falcon | Falco cherrug | Feb 2013 | F_cherrug_v1.0/falChe1 | Net
Scarlet macaw | Ara macao | Jun 2013 | SMACv1.1/araMac1 | Net
Tibetan ground jay | Pseudopodoces humilis | Jan 2013 | PseHum1.0/pseHum1 | Net
Turkey | Meleagris gallopavo | Dec 2009 | TGC Turkey_2.01/melGal1 | Net
White-throated sparrow | Zonotrichia albicollis | Apr 2013 | ASM38545v1/zonAlb1 | Net
Zebra finch | Taeniopygia guttata | Feb 2013 | WashU taeGut324/taeGut2 | Net

Sarcopterygii subset
American alligator | Alligator mississippiensis | Aug 2012 | allMis0.2/allMis1 | Net
Chinese softshell turtle | Pelodiscus sinensis | Oct 2011 | PelSin_1.0/pelSin1 | Net
Coelacanth | Latimeria chalumnae | Aug 2011 | Broad/latCha1 | Net
Green seaturtle | Chelonia mydas | Mar 2013 | CheMyd_1.0/cheMyd1 | Net
Lizard | Anolis carolinensis | May 2010 | Broad AnoCar2.0/anoCar2 | Net
Painted turtle | Chrysemys picta bellii | Mar 2014 | v3.0.3/chrPic2 | Net
Spiny softshell turtle | Apalone spinifera | May 2013 | ASM38561v1/apaSpi1 | Net
X. tropicalis | Xenopus tropicalis | Sep 2012 | JGI 7.0/xenTro7 | Net

Fish subset
Atlantic cod | Gadus morhua | May 2010 | Genofisk GadMor_May2010/gadMor1 | Net
Burton's mouthbreeder | Haplochromis burtoni | Oct 2011 | AstBur1.0/hapBur1 | Net
Fugu | Takifugu rubripes | Oct 2011 | FUGU5/fr3 | Net
Lamprey | Petromyzon marinus | Sep 2010 | WUGSC 7.0/petMar2 | Net
Medaka | Oryzias latipes | Oct 2005 | NIG/UT MEDAKA1/oryLat2 | Net
Mexican tetra (cavefish) | Astyanax mexicanus | Apr 2013 | Astyanax_mexicanus-1.0.2/astMex1 | Net
Nile tilapia | Oreochromis niloticus | Jan 2011 | Broad oreNil1.1/oreNil2 | Net
Princess of Burundi | Neolamprologus brichardi | May 2011 | NeoBri1.0/neoBri1 | Net
Pundamilia nyererei | Pundamilia nyererei | Oct 2011 | PunNye1.0/punNye1 | Net
Southern platyfish | Xiphophorus maculatus | Jan 2012 | Xiphophorus_maculatus-4.4.2/xipMac1 | Net
Spotted gar | Lepisosteus oculatus | Dec 2011 | LepOcu1/lepOcu1 | Net
Stickleback | Gasterosteus aculeatus | Feb 2006 | Broad/gasAcu1 | Net
Tetraodon | Tetraodon nigroviridis | Mar 2007 | Genoscope 8.0/tetNig2 | Net
Yellowbelly pufferfish | Takifugu flavidus | May 2013 | version 1 of Takifugu flavidus genome/takFla1 | Net
Zebra mbuna | Maylandia zebra | Mar 2012 | MetZeb1.1/mayZeb1 | Net
Zebrafish | Danio rerio | Sep 2014 | GRCz10/danRer10 | Net

Table 1. Genome assemblies included in the 100-way Conservation track.

Display Conventions and Configuration In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons.
Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Note that excluding species from the pairwise display does not alter the conservation score display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: No bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: Aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: Aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: Represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: Enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context.
The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: The gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: The annotations from the genome displayed in the Default species to establish reading frame pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: Codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: Codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation:

Gene Track | Species
UCSC Genes | Human, Mouse
RefSeq Genes | Cow, Frog (X. tropicalis)
Ensembl Genes v73 | Atlantic cod, Bushbaby, Cat, Chicken, Chimp, Coelacanth, Dog, Elephant, Ferret, Fugu, Gorilla, Horse, Lamprey, Little brown bat, Lizard, Mallard duck, Marmoset, Medaka, Megabat, Orangutan, Panda, Pig, Platypus, Rat, Soft-shell Turtle, Southern platyfish, Squirrel, Tasmanian devil, Tetraodon, Zebrafish
no annotation | Aardvark, Alpaca, American alligator, Armadillo, Baboon, Bactrian camel, Big brown bat, Black flying-fox, Brush-tailed rat, Budgerigar, Burton's mouthbreeder, Cape elephant shrew, Cape golden mole, Chinchilla, Chinese hamster, Chinese tree shrew, Collared flycatcher, Crab-eating macaque, David's myotis (bat), Dolphin, Domestic goat, Gibbon, Golden hamster, Green monkey, Green seaturtle, Hedgehog, Killer whale, Lesser Egyptian jerboa, Manatee, Medium ground finch, Mexican tetra (cavefish), Naked mole-rat, Nile tilapia, Pacific walrus, Painted turtle, Parrot, Peregrine falcon, Pika, Prairie vole, Princess of Burundi, Pundamilia nyererei, Rhesus, Rock pigeon, Saker falcon, Scarlet Macaw, Sheep, Shrew, Spiny softshell turtle, Spotted gar, Squirrel monkey, Star-nosed mole, Tawny puffer fish, Tenrec, Tibetan antelope, Tibetan ground jay, Wallaby, Weddell seal, White rhinoceros, White-throated sparrow, Zebra Mbuna, Zebra finch

Table 2. Gene tracks used for codon translation.

Methods Pairwise alignments with the human genome were generated for each species using lastz from repeat-masked genomic sequence. Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree. The scoring matrix and parameters for pairwise alignment and chaining were tuned for each species based on phylogenetic distance from the reference. High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net.
For more information about the chaining and netting process and parameters for each species, see the description pages for the Chain and Net tracks. An additional filtering step was introduced in the generation of the 100-way conservation track to reduce the number of paralogs and pseudogenes from the high-quality assemblies and the suspect alignments from the low-quality assemblies: the pairwise alignments of high-quality mammalian sequences (placental and marsupial) were filtered based on synteny; those for 2X mammalian genomes were filtered to retain only alignments of best quality in both the target and query ("reciprocal best"). The resulting best-in-genome pairwise alignments were progressively aligned using multiz/autoMZ, following the tree topology diagrammed above, to produce multiple alignments. The multiple alignments were post-processed to add annotations indicating alignment gaps, genomic breaks, and base quality of the component sequences. The annotated multiple alignments, in MAF format, are available for bulk download. An alignment summary table containing an entry for each alignment block in each species was generated to improve track display performance at large scales. Framing tables were constructed to enable visualization of codons in the multiple alignment display. Phylogenetic Tree Model Both phastCons and phyloP are phylogenetic methods that rely on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The all-species tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 100-way alignment (msa_view). The 4d sites were derived from the RefSeq (Reviewed+Coding) gene set, filtered to select single-coverage long transcripts. 
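As an illustration of what "4-fold degenerate" means here, the Python sketch below enumerates the codon families in the standard genetic code whose third position can hold any base without changing the encoded amino acid. It only illustrates the concept and is not the msa_view extraction used to build the track; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes():
    """Return the two-base prefixes whose four codons share one amino acid."""
    prefixes = []
    for b1, b2 in product(BASES, repeat=2):
        encoded = {CODON_TABLE[b1 + b2 + b3] for b3 in BASES}
        if len(encoded) == 1:
            prefixes.append(b1 + b2)
    return prefixes


FOURFOLD = set(fourfold_prefixes())


def is_fourfold_site(codon, position):
    """True if the 0-based position in the codon is a 4-fold degenerate site."""
    return position == 2 and codon[:2].upper() in FOURFOLD


print(sorted(FOURFOLD))
```

Substitutions at such sites are synonymous, which is why they are commonly used as a proxy for neutrally evolving positions when fitting a tree model.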
This same tree model was used in the phyloP calculations; however, the background frequencies were modified to maintain reversibility. The resulting tree model: all species. PhastCons Conservation The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size; therefore, short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. 2005. The phastCons parameters used were: expected-length=45, target-coverage=0.3, rho=0.3. PhyloP Conservation The phyloP program supports several different methods for computing p-values of conservation or acceleration, for individual nucleotides or larger elements ( http://compgen.cshl.edu/phast/). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., the --subtree option was not used). 
The scores were computed by performing a likelihood ratio test at each alignment column (--method LRT), and scores for both conservation and acceleration were produced (--mode CONACC). Conserved Elements The conserved elements were predicted by running phastCons with the --viterbi option. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created using the following programs: Alignment tools: lastz (formerly blastz) and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC Conservation scoring: phastCons, phyloP, phyloFit, tree_doctor, msa_view and other programs in PHAST by Adam Siepel at Cold Spring Harbor Laboratory (original development done at the Haussler lab at UCSC). MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC Tree image generator: phyloPng by Galt Barber, UCSC Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle display), and Brian Raney (gap annotation and codon framing) at UCSC The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community. Thanks to Giacomo Bernardi for help with the fish relationships. 
References Phylo-HMMs, phastCons, and phyloP: Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. PMID: 8583911 Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010 Jan;20(1):110-21. PMID: 19858363; PMC: PMC2798823 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. DOI: 10.1007/0-387-27733-1_12 Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. PMID: 7713447; PMC: PMC1206396 Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Lastz (formerly Blastz): Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis. Pennsylvania State University, USA. 2007. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. PMID: 11743200 cons100wayViewalign Multiz Alignments Vertebrate Multiz Alignment & Conservation (100 Species) Comparative Genomics multiz100way Multiz Align Multiz Alignments of 100 Vertebrates Comparative Genomics cons100wayViewphastcons Element Conservation (phastCons) Vertebrate Multiz Alignment & Conservation (100 Species) Comparative Genomics phastCons100way Cons 100 Verts 100 vertebrates conservation by PhastCons Comparative Genomics cons100wayViewelements Conserved Elements Vertebrate Multiz Alignment & Conservation (100 Species) Comparative Genomics phastConsElements100way 100 Vert. El 100 vertebrates Conserved Elements Comparative Genomics cons100wayViewphyloP Basewise Conservation (phyloP) Vertebrate Multiz Alignment & Conservation (100 Species) Comparative Genomics phyloP100way Cons 100 Verts 100 vertebrates Basewise Conservation by PhyloP Comparative Genomics cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. 
The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater length greater than 200 bp ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. 
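The statistics defined above can be computed directly from a sequence. The following Python sketch is a minimal illustration of the three quantities and of the three thresholds listed under Methods. It is not the cpg_lh program and it does not perform the segment search; the function names and the example sequences are hypothetical.

```python
def cpg_stats(seq):
    """Compute length, CpG count, percentage CpG, GC content and Obs/Exp CpG."""
    seq = seq.upper()
    n = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    num_cpg = seq.count("CG")
    gc_percent = 100.0 * (num_c + num_g) / n if n else 0.0
    # Percentage CpG: CpG nucleotide bases (twice the CpG count) over the length.
    cpg_percent = 100.0 * 2 * num_cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0
    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_percent": gc_percent,
        "cpg_percent": cpg_percent,
        "obs_exp": obs_exp,
    }


def passes_island_criteria(stats):
    """Apply the three criteria: GC >= 50%, length > 200 bp, Obs/Exp > 0.6."""
    return (
        stats["gc_percent"] >= 50.0
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )


# Artificial sequences chosen only to exercise the arithmetic.
rich = cpg_stats("CG" * 150)
short = cpg_stats("ACGT" * 10)
print(rich, passes_island_criteria(rich))
print(short, passes_island_criteria(short))
```

In the second example the ratio and GC content pass, but the 40 bp length fails the length criterion, so the segment would not qualify.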
The calculation of the track data is performed by the following command sequence: twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \ | cpg_lh /dev/stdin 2> cpg_lh.err \ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \ | sort -k1,1 -k2,2n > cpgIsland.bed The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command. Data access CpG islands and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. 
The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater length greater than 200 bp ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. 
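The scoring scan mentioned in the Methods (+17 for CG, -1 for any other dinucleotide) can be illustrated with a running-sum search. The sketch below is a simplification that reports only the single best-scoring segment; the actual cpg_lh program reports multiple segments, and its internal details are not described here. The function name best_cpg_segment is hypothetical.

```python
def best_cpg_segment(seq, cg_score=17, other_score=-1):
    """Return (start, end, score) of the highest-scoring run of dinucleotides.

    Coordinates are 0-based, half-open positions in seq. Simplified
    illustration only; it is not the cpg_lh algorithm itself.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_start = 0
    run_score = 0
    for i in range(len(seq) - 1):
        # Score the dinucleotide starting at position i.
        run_score += cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score <= 0:
            # The running segment no longer helps; restart after this base.
            run_score = 0
            run_start = i + 1
        elif run_score > best[2]:
            best = (run_start, i + 2, run_score)
    return best


start, end, score = best_cpg_segment("ATATCGCGCGATAT")
print(start, end, score)
```

Each segment found this way would then be tested against the GC content, length, and Obs/Exp criteria listed above.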
The calculation of the track data is performed by the following command sequence: twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \ | cpg_lh /dev/stdin 2> cpg_lh.err \ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \ | sort -k1,1 -k2,2n > cpgIsland.bed The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command. Data access CpG islands and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 crossTissueMapsTissueCellType Cross Tissue Nuclei Cross tissue nuclei RNA by tissue and cell type Single Cell RNA-seq Description This track collection shows data from Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. The dataset covers ~200,000 single nuclei from a total of 16 human donors across 25 samples, using 4 different sample preparation protocols followed by droplet based single-cell RNA-seq. The samples were obtained from frozen tissue as part of the Genotype-Tissue Expression (GTEx) project. Samples were taken from the esophagus, skeletal muscle, heart, lung, prostate, breast, and skin. 
The dataset includes 43 broad cell classes, some specific to certain tissues and some shared across all tissue types. This track collection contains three bar chart tracks of RNA expression. The first track, Cross Tissue Nuclei, allows cells to be grouped together and faceted on up to 4 categories: tissue, cell class, cell subclass, and cell type. The second track, Cross Tissue Details, allows cells to be grouped together and faceted on up to 7 categories: tissue, cell class, cell subclass, cell type, granular cell type, sex, and donor. The third track, GTEx Immune Atlas, allows cells to be grouped together and faceted on up to 5 categories: tissue, cell type, cell class, sex, and donor. Please see the GTEx portal for further interactive displays and additional data. Display Conventions and Configuration Tissue-cell type combinations in the Full and Combined tracks are colored by which cell type they belong to in the below table: Color Cell Type Endothelial Epithelial Glia Immune Neuron Stromal Other Tissue-cell type combinations in the Immune Atlas track are shaded according to the below table: Color Cell Type Inflammatory Macrophage Lung Macrophage Monocyte/Macrophage FCGR3A High Monocyte/Macrophage FCGR3A Low Macrophage HLAII High Macrophage LYVE1 High Proliferating Macrophage Dendritic Cell 1 Dendritic Cell 2 Mature Dendritic Cell Langerhans CD14+ Monocyte CD16+ Monocyte LAM-like Other Methods Using the previously collected tissue samples from the Genotype-Tissue Expression project, nuclei were isolated using four different protocols and sequenced using droplet based single cell RNA-seq. CellBender v2.1 and other standard quality control techniques were applied, resulting in 209,126 nuclei profiles across eight tissues, with a mean of 918 genes and 1519 transcripts per profile. 
Data from all samples were integrated with a conditional variational autoencoder to correct for multiple sources of variation, such as sex and protocol, while preserving tissue- and cell-type-specific effects. For detailed methods, please refer to Eraslan et al. or the GTEx portal website. UCSC Methods The gene expression files were downloaded from the GTEx portal. The UCSC command line utilities matrixClusterColumns, matrixToBarChartBed, and bedToBigBed were used to transform these into a bar chart format bigBed file that can be visualized. The UCSC utilities can be found on our download server. Data Access The raw bar chart data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. Credits Thanks to the GTEx Consortium for creating and analyzing these data. References Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Slyper M, Wang J, Van Wittenberghe N, Rouhana JM, Waldman J et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science. 2022 May 13;376(6594):eabl4290. PMID: 35549429; PMC: PMC9383269 crossTissueMaps Cross Tissue Nuclei Single Nuclei sequenced across many tissues Single Cell RNA-seq Description This track collection shows data from Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. The dataset covers ~200,000 single nuclei from a total of 16 human donors across 25 samples, using 4 different sample preparation protocols followed by droplet based single-cell RNA-seq. The samples were obtained from frozen tissue as part of the Genotype-Tissue Expression (GTEx) project. Samples were taken from the esophagus, skeletal muscle, heart, lung, prostate, breast, and skin. 
The dataset includes 43 broad cell classes, some specific to certain tissues and some shared across all tissue types. This track collection contains three bar chart tracks of RNA expression. The first track, Cross Tissue Nuclei, allows cells to be grouped together and faceted on up to 4 categories: tissue, cell class, cell subclass, and cell type. The second track, Cross Tissue Details, allows cells to be grouped together and faceted on up to 7 categories: tissue, cell class, cell subclass, cell type, granular cell type, sex, and donor. The third track, GTEx Immune Atlas, allows cells to be grouped together and faceted on up to 5 categories: tissue, cell type, cell class, sex, and donor. Please see the GTEx portal for further interactive displays and additional data. Display Conventions and Configuration Tissue-cell type combinations in the Full and Combined tracks are colored by which cell type they belong to in the below table: Color Cell Type Endothelial Epithelial Glia Immune Neuron Stromal Other Tissue-cell type combinations in the Immune Atlas track are shaded according to the below table: Color Cell Type Inflammatory Macrophage Lung Macrophage Monocyte/Macrophage FCGR3A High Monocyte/Macrophage FCGR3A Low Macrophage HLAII High Macrophage LYVE1 High Proliferating Macrophage Dendritic Cell 1 Dendritic Cell 2 Mature Dendritic Cell Langerhans CD14+ Monocyte CD16+ Monocyte LAM-like Other Methods Using the previously collected tissue samples from the Genotype-Tissue Expression project, nuclei were isolated using four different protocols and sequenced using droplet based single cell RNA-seq. CellBender v2.1 and other standard quality control techniques were applied, resulting in 209,126 nuclei profiles across eight tissues, with a mean of 918 genes and 1519 transcripts per profile. 
Data from all samples were integrated with a conditional variational autoencoder to correct for multiple sources of variation, such as sex and protocol, while preserving tissue- and cell-type-specific effects. For detailed methods, please refer to Eraslan et al. or the GTEx portal website. UCSC Methods The gene expression files were downloaded from the GTEx portal. The UCSC command line utilities matrixClusterColumns, matrixToBarChartBed, and bedToBigBed were used to transform these into a bar chart format bigBed file that can be visualized. The UCSC utilities can be found on our download server. Data Access The raw bar chart data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. Credits Thanks to the GTEx Consortium for creating and analyzing these data. References Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Slyper M, Wang J, Van Wittenberghe N, Rouhana JM, Waldman J et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science. 2022 May 13;376(6594):eabl4290. PMID: 35549429; PMC: PMC9383269 encodeCcreCombined ENCODE cCREs ENCODE Candidate Cis-Regulatory Elements (cCREs) combined from all cell types Regulation Description This track displays the ENCODE Registry of candidate cis-Regulatory Elements (cCREs) in the human genome, a total of 926,535 elements identified and classified by the ENCODE Data Analysis Center according to biochemical signatures. cCREs are the subset of representative DNase hypersensitive sites across ENCODE and Roadmap Epigenomics samples that are supported by either histone modifications (H3K4me3 and H3K27ac) or CTCF-binding data. The Registry of cCREs is one of the core components of the integrative level of the ENCODE Encyclopedia of DNA Elements. 
Additional exploration of the cCREs and underlying raw ENCODE data is provided by the SCREEN (Search Candidate cis-Regulatory Elements) web tool, designed specifically for the Registry, accessible by linkouts from the track details page. The cCREs identified in the mouse genome are available in a companion track, here. The related cCREs by Biosample composite track presents cCREs and associated epigenetic signal in all individual biosamples in a large matrix. Additional views of the data are provided by the ENCODE Integrative Megahub. Display Conventions and Configuration cCREs are colored and labeled according to classification by regulatory signature:
Color | UCSC label | ENCODE classification | ENCODE label
red | prom | promoter-like signature | PLS
orange | enhP | proximal enhancer-like signature | pELS
yellow | enhD | distal enhancer-like signature | dELS
pink | K4m3 | DNase-H3K4me3 | DNase-H3K4me3
blue | CTCF | CTCF-only | CTCF-only
The DNase-H3K4me3 elements are those with promoter-like biochemical signature that are not within 200 bp of an annotated TSS. Methods All individual DNase hypersensitive sites (DHSs) identified from 706 DNase-seq experiments in humans (a total of 93 million sites) were iteratively clustered and filtered for the highest signal across all experiments, producing representative DHSs (rDHSs), with a total of 2.2 million such sites in human. The highest signal elements from this set that were also supported by high H3K4me3, H3K27ac and/or CTCF ChIP-seq signals were designated cCREs (a total of 926,535 in human). Classification of cCREs was performed based on the following criteria: cCREs with promoter-like signatures (cCRE-PLS) fall within 200 bp of an annotated GENCODE TSS and have high DNase and H3K4me3 signals. cCREs with enhancer-like signatures (cCRE-ELS) have high DNase and H3K27ac signals and, if they are within 200 bp of an annotated TSS, must also have a low H3K4me3 max-Z score. 
The subset of cCREs-ELS within 2 kb of a TSS is denoted proximal (cCRE-pELS), while the remaining subset is denoted distal (cCRE-dELS). DNase-H3K4me3 cCREs have high H3K4me3 max-Z scores but low H3K27ac max-Z scores and do not fall within 200 bp of a TSS. CTCF-only cCREs have high DNase and CTCF and low H3K4me3 and H3K27ac. The GENCODE V24 (Ensembl 33) basic gene annotation set was used in this analysis. For further detail about the identification and classification of ENCODE cCREs see the About page of the SCREEN web tool. Data Access The ENCODE accession numbers of the constituent datasets at the ENCODE Portal are available from the cCRE details page. The data in this track can be interactively explored with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API, the track name is "encodeCcreCombined". For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called encodeCcreCombined.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/encode3/ccre/encodeCcreCombined.bb -chrom=chr21 -start=0 -end=100000000 stdout Release Notes This annotation is based on ENCODE data released on or before September 14, 2018. Data from the Common fund supported Roadmap Epigenomics Mapping Consortium (REMC) were included for building the ENCODE cCREs. Please see the 2015 paper on their analysis of reference human genomes for more information. Credits This dataset was produced by the ENCODE Data Analysis Center (ZLab at UMass Medical Center). 
Please check the ZLab ENCODE Public Hubs for the most updated data. Thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI for providing this data. Thanks also to the ENCODE Consortium, the ENCODE production laboratories, and the ENCODE Data Coordination Center for generating and processing the datasets used here. References ENCODE Project Consortium. Expanded Encyclopedias of DNA Elements in the Human and Mouse Genomes. Nature. 2020 July 30;583(7818):699-710 ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMC: PMC3439153 ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMC: PMC3079585 fixSeqLiftOverPsl Fix Patches Reference Assembly Fix Patch Sequence Alignments Mapping and Sequencing Description This track shows alignments of fix patch sequences to main chromosome sequences in the reference genome assembly. When errors are corrected in the reference genome assembly, the Genome Reference Consortium (GRC) adds fix patch sequences containing the corrected regions. This strikes a balance between providing the most complete and correct genome sequence, while maintaining stable chromosome coordinates for the original assembly sequences. Fix patches are often associated with incident reports displayed in the GRC Incidents track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. Mismatching bases are highlighted in red. Several types of alignment gap may also be colored; for more information, see Alignment Insertion/Deletion Display Options. Credits The alignments were provided by NCBI as GFF files and translated into the PSL representation for browser display by UCSC. 
knownGene GENCODE V46 GENCODE V46 Genes and Gene Predictions Description The GENCODE Genes track (version 46, May 2024) shows high-quality manual annotations merged with evidence-based automated annotations across the entire human genome generated by the GENCODE project. By default, only the basic gene set is displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts that GENCODE believes will be useful to the majority of users. The track includes protein-coding genes, non-coding RNA genes, and pseudo-genes, though pseudo-genes are not displayed by default. It contains annotations on the reference chromosomes as well as assembly patches and alternative loci (haplotypes). The v46 release was derived from the GTF file that contains annotations only on the main chromosomes. Statistics for this build and information on how they were generated can be found on the GENCODE site. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes. It includes options to display the entire GENCODE set and pseudogenes. To customize these options, the respective boxes can be checked or unchecked at the top of this description page. This track also includes a variety of labels which identify the transcripts when visibility is set to "full" or "pack". Gene symbols (e.g. NIPA1) are displayed by default, but additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID (uc001yve.4), UniProt Display ID (Q7RTP0). Additional information about gene and transcript names can be found in our FAQ. This track, in general, follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. 
Coloring for the gene annotations is mostly based on the annotation type: MANE: MANE Select Plus Clinical transcripts. For non-MANE transcripts, the following conventions apply. coding: protein coding transcripts, including polymorphic pseudogenes non-coding: non-protein coding transcripts pseudogene: pseudogene transcript annotations problem: problem transcripts (Biotypes of retained_intron, TEC, or disrupted_domain) This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. There is also an option to display the data as a density graph, which can be helpful for visualizing the distribution of items over a region. Squishy-pack Display Within a gene using the pack display mode, transcripts below a specified rank will be condensed into a view similar to squish mode. The transcript ranking approach is preliminary and will change in future releases. The transcripts rankings are defined by the following criteria for protein-coding and non-coding genes: Protein_coding genes MANE or Ensembl canonical 1st: MANE Select / Ensembl canonical 2nd: MANE Plus Clinical Coding biotypes 1st: protein_coding and protein_coding_LoF 2nd: NMDs and NSDs 3rd: retained intron and protein_coding_CDS_not_defined Completeness 1st: full length 2nd: CDS start/end not found CARS score (only for coding transcripts) Transcript genomic span and length (only for non-coding transcripts) Non-coding genes Transcript biotype 1st: transcript biotype identical to gene biotype Ensembl canonical GENCODE basic Transcript genomic span Transcript length Methods The GENCODE v46 track was built from the GENCODE downloads file gencode.v46.chr_patch_hapl_scaff.annotation.gff3.gz. Data from other sources were correlated with the GENCODE data to build association tables. Related Data The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a downloadable file. 
One can see a full list of the associated tables in the Table Browser by selecting GENCODE Genes from the track menu; this list is then available on the table menu. Data access GENCODE Genes and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. The genePred format files for hg38 are available from our downloads directory or in our GTF download directory. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. Credits The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a computational pipeline developed by Jim Kent and Brian Raney. This version of the track was generated by Jonathan Casper. References Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland JE, Mudge JM, Sisu C, Wright JC, Arnan C, Barnes I et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D942-D949. PMID: 36420896; PMC: PMC9825462 A full list of GENCODE publications is available at The GENCODE Project web site. Data Release Policy GENCODE data are available for use without restrictions. gnomadVariantsV4 gnomAD v4 Pre-Release Genome Aggregation Database (gnomAD) Genome Variants v4.0.0 Pre-Release Variation Description The gnomAD v4 track shows variants from 807,162 individuals, including 730,947 exomes and 76,215 genomes. This release includes the 76,156 genomes from the gnomAD v3.1.2 release as well as new exome data from 416,555 UK Biobank individuals. For more detailed information on gnomAD v4, see the related blog post. For now, the track is just the raw VCFs as provided by gnomAD, although a version of the track similar to v3.1.1 may be created in the future. The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. 
4,454 genomes were added to the number of genomes in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post. The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after the UCSC version of the track was released). For more information about gnomAD v3.1.1, please see the related changelog. GnomAD Genome Mutational Constraint is based on v3.1.2 and is available only on hg38. It shows the reduced variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions too. Positive values are red and reflect stronger mutation constraint (and less variation), indicating higher natural selection pressure in a region. Negative values are green and reflect lower mutation constraint (and more variation), indicating less selection pressure and less functional effect. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts expected number of mutations, and compares this number to the observed number of mutations using a Z-score (see preprint in the Reference section for details). The chrX scores were added as received from the authors, as there are no de novo mutation data available on chrX (for estimating the effects of regional genomic features on mutation rates), they are more speculative than the ones on the autosomes. The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This includes data on both the gene and transcript level. 
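The comparison of expected and observed mutation counts in a 1 kbp window, as described above for the mutational constraint track, can be sketched as follows. The exact statistic is defined in the publication cited in the References; the Poisson-style normalization used here, (expected - observed) / sqrt(expected), is an assumption made only for illustration, and the function names constraint_z and shade are hypothetical. The sign convention follows the text above: positive means fewer variants than expected (stronger constraint, drawn in red), negative means more variants than expected (drawn in green).

```python
import math


def constraint_z(observed, expected):
    """Illustrative Z-like score for one genomic window.

    Assumption: normalize the expected-minus-observed difference by
    sqrt(expected). This is not confirmed to be the authors' formula.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


def shade(z):
    """Map a score to the display color convention described in the text."""
    if z > 0:
        return "red"     # stronger constraint, less variation
    if z < 0:
        return "green"   # weaker constraint, more variation
    return "neutral"


z = constraint_z(observed=64, expected=100)
print(round(z, 2), shade(z))
```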
The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README. gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site. On hg38 only, a subtrack "Gnomad mutational constraint" aka "Genome non-coding constraint of haploinsufficient variation (Gnocchi)" captures the depletion of variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions, too. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts expected number of mutations, and compares this number to the observed number of mutations using a Z-score (see Chen et al 2024 in the Reference section for details). The chrX scores were added as received from the authors, as there are no mutations available for chrX, they are more speculative than the ones on the autosomes. For questions on the gnomAD data, also see the gnomAD FAQ. More details on the Variant type(s) can be found on the Sequence Ontology page. Display Conventions and Configuration gnomAD v4 The gnomAD v4 track follows the standard display and configuration options available for VCF tracks, briefly explained below. In dense mode, a vertical line is drawn at the position of each variant. 
In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts. gnomAD v3.1.1 The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track, except as noted below. There is a Non-cancer filter used to exclude/include variants from samples of individuals who were not ascertained for having cancer in a cancer study. There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only). Where possible, variants overlapping multiple transcripts/genes have been collapsed into one variant, with additional information available on the details page, which has roughly halved the number of items in the bigBed. The bigBed has been split into two files, one with the information necessary for the track display, and one with the information necessary for the details page. For more information on this data format, please see the Data Access section below. The VEP annotation is shown as a table instead of spread across multiple fields. Intergenic variants have not been pre-filtered. gnomAD v3.1 By default, a maximum of 50,000 variants can be displayed at a time (before applying the filters described below), before the track switches to dense display mode. Mouse hover on an item will display many details about each variant, including the affected gene(s), the variant type, and annotation (missense, synonymous, etc). Clicking on an item will display additional details on the variant, including a population frequency table showing allele count in each sub-population. 
Following the conventions on the gnomAD browser, items are shaded according to their Annotation type: pLoF, Missense, Synonymous, Other. Label Options To maintain consistency with the gnomAD website, variants are by default labeled according to their chromosomal start position followed by the reference and alternate alleles, for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional label, if the variant is present in dbSNP. Filtering Options Three filters are available for these tracks: FILTER: Used to exclude/include variants that failed Random Forest (RF), Inbreeding Coefficient (InbreedingCoeff), or Allele Count (AC0) filters. The PASS option is used to include/exclude variants that pass all of the RF, InbreedingCoeff, and AC0 filters, as denoted in the original VCF. Annotation type: Used to exclude/include variants that are annotated as predicted Loss of Function (pLoF), Missense, Synonymous, or Other, as annotated by VEP version 85 (GENCODE v19). Variant Type: Used to exclude/include variants according to the type of variation, as annotated by VEP v85. There is one additional configurable filter on the minimum minor allele frequency. gnomAD v2.1.1 The gnomAD v2.1.1 track follows the standard display and configuration options available for VCF tracks, briefly explained below. In dense mode, a vertical line is drawn at the position of each variant. In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts. 
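The default item label described under Label Options for the v3.1 track (for example "chr1-1234-T-CAG") can be built from, or split back into, its parts with a few lines of code. This is a minimal sketch with hypothetical function names. It assumes the position in the label is 1-based, so a 0-based BED start needs one added; that assumption is consistent with the example item in the Data Access section, where a BED start of 12416 carries the label chr1-12417-G-A.

```python
def gnomad_label(chrom, bed_start, ref, alt):
    """Build a gnomAD-style label from a 0-based BED start and the alleles.

    Assumption: the label position is 1-based (BED start + 1).
    """
    return f"{chrom}-{bed_start + 1}-{ref}-{alt}"


def parse_gnomad_label(label):
    """Split a label into (chrom, 1-based position, ref, alt)."""
    # Split from the right so only the last three hyphens are treated as separators.
    chrom, pos, ref, alt = label.rsplit("-", 3)
    return chrom, int(pos), ref, alt


print(gnomad_label("chr1", 12416, "G", "A"))
print(parse_gnomad_label("chr1-1234-T-CAG"))
```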
Filtering Options Four filters are available for these tracks, the same as the underlying VCF: AC0: Allele Count 0 after filtering out low confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls); InbreedingCoeff: Inbreeding Coefficient < -0.3; RF: Used to exclude/include variants that failed Random Forest filtering thresholds of 0.055272738028512555 for SNPs and 0.20641025579497013 for indels (probabilities of being a true positive variant); Pass: Variant passes all 3 filters. There are two additional filters available, one for the minimum minor allele frequency, and a configurable filter on the QUAL score. UCSC Methods The gnomAD v3.1.1 data is unfiltered. For the v3.1 update only, in order to cut down on the amount of displayed data, the following variant types have been filtered out, but are still viewable in the gnomAD browser: Regulatory Region Variants, Downstream/Upstream Gene Variants, and Transcription Factor Binding Site Variants. For the full steps used to create the track at UCSC, please see the section denoted "gnomAD v3.1 update" in the hg38 makedoc. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Variant VCFs can be found in the vcf/ subdirectory. The v3.1 and v3.1.1 variants can be found in a special directory as they have been transformed from the underlying VCF. For the v3.1.1 variants in particular, the underlying bigBed contains only the information necessary to use the track in the browser. The extra data like VEP annotations and CADD scores are available in the same directory as the bigBed but in the files gnomad.v3.1.1.details.tab.gz and gnomad.v3.1.1.details.tab.gz.gzi. 
The gnomad.v3.1.1.details.tab.gz contains the gzip compressed extra data in JSON format, and the .gzi file is available to speed searching of this data. Each variant has an associated md5sum in the name field of the bigBed which can be used along with the _dataOffset and _dataLen fields to get the associated external data, as shown below:
# find item of interest:
bigBedToBed genomes.bb stdout | head -4 | tail -1
chr1 12416 12417 854246d79dc5d02dcdbd5f5438542b6e [..omitted for brevity..] chr1-12417-G-A 67293 902
# use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
854246d79dc5d02dcdbd5f5438542b6e {"DDX11L1": {"cons": ["non_coding_transcript_variant", [..omitted for brevity..]
The data can also be found directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. The mutational constraints score was updated in October 2022 from a previous, now deprecated, pre-publication version. The old version can be found in our archive directory on the download server. It can be loaded by copying the URL into our "Custom tracks" input box. Credits Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the Creative Commons Zero Public Domain Dedication as described here. Please note that some annotations within the provided files may have restrictions on usage. See here for more information. References Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. doi: https://doi.org/10.1101/531210. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al.
Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207 Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024 Jan;625(7993):92-100. PMID: 38057664 (We added the data in 2021, then later referenced the 2022 Biorxiv preprint, in which the track was not called "Gnocchi" yet) gnomadVariants gnomAD Variants Genome Aggregation Database (gnomAD) Genome and Exome Variants Variation Description The gnomAD v4 track shows variants from 807,162 individuals, including 730,947 exomes and 76,215 genomes. This release includes the 76,156 genomes from the gnomAD v3.1.2 release as well as new exome data from 416,555 UK Biobank individuals. For more detailed information on gnomAD v4, see the related blog post. For now, the track is just the raw VCFs as provided by gnomAD, although a version of the track similar to v3.1.1 may be created in the future. The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. 4,454 genomes were added to the number of genomes in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post. The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after the UCSC version of the track was released). For more information about gnomAD v3.1.1, please see the related changelog. GnomAD Genome Mutational Constraint is based on v3.1.2 and is available only on hg38. It shows the reduced variation caused by purifying natural selection. 
This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions too. Positive values are red and reflect stronger mutation constraint (and less variation), indicating higher natural selection pressure in a region. Negative values are green and reflect lower mutation constraint (and more variation), indicating less selection pressure and less functional effect. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations, and this number is compared to the observed number of mutations using a Z-score (see Chen et al. 2024 in the References section for details). The chrX scores were added as received from the authors. Because there are no de novo mutation data available on chrX (for estimating the effects of regional genomic features on mutation rates), they are more speculative than the ones on the autosomes. The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This includes data on both the gene and transcript level. The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README. gnomAD v2 data are based on the GRCh37/hg19 assembly.
These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site. On hg38 only, a subtrack "Gnomad mutational constraint" aka "Genome non-coding constraint of haploinsufficient variation (Gnocchi)" captures the depletion of variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions, too. Briefly, for any 1kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations, and this number is compared to the observed number of mutations using a Z-score (see Chen et al. 2024 in the References section for details). The chrX scores were added as received from the authors. Because no de novo mutation data are available for chrX, they are more speculative than the ones on the autosomes. For questions on the gnomAD data, also see the gnomAD FAQ. More details on the Variant type(s) can be found on the Sequence Ontology page. Display Conventions and Configuration gnomAD v4 The gnomAD v4 track follows the standard display and configuration options available for VCF tracks, briefly explained below. In dense mode, a vertical line is drawn at the position of each variant. In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts. gnomAD v3.1.1 The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track, except as noted below. There is a Non-cancer filter used to exclude/include variants from samples of individuals who were not ascertained for having cancer in a cancer study. There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only).
Where possible, variants overlapping multiple transcripts/genes have been collapsed into one variant, with additional information available on the details page, which has roughly halved the number of items in the bigBed. The bigBed has been split into two files, one with the information necessary for the track display, and one with the information necessary for the details page. For more information on this data format, please see the Data Access section below. The VEP annotation is shown as a table instead of spread across multiple fields. Intergenic variants have not been pre-filtered. gnomAD v3.1 By default, a maximum of 50,000 variants can be displayed at a time (before applying the filters described below), before the track switches to dense display mode. Mouse hover on an item will display many details about each variant, including the affected gene(s), the variant type, and annotation (missense, synonymous, etc). Clicking on an item will display additional details on the variant, including a population frequency table showing allele count in each sub-population. Following the conventions on the gnomAD browser, items are shaded according to their Annotation type: pLoF, Missense, Synonymous, or Other. Label Options To maintain consistency with the gnomAD website, variants are by default labeled according to their chromosomal start position followed by the reference and alternate alleles, for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional label, if the variant is present in dbSNP. Filtering Options Three filters are available for these tracks: FILTER: Used to exclude/include variants that failed Random Forest (RF), Inbreeding Coefficient (InbreedingCoeff), or Allele Count (AC0) filters. The PASS option is used to include/exclude variants that pass all of the RF, InbreedingCoeff, and AC0 filters, as denoted in the original VCF.
Annotation type: Used to exclude/include variants that are annotated as predicted Loss of Function (pLoF), Missense, Synonymous, or Other, as annotated by VEP version 85 (GENCODE v19). Variant Type: Used to exclude/include variants according to the type of variation, as annotated by VEP v85. There is one additional configurable filter on the minimum minor allele frequency. gnomAD v2.1.1 The gnomAD v2.1.1 track follows the standard display and configuration options available for VCF tracks, briefly explained below. In dense mode, a vertical line is drawn at the position of each variant. In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts. Filtering Options Four filters are available for these tracks, the same as the underlying VCF: AC0: Allele Count 0 after filtering out low confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls) InbreedingCoeff: Inbreeding Coefficient < -0.3 RF: Used to exclude/include variants that failed Random Forest filtering thresholds of 0.055272738028512555 for SNPs and 0.20641025579497013 for indels (probabilities of being a true positive variant) Pass: Variant passes all 3 filters There are two additional filters available, one for the minimum minor allele frequency, and a configurable filter on the QUAL score. UCSC Methods The gnomAD v3.1.1 data is unfiltered. For the v3.1 update only, in order to cut down on the amount of displayed data, the following variant types have been filtered out, but are still viewable in the gnomAD browser: Regulatory Region Variants, Downstream/Upstream Gene Variants, and Transcription Factor Binding Site Variants. For the full steps used to create the track at UCSC, please see the section denoted "gnomAD v3.1 update" in the hg38 makedoc. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator.
For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Variant VCFs can be found in the vcf/ subdirectory. The v3.1 and v3.1.1 variants can be found in a special directory as they have been transformed from the underlying VCF. For the v3.1.1 variants in particular, the underlying bigBed contains only the information necessary to use the track in the browser. The extra data like VEP annotations and CADD scores are available in the same directory as the bigBed but in the files gnomad.v3.1.1.details.tab.gz and gnomad.v3.1.1.details.tab.gz.gzi. The gnomad.v3.1.1.details.tab.gz contains the gzip compressed extra data in JSON format, and the .gzi file is available to speed searching of this data. Each variant has an associated md5sum in the name field of the bigBed which can be used along with the _dataOffset and _dataLen fields to get the associated external data, as shown below:
# find item of interest:
bigBedToBed genomes.bb stdout | head -4 | tail -1
chr1 12416 12417 854246d79dc5d02dcdbd5f5438542b6e [..omitted for brevity..] chr1-12417-G-A 67293 902
# use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
854246d79dc5d02dcdbd5f5438542b6e {"DDX11L1": {"cons": ["non_coding_transcript_variant", [..omitted for brevity..]
The data can also be found directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. The mutational constraints score was updated in October 2022 from a previous, now deprecated, pre-publication version. The old version can be found in our archive directory on the download server. It can be loaded by copying the URL into our "Custom tracks" input box.
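As a rough illustration of the _dataOffset/_dataLen lookup described in the Data Access section, the following Python sketch derives the two numbers passed to bgzip from a bigBed row and parses one record of the details file. It is only a sketch under stated assumptions: the function names are hypothetical, and the record layout (md5sum key, a tab, then a JSON object on a single line) is inferred from the file name and the sample output rather than from a format specification.

```python
import json


def bgzip_range(bed_fields):
    """Return (offset, size) suitable for `bgzip -b OFFSET -s SIZE`.

    Assumes _dataOffset and _dataLen are the final two fields of the row.
    """
    data_offset = int(bed_fields[-2])
    data_len = int(bed_fields[-1])
    # Add one to _dataLen so the trailing newline is included.
    return data_offset, data_len + 1


def parse_details_record(record):
    """Split one details record into (md5sum, parsed JSON object).

    Assumption: the key and the JSON payload are separated by a tab.
    """
    key, payload = record.rstrip("\n").split("\t", 1)
    return key, json.loads(payload)


# Row taken from the example above, with the omitted fields left out.
row = ["chr1", "12416", "12417", "854246d79dc5d02dcdbd5f5438542b6e",
       "chr1-12417-G-A", "67293", "902"]
offset, size = bgzip_range(row)
print(f"bgzip -b {offset} -s {size} gnomad.v3.1.1.details.tab.gz")
```

The md5sum returned by parse_details_record can be compared with the name field of the bigBed row to confirm that the right record was retrieved.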
Credits Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the Creative Commons Zero Public Domain Dedication as described here. Please note that some annotations within the provided files may have restrictions on usage. See here for more information. References Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. doi: https://doi.org/10.1101/531210. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207 Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024 Jan;625(7993):92-100. PMID: 38057664 (We added the data in 2021, then later referenced the 2022 Biorxiv preprint, in which the track was not called "Gnocchi" yet) gnomadGenomesV4 gnomAD4 Genome Vars Genome Aggregation Database (gnomAD) Genomes Variants v4.0.0 Pre-Release Variation gnomadExomesV4 gnomAD4 Exome Vars Genome Aggregation Database (gnomAD) Exomes Variants v4.0.0 Pre-Release Variation gtexGeneV8 GTEx Gene V8 Gene Expression in 54 tissues from GTEx RNA-seq of 17382 samples, 948 donors (V8, Aug 2019) Expression Description The NIH Genotype-Tissue Expression (GTEx) project was created to establish a sample and data resource for studies on the relationship between genetic variation and gene expression in multiple human tissues. This track shows median gene expression levels in 52 tissues and 2 cell lines, based on RNA-seq data from the GTEx final data release (V8, August 2019). 
This release is based on data from 17,382 tissue samples obtained from 948 adult post-mortem individuals. Display Conventions In Full and Pack display modes, expression for each gene is represented by a colored bargraph, where the height of each bar represents the median expression level across all samples for a tissue, and the bar color indicates the tissue. Tissue colors were assigned to conform to the GTEx Consortium publication conventions. The bargraph display has the same width and tissue order for all genes. Mouse hover over a bar will show the tissue and median expression level. The Squish display mode draws a rectangle for each gene, colored to indicate the tissue with highest expression level if it contributes more than 10% to the overall expression (and colored black if no tissue predominates). In Dense mode, the darkness of the grayscale rectangle displayed for the gene reflects the total median expression level across all tissues. The GTEx transcript model used to quantify expression level is displayed below the graph, colored to indicate the transcript class (coding, noncoding, pseudogene, problem), following GENCODE conventions. Click-through on a graph displays a boxplot of expression level quartiles with outliers, per tissue, along with a link to the corresponding gene page on the GTEx Portal. The track configuration page provides controls to limit the genes and tissues displayed, and to select raw or log transformed expression level display. Methods Tissue samples were obtained using the GTEx standard operating procedures for informed consent and tissue collection, in conjunction with the National Cancer Institute Biorepositories and Biospecimen. All tissue specimens were reviewed by pathologists to characterize and verify organ source. Images from stained tissue samples can be viewed via the NCI histopathology viewer.
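The Squish-mode coloring rule given under Display Conventions above can be sketched in Python as follows. This is an illustration only, not the browser's code: the function name is hypothetical, the input is assumed to be a mapping from tissue name to median TPM, and the function returns the name of the tissue whose color would be used, or "black" when no tissue predominates.

```python
def squish_color(median_tpm_by_tissue, min_share=0.10):
    """Pick the tissue that colors a gene in Squish mode.

    Returns the top tissue if it contributes more than `min_share`
    of the summed median expression, otherwise "black".
    """
    total = sum(median_tpm_by_tissue.values())
    if total <= 0:
        # No expression at all: no tissue can predominate.
        return "black"
    tissue, top_value = max(median_tpm_by_tissue.items(), key=lambda kv: kv[1])
    return tissue if top_value / total > min_share else "black"


print(squish_color({"Liver": 90.0, "Lung": 5.0, "Heart": 5.0}))
```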
The Qiagen PAXgene non-formalin tissue preservation product was used to stabilize tissue specimens without cross-linking biomolecules. RNA-seq was performed by the GTEx Laboratory, Data Analysis and Coordinating Center (LDACC) at the Broad Institute. The Illumina TruSeq protocol was used to create an unstranded polyA+ library sequenced on the Illumina HiSeq 2000 and HiSeq 2500 platforms to produce 76-bp paired end reads with a coverage goal of 50M (median achieved was ~82M total reads). Sequence reads were aligned to the hg38/GRCh38 human genome using STAR v2.5.3a assisted by the GENCODE 26 transcriptome definition. The alignment pipeline is available here. Gene annotations were produced using a custom isoform collapsing procedure that excluded retained intron and read through transcripts, merged overlapping exon intervals and then excluded exon intervals overlapping between genes. Gene expression levels in TPM were called via the RNA-SeQC tool (v1.1.9), after filtering for unique mapping, proper pairing, and exon overlap. For further method details, see the GTEx Portal Documentation page. UCSC obtained the gene-level expression files, gene annotations and sample metadata from the GTEx Portal Download page. Median expression level in TPM was computed per gene/per tissue. Subject and Sample Characteristics The scientific goal of the GTEx project required that the donors and their biospecimen present with no evidence of disease. The tissue types collected were chosen based on their clinical significance, logistical feasibility and their relevance to the scientific goal of the project and the research community. Summary plots of GTEx sample characteristics are available at the GTEx Portal Tissue Summary page. Data Access The raw data for the GTEx Gene expression track can be accessed interactively through the Table Browser or Data Integrator. Metadata can be found in the connected tables below. 
gtexGeneModelV8 describes the gene names and coordinates in genePred format. hgFixed.gtexTissueV8 lists each of the 54 tissues in alphabetical order, corresponding to the comma separated expression values in gtexGeneV8. hgFixed.gtexSampleDataV8 has TPM expression scores for each individual gene-sample data point, connected to gtexSampleV8. hgFixed.gtexSampleV8 contains metadata about sample time, collection site, and tissue, connected to the donor field in the gtexDonorV8 table. hgFixed.gtexDonorV8 has anonymized information on the tissue donor. For automated analysis and downloads, the track data files can be downloaded from our downloads server or the JSON API. Individual regions or the whole genome annotation can be accessed as text using our utility bigBedToBed. Instructions for downloading the utility can be found here. That utility can also be used to obtain features within a given range, e.g.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/gtex/gtexGeneV8.bb -chrom=chr21 -start=0 -end=100000000 stdout
Data can also be obtained directly from GTEx at the following link: https://gtexportal.org/home/datasets Credits Statistical analysis and data interpretation was performed by The GTEx Consortium Analysis Working Group. Data was provided by the GTEx LDACC at The Broad Institute of MIT and Harvard. References GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020 Sep 11;369(6509):1318-1330. PMID: 32913098; PMC: PMC7737656 GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013 Jun;45(6):580-5. PMID: 23715323; PMC: PMC4010069 Carithers LJ, Ardlie K, Barcus M, Branton PA, Britton A, Buia SA, Compton CC, DeLuca DS, Peter-Demchok J, Gelfand ET et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv Biobank. 2015 Oct;13(5):311-9.
PMID: 26484571; PMC: PMC4675181 Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ et al. Human genomics. The human transcriptome across tissues and individuals. Science. 2015 May 8;348(6235):660-5. PMID: 25954002; PMC: PMC4547472 DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, Reich M, Winckler W, Getz G. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics. 2012 Jun 1;28(11):1530-2. PMID: 22539670; PMC: PMC3356847 jarvis JARVIS JARVIS: score to prioritize non-coding regions for disease relevance Phenotype and Literature Description The "Constraint scores" container track includes several subtracks showing the results of constraint prediction algorithms. These try to find regions of negative selection, where variations likely have functional impact. The algorithms do not use multi-species alignments to derive evolutionary constraint, but use primarily human variation, usually from variants collected by gnomAD (see the gnomAD V2 or V3 tracks on hg19 and hg38) or TOPMED (contained in our dbSNP tracks and available as a filter). One of the subtracks is based on UK Biobank variants, which are not available publicly, so we have no track with the raw data. The number of human genomes that are used as the input for these scores are 76k, 53k and 110k for gnomAD, TOPMED and UK Biobank, respectively. Note that another important constraint score, gnomAD constraint, is not part of this container track but can be found in the hg38 gnomAD track. 
The algorithms included in this track are: JARVIS - "Junk" Annotation genome-wide Residual Variation Intolerance Score: JARVIS scores were created by first scanning the entire genome with a sliding-window approach (using a 1-nucleotide step), recording the number of all TOPMED variants and common variants, irrespective of their predicted effect, within each window, to eventually calculate a single-nucleotide resolution genome-wide residual variation intolerance score (gwRVIS). That score, gwRVIS, was then combined with primary genomic sequence context and additional genomic annotations in a multi-module deep learning framework to infer pathogenicity of noncoding regions while remaining naive to existing phylogenetic conservation metrics. The higher the score, the more deleterious the prediction. This score covers the entire genome, except the gaps. HMC - Homologous Missense Constraint: Homologous Missense Constraint (HMC) is an amino-acid-level measure of genetic intolerance of missense variants within human populations. For all assessable amino-acid positions in Pfam domains, the number of missense substitutions directly observed in gnomAD (Observed) was counted and compared to the expected value under a neutral evolution model (Expected). The upper limit of a 95% confidence interval for the Observed/Expected ratio is defined as the HMC score. Missense variants disrupting the amino-acid positions with HMC<0.8 are predicted to be likely deleterious. This score only covers Pfam domains within coding regions. MetaDome - Tolerance Landscape Score (hg19 only): MetaDome Tolerance Landscape scores are computed as a missense over synonymous variant count ratio, which is calculated in a sliding window (with a size of 21 codons/residues) to provide a per-position indication of regional tolerance to missense variation. The variant database was gnomAD and the score was corrected for codon composition. Scores <0.7 are considered intolerant.
This score covers only coding regions. MTR - Missense Tolerance Ratio (hg19 only): Missense Tolerance Ratio (MTR) scores aim to quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. It is estimated across sliding windows of 31 codons (default) and uses observed standing variation data from the WES component of gnomAD / the Exome Aggregation Consortium Database (ExAC), version 2.0. Scores were computed using Ensembl v95 release. The number of gnomAD 2 exomes used here is higher than the number of gnomAD 3 samples (125k exomes versus 76k full genomes), but this score only covers coding regions. UK Biobank depletion rank score (hg38 only): Halldorsson et al. tabulated the number of UK Biobank variants in each 500bp window of the genome and compared this number to an expected number given the heptamer nucleotide composition of the window and the fraction of heptamers with a sequence variant across the genome and their mutational classes. A variant depletion score was computed for every overlapping set of 500-bp windows in the genome with a 50-bp step size. They then assigned a rank (depletion rank (DR)) from 0 (most depletion) to 100 (least depletion) for each 500-bp window. Since the windows are overlapping, we plot the value only in the central 50bp of the 500bp window, following advice from the author of the score, Hakon Jonsson, deCODE Genetics. He suggested that the value of the central window, rather than the worst possible score of all overlapping windows, is the most informative for a position. This score covers almost the entire genome; only a few regions, where the genome sequence had too many gap characters, were excluded. Display Conventions and Configuration JARVIS JARVIS scores are shown as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The scores were downloaded and converted to a single bigWig file.
A horizontal line is shown at the 0.733 value which signifies the 90th percentile. See hg19 makeDoc and hg38 makeDoc. Interpretation: The authors offer a suggested guideline of > 0.9998 for identifying higher confidence calls and minimizing false positives. In addition to that strict threshold, the following two more relaxed cutoffs can be used to explore additional hits. Note that these thresholds are offered as guidelines and are not necessarily representative of pathogenicity.
Percentile    JARVIS score threshold
99th          0.9998
95th          0.9826
90th          0.7338
HMC HMC scores are displayed as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The highly-constrained cutoff of 0.8 is indicated with a line. Interpretation: A protein residue with HMC score <1 indicates that missense variants affecting the homologous residues are significantly under negative selection (P-value < 0.05) and likely to be deleterious. A more stringent score threshold of HMC<0.8 is recommended to prioritize predicted disease-associated variants. MetaDome MetaDome data can be found on two tracks, MetaDome and MetaDome All Data. The MetaDome track should be used by default for data exploration. In this track the raw data containing the MetaDome tolerance scores were converted into a signal ("wiggle") track. Since this data was computed on the proteome, there was a small amount of coordinate overlap, roughly 0.42%. In these regions the lowest possible score was chosen for display in the track to maintain sensitivity. For this reason, if a protein variant is being evaluated, the MetaDome All Data track can be used to validate the score. More information on this data can be found in the MetaDome FAQ. Interpretation: The authors suggest the following guidelines for evaluating intolerance. By default, the MetaDome track displays a horizontal line at 0.7 which signifies the first intolerant bin.
For more information see the MetaDome publication.
Classification         MetaDome Tolerance Score
Highly intolerant      ≤ 0.175
Intolerant             ≤ 0.525
Slightly intolerant    ≤ 0.7
MTR MTR data can be found on two tracks, MTR All data and MTR Scores. In the MTR Scores track the data has been converted into 4 separate signal tracks representing each base pair mutation, with the lowest possible score shown when multiple transcripts overlap at a position. Overlaps can happen since this score is derived from transcripts and multiple transcripts can overlap. A horizontal line is drawn on the 0.8 score line to roughly represent the 25th percentile, meaning the items below may be of particular interest. It is recommended that the data be explored using this version of the track, as it condenses the information substantially while retaining the magnitude of the data. Any specific point mutations of interest can then be researched in the MTR All data track. This track contains all of the information from MTR V2, including more than 3 possible scores per base when transcripts overlap. A mouse-over on this track shows the ref and alt allele, as well as the MTR score and the MTR score percentile. Filters are available for MTR score, False Discovery Rate (FDR), MTR percentile, and variant consequence. By default, only items in the bottom 25 percentile are shown. Items in the track are colored according to their MTR percentile: Green items: MTR percentiles over 75; Black items: MTR percentiles between 25 and 75; Red items: MTR percentiles below 25; Blue items: No MTR score. Interpretation: Regions with low MTR scores were seen to be enriched with pathogenic variants. For example, ClinVar pathogenic variants were seen to have an average score of 0.77 whereas ClinVar benign variants had an average score of 0.92. Further validation using the FATHMM cancer-associated training dataset saw that scores less than 0.5 contained 8.6% of the pathogenic variants while only containing 0.9% of neutral variants.
In summary, lower scores are more likely to represent pathogenic variants whereas higher scores could be pathogenic, but have a higher chance to be a false positive. For more information see the MTR-Viewer publication. Methods JARVIS Scores were downloaded and converted to a single bigWig file. See the hg19 makeDoc and the hg38 makeDoc for more info. HMC Scores were downloaded and converted to .bedGraph files with a custom Python script. The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log. MetaDome The authors provided a bed file containing codon coordinates along with the scores. This file was parsed with a Python script to create the two tracks. For the first track the scores were aggregated for each coordinate, then the lowest score was chosen for any overlaps and the result was written out in bedGraph format. The file was then converted to bigWig with the bedGraphToBigWig utility. For the second track the file was reorganized into a bed 4+3 and converted to bigBed with the bedToBigBed utility. See the hg19 makeDoc for details including the build script. The raw MetaDome data can also be accessed via their Zenodo handle. MTR The V2 file was downloaded, columns were reshuffled, and itemRgb was added for the MTR All data track. For the MTR Scores track the file was parsed with a Python script to pull out the highest possible MTR score for each of the 3 possible mutations at each base pair and 4 tracks built out of these values representing each mutation. See the hg19 makeDoc entry on MTR for more info. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed files that can be downloaded from our download server.
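The overlap handling described in the MetaDome methods above (aggregate scores per coordinate, keep the lowest score where entries overlap, write bedGraph) can be sketched in Python as follows. This is not the script used to build the track; the function name and the input layout (0-based, half-open intervals with one score each) are assumptions made for illustration.

```python
def min_score_bedgraph(intervals):
    """Collapse scored intervals to per-base minima and emit bedGraph rows.

    `intervals` is an iterable of (chrom, start, end, score) tuples using
    0-based, half-open coordinates. Where intervals overlap, the lowest
    score is kept. Adjacent bases with equal scores are merged into one row.
    """
    per_base = {}
    for chrom, start, end, score in intervals:
        for pos in range(start, end):
            key = (chrom, pos)
            if key not in per_base or score < per_base[key]:
                per_base[key] = score

    rows = []
    for (chrom, pos), score in sorted(per_base.items()):
        last = rows[-1] if rows else None
        if last and last[0] == chrom and last[2] == pos and last[3] == score:
            # Extend the previous row by one base.
            rows[-1] = (chrom, last[1], pos + 1, score)
        else:
            rows.append((chrom, pos, pos + 1, score))
    return rows


for row in min_score_bedgraph([("chr1", 0, 3, 0.9), ("chr1", 2, 5, 0.4)]):
    print("\t".join(str(field) for field in row))
```

Expanding every interval to single bases keeps the sketch short; for genome-scale input a sweep over sorted interval boundaries would use far less memory.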
Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig or bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tools can also be used to obtain features confined to a given range, e.g.,
bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg38/hmc/hmc.bw stdout
Please refer to our Data Access FAQ for more information. Credits Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slave Petrovski for helping clean up the hg38 JARVIS files and for providing guidance on interpretation. Additional thanks to Laurens van de Wiel for providing the MetaDome data as well as guidance on the track development and interpretation. References Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646 Xiaolei Zhang, Pantazis I. Theotokis, Nicholas Li, the SHaRe Investigators, Caroline F. Wright, Kaitlin E. Samocha, Nicola Whiffin, James S. Ware Genetic constraint at single amino acid resolution improves missense variant prioritisation and gene discovery. medRxiv 2022.02.16.22271023 Wiel L, Baakman C, Gilissen D, Veltman JA, Vriend G, Gilissen C. MetaDome: Pathogenicity analysis of genetic variants through aggregation of homologous human protein domains. Hum Mutat. 2019 Aug;40(8):1030-1038. PMID: 31116477; PMC: PMC6772141 Silk M, Petrovski S, Ascher DB. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019 Jul 2;47(W1):W121-W126.
PMID: 31170280; PMC: PMC6602522 Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO et al. The sequences of 150,119 genomes in the UK Biobank. Nature. 2022 Jul;607(7920):732-740. PMID: 35859178; PMC: PMC9329122 constraintSuper Constraint scores Human constraint scores Phenotype and Literature Description The "Constraint scores" container track includes several subtracks showing the results of constraint prediction algorithms. These try to find regions of negative selection, where variations likely have functional impact. The algorithms do not use multi-species alignments to derive evolutionary constraint, but use primarily human variation, usually from variants collected by gnomAD (see the gnomAD V2 or V3 tracks on hg19 and hg38) or TOPMED (contained in our dbSNP tracks and available as a filter). One of the subtracks is based on UK Biobank variants, which are not available publicly, so we have no track with the raw data. The number of human genomes that are used as the input for these scores are 76k, 53k and 110k for gnomAD, TOPMED and UK Biobank, respectively. Note that another important constraint score, gnomAD constraint, is not part of this container track but can be found in the hg38 gnomAD track. The algorithms included in this track are: JARVIS - "Junk" Annotation genome-wide Residual Variation Intolerance Score: JARVIS scores were created by first scanning the entire genome with a sliding-window approach (using a 1-nucleotide step), recording the number of all TOPMED variants and common variants, irrespective of their predicted effect, within each window, to eventually calculate a single-nucleotide resolution genome-wide residual variation intolerance score (gwRVIS). 
That score, gwRVIS, was then combined with primary genomic sequence context and additional genomic annotations in a multi-module deep learning framework to infer the pathogenicity of noncoding regions while remaining naive to existing phylogenetic conservation metrics. The higher the score, the more deleterious the prediction. This score covers the entire genome, except the gaps. HMC - Homologous Missense Constraint: Homologous Missense Constraint (HMC) is an amino-acid-level measure of genetic intolerance of missense variants within human populations. For all assessable amino-acid positions in Pfam domains, the number of missense substitutions directly observed in gnomAD (Observed) was counted and compared to the expected value under a neutral evolution model (Expected). The upper limit of a 95% confidence interval for the Observed/Expected ratio is defined as the HMC score. Missense variants disrupting the amino-acid positions with HMC<0.8 are predicted to be likely deleterious. This score only covers Pfam domains within coding regions. MetaDome - Tolerance Landscape Score (hg19 only): MetaDome Tolerance Landscape scores are computed as a missense over synonymous variant count ratio, which is calculated in a sliding window (with a size of 21 codons/residues) to provide a per-position indication of regional tolerance to missense variation. The variant database was gnomAD and the score was corrected for codon composition. Scores <0.7 are considered intolerant. This score covers only coding regions. MTR - Missense Tolerance Ratio (hg19 only): Missense Tolerance Ratio (MTR) scores aim to quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. It is estimated across sliding windows of 31 codons (default) and uses observed standing variation data from the WES component of gnomAD / the Exome Aggregation Consortium Database (ExAC), version 2.0. Scores were computed using Ensembl v95 release.
The number of gnomAD 2 exomes used here is higher than the number of gnomAD 3 samples (125k exomes versus 76k full genomes), but this score only covers coding regions. UK Biobank depletion rank score (hg38 only): Halldorsson et al. tabulated the number of UK Biobank variants in each 500-bp window of the genome and compared this number to an expected number given the heptamer nucleotide composition of the window and the fraction of heptamers with a sequence variant across the genome and their mutational classes. A variant depletion score was computed for every overlapping set of 500-bp windows in the genome with a 50-bp step size. They then assigned a rank (depletion rank (DR)) from 0 (most depletion) to 100 (least depletion) for each 500-bp window. Since the windows are overlapping, we plot the value only in the central 50 bp of the 500-bp window, following advice from the author of the score, Hakon Jonsson, deCODE Genetics. He suggested that the value of the central window, rather than the worst possible score of all overlapping windows, is the most informative for a position. This score covers almost the entire genome; only a few regions, where the genome sequence had too many gap characters, were excluded. Display Conventions and Configuration JARVIS JARVIS scores are shown as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The scores were downloaded and converted to a single bigWig file. A horizontal line is shown at the 0.733 value which signifies the 90th percentile. See hg19 makeDoc and hg38 makeDoc. Interpretation: The authors offer a suggested guideline of > 0.9998 for identifying higher confidence calls and minimizing false positives. In addition to that strict threshold, the following two more relaxed cutoffs can be used to explore additional hits.
Note that these thresholds are offered as guidelines and are not necessarily representative of pathogenicity. Percentile and JARVIS score threshold: 99th, 0.9998; 95th, 0.9826; 90th, 0.7338. HMC HMC scores are displayed as a signal ("wiggle") track, with one score per genome position. Mousing over the bars displays the exact values. The highly-constrained cutoff of 0.8 is indicated with a line. Interpretation: A protein residue with HMC score <1 indicates that missense variants affecting the homologous residues are significantly under negative selection (P-value < 0.05) and likely to be deleterious. A more stringent score threshold of HMC<0.8 is recommended to prioritize predicted disease-associated variants. MetaDome MetaDome data can be found on two tracks, MetaDome and MetaDome All Data. The MetaDome track should be used by default for data exploration. In this track the raw data containing the MetaDome tolerance scores were converted into a signal ("wiggle") track. Since this data was computed on the proteome, there was a small amount of coordinate overlap, roughly 0.42%. In these regions the lowest possible score was chosen for display in the track to maintain sensitivity. For this reason, if a protein variant is being evaluated, the MetaDome All Data track can be used to validate the score. More information on this data can be found in the MetaDome FAQ. Interpretation: The authors suggest the following guidelines for evaluating intolerance. By default, the MetaDome track displays a horizontal line at 0.7 which signifies the first intolerant bin. For more information see the MetaDome publication. Classification and MetaDome Tolerance Score: Highly intolerant, ≤ 0.175; Intolerant, ≤ 0.525; Slightly intolerant, ≤ 0.7. MTR MTR data can be found on two tracks, MTR All data and MTR Scores. In the MTR Scores track the data has been converted into 4 separate signal tracks representing each base pair mutation, with the lowest possible score shown when multiple transcripts overlap at a position.
Overlaps can happen since this score is derived from transcripts and multiple transcripts can overlap. A horizontal line is drawn on the 0.8 score line to roughly represent the 25th percentile, meaning the items below may be of particular interest. It is recommended that the data be explored using this version of the track, as it condenses the information substantially while retaining the magnitude of the data. Any specific point mutations of interest can then be researched in the MTR All data track. This track contains all of the information from MTRV2 including more than 3 possible scores per base when transcripts overlap. A mouse-over on this track shows the ref and alt allele, as well as the MTR score and the MTR score percentile. Filters are available for MTR score, False Discovery Rate (FDR), MTR percentile, and variant consequence. By default, only items in the bottom 25th percentile are shown. Items in the track are colored according to their MTR percentile: Green items, MTR percentiles over 75; Black items, MTR percentiles between 25 and 75; Red items, MTR percentiles below 25; Blue items, no MTR score. Interpretation: Regions with low MTR scores were seen to be enriched with pathogenic variants. For example, ClinVar pathogenic variants were seen to have an average score of 0.77 whereas ClinVar benign variants had an average score of 0.92. Further validation using the FATHMM cancer-associated training dataset saw that scores less than 0.5 contained 8.6% of the pathogenic variants while only containing 0.9% of neutral variants. In summary, variants with lower scores are more likely to be pathogenic; variants with higher scores may still be pathogenic, but are more likely to be false positives. For more information see the MTR-Viewer publication. Methods JARVIS Scores were downloaded and converted to a single bigWig file. See the hg19 makeDoc and the hg38 makeDoc for more info. HMC Scores were downloaded and converted to .bedGraph files with a custom Python script.
The bedGraph files were then converted to bigWig files, as documented in our makeDoc hg19 build log. MetaDome The authors provided a BED file containing codon coordinates along with the scores. This file was parsed with a Python script to create the two tracks. For the first track the scores were aggregated for each coordinate, then the lowest score was chosen for any overlaps and the result was written out in bedGraph format. The file was then converted to bigWig with the bedGraphToBigWig utility. For the second track the file was reorganized into a bed 4+3 and converted to bigBed with the bedToBigBed utility. See the hg19 makeDoc for details including the build script. The raw MetaDome data can also be accessed via their Zenodo handle. MTR The V2 file was downloaded; its columns were reordered and an itemRgb field was added for the MTR All data track. For the MTR Scores track the file was parsed with a Python script to pull out the highest possible MTR score for each of the 3 possible mutations at each base pair, and 4 tracks representing each mutation were built out of these values. See the hg19 makeDoc entry on MTR for more info. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig or bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
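For the API route mentioned in this Data Access section, a region query can be composed as a plain URL. The following is a minimal sketch rather than UCSC's own client code. It assumes the public REST endpoint https://api.genome.ucsc.edu/getData/track with the genome, track, chrom, start and end parameters, and it uses hmc as a hypothetical track name; the function name build_track_query is likewise only illustrative. Check the API documentation for the actual track names before relying on it.

```python
from urllib.parse import quote

# Assumed endpoint of the UCSC REST API for per-track data queries.
API_ENDPOINT = "https://api.genome.ucsc.edu/getData/track"


def build_track_query(genome, track, chrom, start, end):
    """Compose a URL requesting the items of `track` within chrom:start-end.

    The coordinates are passed through unchanged; this helper only checks
    that they describe a non-empty range.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = [
        ("genome", genome),
        ("track", track),   # "hmc" below is a hypothetical track name
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # Parameters are joined with semicolons and percent-encoded if needed.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Same region as the command-line example in this section.
    print(build_track_query("hg38", "hmc", "chr1", 100000, 100500))
```

The resulting URL can be handed to any HTTP client. No request is sent by the sketch itself, so it can be adapted and tested offline.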
The tools can also be used to obtain features confined to a given range, e.g.,
bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg38/hmc/hmc.bw stdout
Please refer to our Data Access FAQ for more information. Credits Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, HMC tracks. Thanks to Xiaolei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slave Petrovski for helping clean up the hg38 JARVIS files and for providing guidance on interpretation. Additional thanks to Laurens van de Wiel for providing the MetaDome data as well as guidance on the track development and interpretation. References Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat Commun. 2021 Mar 8;12(1):1504. PMID: 33686085; PMC: PMC7940646 Xiaolei Zhang, Pantazis I. Theotokis, Nicholas Li, the SHaRe Investigators, Caroline F. Wright, Kaitlin E. Samocha, Nicola Whiffin, James S. Ware Genetic constraint at single amino acid resolution improves missense variant prioritisation and gene discovery. medRxiv 2022.02.16.22271023 Wiel L, Baakman C, Gilissen D, Veltman JA, Vriend G, Gilissen C. MetaDome: Pathogenicity analysis of genetic variants through aggregation of homologous human protein domains. Hum Mutat. 2019 Aug;40(8):1030-1038. PMID: 31116477; PMC: PMC6772141 Silk M, Petrovski S, Ascher DB. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res. 2019 Jul 2;47(W1):W121-W126. PMID: 31170280; PMC: PMC6602522 Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO et al. The sequences of 150,119 genomes in the UK Biobank. Nature. 2022 Jul;607(7920):732-740.
PMID: 35859178; PMC: PMC9329122 omimAvSnp OMIM Alleles OMIM Allelic Variant Phenotypes Phenotype and Literature Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles) - Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes) - The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci) - Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes.
This track shows the allelic variants in the Online Mendelian Inheritance in Man (OMIM) database that have associated dbSNP identifiers. Display Conventions and Configuration Genomic positions of OMIM allelic variants are marked by solid blocks, which appear as tick marks when zoomed out. The details page for each variant displays the allelic variant description, the amino acid replacement, and the associated dbSNP and/or ClinVar identifiers with links to the variant's details at those resources. The descriptions of OMIM entries are shown on the main browser display when Full display mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Methods This track was constructed as follows: The OMIM allelic variant data file mimAV.txt was obtained from OMIM and loaded into the MySQL table omimAv. The genomic position for each allelic variant in omimAv with an associated dbSNP identifier was obtained from the snp151 table. The OMIM AV identifiers and their corresponding genomic positions from dbSNP were then loaded into the omimAvSnp table. Data Updates This track is automatically updated once a week from OMIM data. The most recent update time is shown at the top of the track documentation page. Data Access Because OMIM has only allowed data queries within individual chromosomes, no download files are available from the Genome Browser. Full genome datasets can be downloaded directly from the OMIM Downloads page. All genome-wide downloads are freely available from OMIM after registration. If you need the OMIM data in exactly the format of the UCSC Genome Browser, for example if you are running a UCSC Genome Browser local installation (a partial "mirror"), please create a user account on omim.org and contact OMIM via https://omim.org/contact. Send them your OMIM account name and request access to the UCSC Genome Browser 'entitlement'.
They will then grant you access to a MySQL/MariaDB data dump that contains all UCSC Genome Browser OMIM tables. UCSC offers queries within chromosomes from Table Browser that include a variety of filtering options and cross-referencing other datasets using our Data Integrator tool. UCSC also has an API that can be used to retrieve data in JSON format from a particular chromosome range. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. PMID: 18842627; PMC: PMC2686440 Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. PMID: 15608251; PMC: PMC539987 omimContainer OMIM Online Mendelian Inheritance in Man Phenotype and Literature OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles) - Variants in the OMIM database that have associated dbSNP identifiers.
OMIM Gene Phenotypes (OMIM Genes) - The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes: Gene Unknown (OMIM Cyto Loci) - Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. Clicking into the individual tracks provides additional information including display conventions. recombAvg Recomb. deCODE Avg Recombination rate: deCODE Genetics, average from paternal and maternal (mat for chrX) Mapping and Sequencing Description The recombination rate track represents calculated rates of recombination based on the genetic maps from deCODE (Halldorsson et al., 2019) and 1000 Genomes (2013 Phase 3 release, lifted from hg19). The deCODE map is more recent, has a higher resolution and was natively created on hg38, and is therefore recommended. For the Recomb. deCODE average track, the recombination rates for chrX represent the female rate. This track also includes a subtrack with all the individual deCODE recombination events and another subtrack with several thousand de-novo mutations found in the deCODE sequencing data. These two tracks are hidden by default and have to be switched on explicitly on the configuration page. Display Conventions and Configuration This is a super track that contains different subtracks, three with the deCODE recombination rates (paternal, maternal and average) and one with the 1000 Genomes recombination rate (average). These tracks are in signal graph (wiggle) format. By default, to show most recombination hotspots, their maximum value is set to 100 cM, even though many regions have values higher than 100. The maximum value can be changed on the configuration pages of the tracks.
There are two more tracks that show additional details provided by deCODE: one subtrack with the raw data of all cross-overs tagged with their proband ID and another one with around 8000 human de-novo mutation variants that are linked to cross-over changes. Methods The deCODE genetic map was created at deCODE Genetics. It is based on microarrays assaying 626,828 SNP markers, which allowed the identification of 1,476,140 crossovers in 56,321 paternal meioses and 3,055,395 crossovers in 70,086 maternal meioses. In total, the data is based on 4,531,535 crossovers in 126,427 meioses. By using WGS data with 9,305,070 SNPs, the boundaries for 761,981 crossovers were refined: 247,942 crossovers in 9,423 paternal meioses and 514,039 crossovers in 11,750 maternal meioses. The average resolution of the genetic map is 682 base pairs (bp): 655 and 708 bp for the paternal and maternal maps, respectively. The 1000 Genomes genetic map is the IMPUTE genetic map derived from 1000 Genomes Phase 3, on hg19 coordinates. It was converted to hg38 by Po-Ru Loh at the Broad Institute. After a run of liftOver, he post-processed the data to deal with situations in which consecutive map locations became much closer/farther after lifting. The heuristic used is sufficient for statistical phasing but may not be optimal for other analyses. For this reason, and because of its higher resolution, the deCODE map is recommended for hg38. As with all other tracks, the data conversion commands and pointers to the original data files are documented in the makeDoc file of this track. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed files that can be downloaded from our download server.
Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig or bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tools can also be used to obtain features confined to a given range, e.g.,
bigWigToBedGraph -chrom=chr17 -start=45941345 -end=45942345 http://hgdownload.soe.ucsc.edu/gbdb/hg38/recombRate/recombAvg.bw stdout
Please refer to our Data Access FAQ for more information. Credits This track was produced at UCSC using data that are freely available for the deCODE and 1000 Genomes genetic maps. Thanks to Po-Ru Loh at the Broad Institute for providing the code to lift the hg19 1000 Genomes map data to hg38. References 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PMID: 20981092; PMC: PMC3042601 Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science. 2019 Jan 25;363(6425). PMID: 30679340 recombRate2 Recomb Rate Recombination rate: Genetic maps from deCODE and 1000 Genomes Mapping and Sequencing Description The recombination rate track represents calculated rates of recombination based on the genetic maps from deCODE (Halldorsson et al., 2019) and 1000 Genomes (2013 Phase 3 release, lifted from hg19). The deCODE map is more recent, has a higher resolution and was natively created on hg38, and is therefore recommended. For the Recomb. deCODE average track, the recombination rates for chrX represent the female rate.
This track also includes a subtrack with all the individual deCODE recombination events and another subtrack with several thousand de-novo mutations found in the deCODE sequencing data. These two tracks are hidden by default and have to be switched on explicitly on the configuration page. Display Conventions and Configuration This is a super track that contains different subtracks, three with the deCODE recombination rates (paternal, maternal and average) and one with the 1000 Genomes recombination rate (average). These tracks are in signal graph (wiggle) format. By default, to show most recombination hotspots, their maximum value is set to 100 cM, even though many regions have values higher than 100. The maximum value can be changed on the configuration pages of the tracks. There are two more tracks that show additional details provided by deCODE: one subtrack with the raw data of all cross-overs tagged with their proband ID and another one with around 8000 human de-novo mutation variants that are linked to cross-over changes. Methods The deCODE genetic map was created at deCODE Genetics. It is based on microarrays assaying 626,828 SNP markers, which allowed the identification of 1,476,140 crossovers in 56,321 paternal meioses and 3,055,395 crossovers in 70,086 maternal meioses. In total, the data is based on 4,531,535 crossovers in 126,427 meioses. By using WGS data with 9,305,070 SNPs, the boundaries for 761,981 crossovers were refined: 247,942 crossovers in 9,423 paternal meioses and 514,039 crossovers in 11,750 maternal meioses. The average resolution of the genetic map is 682 base pairs (bp): 655 and 708 bp for the paternal and maternal maps, respectively. The 1000 Genomes genetic map is the IMPUTE genetic map derived from 1000 Genomes Phase 3, on hg19 coordinates. It was converted to hg38 by Po-Ru Loh at the Broad Institute.
After a run of liftOver, he post-processed the data to deal with situations in which consecutive map locations became much closer/farther after lifting. The heuristic used is sufficient for statistical phasing but may not be optimal for other analyses. For this reason, and because of its higher resolution, the deCODE map is recommended for hg38. As with all other tracks, the data conversion commands and pointers to the original data files are documented in the makeDoc file of this track. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. For automated download and analysis, the genome annotation is stored at UCSC in bigWig and bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tools bigWigToWig or bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tools can also be used to obtain features confined to a given range, e.g.,
bigWigToBedGraph -chrom=chr17 -start=45941345 -end=45942345 http://hgdownload.soe.ucsc.edu/gbdb/hg38/recombRate/recombAvg.bw stdout
Please refer to our Data Access FAQ for more information. Credits This track was produced at UCSC using data that are freely available for the deCODE and 1000 Genomes genetic maps. Thanks to Po-Ru Loh at the Broad Institute for providing the code to lift the hg19 1000 Genomes map data to hg38. References 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73.
PMID: 20981092; PMC: PMC3042601 Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science. 2019 Jan 25;363(6425). PMID: 30679340 rmsk RepeatMasker Repeating Elements by RepeatMasker Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. This track and the masking information in our hg38 genome download FASTA files were created in 2010 with the original RepBase library from 2010-03-02 and RepeatMasker 3.0.1. Since April 2019, RepBase has been under a commercial license, so we cannot distribute it or update the track using the RepBase library without a license. Therefore, and for compatibility with past results, given how central the masking is for many other annotations, we decided not to update the repeatmasking of hg38. However, you can show the small differences between the RepeatMasker 3/RepBase from 2010 and RepeatMasker 4/DFAM from 2020 using the track "RepeatMasker Viz" in the same track group. It contains two subtracks, one with the old and one with the new data. Also, these tracks have many more visualisation options than the original RepeatMasker track.
However, the last track update time of this track at UCSC is not 2010, because we had to add repeatmasking annotations to the rarely used _alt and _fix "patch" sequences of the hg38 genome. The repeatmasking annotations of the main chromosomes were unaffected and have not changed since 2010. For more information on genome patches, see our blog post. Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs Long interspersed nuclear elements (LINE) Long terminal repeat elements (LTR), which include retroposons DNA repeat elements (DNA) Simple repeats (micro-satellites) Low complexity repeats Satellite repeats RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA) Other repeats, which includes class RC (Rolling Circle) Unknown The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed. Methods Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. 
PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846 spliceAIsnvs SpliceAI SNVs SpliceAI SNVs (unmasked) Phenotype and Literature Important: The SpliceAI data on the UCSC Genome Browser comes directly from Illumina (see Data Access below). However, since SpliceAI refers to the algorithm, and not the computed dataset, the data on the Broad server or from other sources may differ. Description SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a lookup tool is provided by the Broad Institute. Why are some variants not scored by SpliceAI? SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5 kb on either side), are deletions longer than twice the input parameter -D, or are inconsistent with the reference FASTA file. What are the differences between masked and unmasked tracks? The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative splicing analysis and masked tracks for variant interpretation. Display Conventions and Interpretation Variants are colored according to Walker et al. 
2023 splicing impact: Predicted impact on splicing: Score >= 0.2; Not informative: Score < 0.2 and > 0.1; No impact on splicing: Score <= 0.1. Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor gain/loss), location of affected cryptic splice, and SpliceAI score. Clicking on any item brings up a table with this information. The scores range from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the SpliceAI paper, a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Methods The data were downloaded from Illumina. The SpliceAI scores are represented in the VCF INFO field as SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31 Here, the pipe-separated fields contain, in order: ALT allele; Gene name; Acceptor gain score; Acceptor loss score; Donor gain score; Donor loss score; Relative location of affected cryptic acceptor; Relative location of affected acceptor; Relative location of affected cryptic donor; Relative location of affected donor. Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. The complete processing of this track can be found in the makedoc. Data Access These data are not available for download from the Genome Browser. The raw data can be found directly on Illumina. See below for a copy of the license restrictions pertaining to these data. License FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. 
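The INFO field layout listed under Methods for this track can be unpacked with a few lines of Python. The following is only a minimal sketch: the names SpliceAIRecord, parse_spliceai and max_delta are hypothetical helpers introduced here for illustration, and it assumes that the "score" used for the 0.02 cutoff is the largest of the four delta scores, which the text above does not state explicitly.

```python
from typing import NamedTuple


class SpliceAIRecord(NamedTuple):
    """One SpliceAI annotation, in the field order given in Methods."""
    alt: str
    gene: str
    acceptor_gain: float
    acceptor_loss: float
    donor_gain: float
    donor_loss: float
    pos_acceptor_gain: int   # relative location of affected cryptic acceptor
    pos_acceptor_loss: int   # relative location of affected acceptor
    pos_donor_gain: int      # relative location of affected cryptic donor
    pos_donor_loss: int      # relative location of affected donor


def parse_spliceai(info_value: str) -> SpliceAIRecord:
    """Parse a single 'SpliceAI=...' value with ten pipe-separated fields."""
    prefix = "SpliceAI="
    if info_value.startswith(prefix):
        info_value = info_value[len(prefix):]
    fields = info_value.split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    scores = [float(x) for x in fields[2:6]]
    positions = [int(x) for x in fields[6:10]]
    return SpliceAIRecord(fields[0], fields[1], *scores, *positions)


def max_delta(record: SpliceAIRecord) -> float:
    """Largest of the four delta scores (assumed to be 'the score')."""
    return max(record.acceptor_gain, record.acceptor_loss,
               record.donor_gain, record.donor_loss)


record = parse_spliceai("SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31")
print(record.gene, max_delta(record))
```

Under that assumption, the example entry has a maximum delta score of 0.01 and would therefore fall below the 0.02 cutoff applied to this track.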
References Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023 Jul 6;110(7):1046-1067. PMID: 37352859; PMC: PMC10357475 spliceAI SpliceAI SpliceAI: Splice Variant Prediction Score Phenotype and Literature Important: The SpliceAI data on the UCSC Genome Browser comes directly from Illumina (see Data Access below). However, since SpliceAI refers to the algorithm, and not the computed dataset, the data on the Broad server or from other sources may differ. Description SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a lookup tool is provided by the Broad Institute. Why are some variants not scored by SpliceAI? SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5 kb on either side), are deletions longer than twice the input parameter -D, or are inconsistent with the reference FASTA file. What are the differences between masked and unmasked tracks? The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. 
The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative splicing analysis and masked tracks for variant interpretation. Display Conventions and Interpretation Variants are colored according to Walker et al. 2023 splicing impact: Predicted impact on splicing: Score >= 0.2; Not informative: Score < 0.2 and > 0.1; No impact on splicing: Score <= 0.1. Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor gain/loss), location of affected cryptic splice, and SpliceAI score. Clicking on any item brings up a table with this information. The scores range from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the SpliceAI paper, a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Methods The data were downloaded from Illumina. The SpliceAI scores are represented in the VCF INFO field as SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31 Here, the pipe-separated fields contain, in order: ALT allele; Gene name; Acceptor gain score; Acceptor loss score; Donor gain score; Donor loss score; Relative location of affected cryptic acceptor; Relative location of affected acceptor; Relative location of affected cryptic donor; Relative location of affected donor. Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. The complete processing of this track can be found in the makedoc. Data Access These data are not available for download from the Genome Browser. The raw data can be found directly on Illumina. See below for a copy of the license restrictions pertaining to these data. License FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. 
By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. References Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023 Jul 6;110(7):1046-1067. PMID: 37352859; PMC: PMC10357475 lincRNAsAllCellTypeTopView lincRNA RNA-Seq lincRNA RNA-Seq reads expression abundances Genes and Gene Predictions Description This track displays the Human Body Map lincRNAs (large intergenic non coding RNAs) and TUCPs (transcripts of uncertain coding potential), as well as their expression levels across 22 human tissues and cell lines. The Human Body Map catalog was generated by integrating previously existing annotation sources with transcripts that were de-novo assembled from RNA-Seq data. These transcripts were collected from ~4 billion RNA-Seq reads across 24 tissues and cell types. Expression abundance was estimated by Cufflinks (Trapnell et al., 2010) based on RNA-Seq. Expression abundances were estimated on the gene locus level, rather than for each transcript separately and are given as raw FPKM. The prefixes tcons_ and tcons_l2_ are used to describe lincRNAs and TUCP transcripts, respectively. Specific details about the catalog generation and data sets used for this study can be found in Cabili et al (2011). 
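The prefix convention described above can be checked programmatically. This is a small sketch under stated assumptions: transcript_category is a hypothetical helper, the example identifiers are invented for illustration, and matching is done case-insensitively because the exact capitalization used in the track data is not stated here. The longer prefix has to be tested first, since every name starting with tcons_l2_ also starts with tcons_.

```python
def transcript_category(name: str) -> str:
    """Classify a Human Body Map transcript name by its prefix.

    tcons_l2_ marks TUCP transcripts and tcons_ marks lincRNAs; the
    more specific prefix is checked first so it is not shadowed.
    """
    lowered = name.lower()
    if lowered.startswith("tcons_l2_"):
        return "TUCP"
    if lowered.startswith("tcons_"):
        return "lincRNA"
    return "unknown"


# Invented identifiers, used only to exercise the three branches.
for example in ("tcons_00000123", "tcons_l2_00000456", "other_789"):
    print(example, transcript_category(example))
```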
Extended characterization of each transcript in the human body map catalog can be found at the Human lincRNA Catalog website. Expression abundance scores range from 0 to 1000, and are displayed from light blue (0) to dark blue (1000). Credits The body map RNA-Seq data was kindly provided by the Gene Expression Applications research group at Illumina. References Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011 Sep 15;25(18):1915-27. PMID: 21890647; PMC: PMC3185964 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5. PMID: 20436464; PMC: PMC3146043 nonCodingRNAs Non-coding RNA RNA sequences that do not code for a protein Genes and Gene Predictions Description This is a super track for non-coding RNA data; each subtrack represents some form of non-coding RNA data. Credits The body map RNA-Seq data was kindly provided by the Gene Expression Applications research group at Illumina. Genome coordinates for the sno/miRNA track were obtained from the miRBase sequences FTP site and from the snoRNABase coordinates download page. References When making use of these data, please cite the following articles in addition to the primary sources of the miRNA sequences: Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008 Jan 1;36(Database issue):D154-8. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D140-4. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D109-11. Weber MJ. 
New human and mouse microRNA genes found by homology search. You may also want to cite The Wellcome Trust Sanger Institute miRBase and The Laboratoire de Biologie Moleculaire Eucaryote snoRNABase. The following publication provides guidelines on miRNA annotation: Ambros V. et al., A uniform system for microRNA annotation. RNA. 2003;9(3):277-9. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011 Sep 15;25(18):1915-27. PMID: 21890647; PMC: PMC3185964 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5. PMID: 20436464; PMC: PMC3146043 lincRNAsAllCellType lincRNAsCellType lincRNA RNA-Seq reads expression abundances Genes and Gene Predictions lincRNAsCTWhiteBloodCell WhiteBloodCell lincRNAs from whitebloodcell Genes and Gene Predictions lincRNAsCTThyroid Thyroid lincRNAs from thyroid Genes and Gene Predictions lincRNAsCTTestes_R Testes_R lincRNAs from testes_r Genes and Gene Predictions lincRNAsCTTestes Testes lincRNAs from testes Genes and Gene Predictions lincRNAsCTSkeletalMuscle SkeletalMuscle lincRNAs from skeletalmuscle Genes and Gene Predictions lincRNAsCTProstate Prostate lincRNAs from prostate Genes and Gene Predictions lincRNAsCTPlacenta_R Placenta_R lincRNAs from placenta_r Genes and Gene Predictions lincRNAsCTOvary Ovary lincRNAs from ovary Genes and Gene Predictions lincRNAsCTLymphNode LymphNode lincRNAs from lymphnode Genes and Gene Predictions lincRNAsCTLung Lung lincRNAs from lung Genes and Gene Predictions lincRNAsCTLiver Liver lincRNAs from liver Genes and Gene Predictions lincRNAsCTKidney Kidney lincRNAs from kidney Genes and Gene Predictions lincRNAsCThLF_r2 hLF_r2 lincRNAs from hlf_r2 
Genes and Gene Predictions lincRNAsCThLF_r1 hLF_r1 lincRNAs from hlf_r1 Genes and Gene Predictions lincRNAsCTHeart Heart lincRNAs from heart Genes and Gene Predictions lincRNAsCTForeskin_R Foreskin_R lincRNAs from foreskin_r Genes and Gene Predictions lincRNAsCTColon Colon lincRNAs from colon Genes and Gene Predictions lincRNAsCTBreast Breast lincRNAs from breast Genes and Gene Predictions lincRNAsCTBrain_R Brain_R lincRNAs from brain_r Genes and Gene Predictions lincRNAsCTBrain Brain lincRNAs from brain Genes and Gene Predictions lincRNAsCTAdrenal Adrenal lincRNAs from adrenal Genes and Gene Predictions lincRNAsCTAdipose Adipose lincRNAs from adipose Genes and Gene Predictions spliceAIindels SpliceAI indels SpliceAI Indels (unmasked) Phenotype and Literature Important: The SpliceAI data on the UCSC Genome Browser comes directly from Illumina (see Data Access below). However, since SpliceAI refers to the algorithm, and not the computed dataset, the data on the Broad server or from other sources may differ. Description SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a lookup tool is provided by the Broad Institute. Why are some variants not scored by SpliceAI? SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5 kb on either side), are deletions longer than twice the input parameter -D, or are inconsistent with the reference FASTA file. What are the differences between masked and unmasked tracks? 
The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative splicing analysis and masked tracks for variant interpretation. Display Conventions and Interpretation Variants are colored according to Walker et al. 2023 splicing impact: Predicted impact on splicing: Score >= 0.2; Not informative: Score < 0.2 and > 0.1; No impact on splicing: Score <= 0.1. Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor gain/loss), location of affected cryptic splice, and SpliceAI score. Clicking on any item brings up a table with this information. The scores range from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the SpliceAI paper, a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Methods The data were downloaded from Illumina. The SpliceAI scores are represented in the VCF INFO field as SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31 Here, the pipe-separated fields contain, in order: ALT allele; Gene name; Acceptor gain score; Acceptor loss score; Donor gain score; Donor loss score; Relative location of affected cryptic acceptor; Relative location of affected acceptor; Relative location of affected cryptic donor; Relative location of affected donor. Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. The complete processing of this track can be found in the makedoc. Data Access These data are not available for download from the Genome Browser. The raw data can be found directly on Illumina. See below for a copy of the license restrictions pertaining to these data. 
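The three coloring categories listed under Display Conventions and Interpretation translate directly into a threshold function. The sketch below is illustrative only: splicing_impact is a hypothetical name, and the returned strings simply repeat the category labels from the text rather than any identifier used by the browser.

```python
def splicing_impact(score: float) -> str:
    """Map a SpliceAI score (0 to 1) to the Walker et al. 2023 category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SpliceAI scores range from 0 to 1")
    if score >= 0.2:
        return "Predicted impact on splicing"
    if score > 0.1:
        return "Not informative"
    return "No impact on splicing"


# Boundary values: 0.2 belongs to the top category, 0.1 to the bottom one.
for value in (0.8, 0.2, 0.15, 0.1, 0.02):
    print(value, splicing_impact(value))
```

Note that these coloring categories are separate from the 0.2, 0.5 and 0.8 cutoffs mentioned for recall and precision; only the lowest of those coincides with a category boundary.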
License FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. References Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023 Jul 6;110(7):1046-1067. PMID: 37352859; PMC: PMC10357475 wgEncodeRegTxn Transcription Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Regulation Description This track shows transcription levels for several cell types as assayed by high-throughput sequencing of polyadenylated RNA (RNA-seq). Additional views of this dataset and additional documentation on the methods used for this track are available at the ENCODE Caltech RNA-seq page. The data shown here are derived from the Raw Signal view from the paired 75-mer 200 bp insert size reads. The two replicates of the signal were pooled and normalized so that the total genome-wide signal sums to 10 billion. Display Conventions and Configuration By default, this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. 
Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. The color of these tracks match their versions from their lifted source on the hg19 assembly. The colors are consistent with the other hg19 lifted tracks located in the ENCODE Regulation supertrack, with the exception being the DNase tracks, as they were not lifted from hg19 and are colored to reflect similarity of cell types. Credits This track shows data from the Wold Lab at Caltech, as part of the ENCODE Consortium. Release Notes This is release 2 (July 2012) of this track which includes two new subtracks for HeLa-S3 and HepG2. Data Release Policy Primary ENCODE data produced during the 2007-2012 production phase were subject to a restriction period. However, the data here are past those restrictions and are freely available. The full data release policy for ENCODE is available here. wgEncodeReg ENCODE Regulation Integrated Regulation from ENCODE Regulation Description These tracks contain information relevant to the regulation of transcription from the ENCODE Project. The Transcription track shows transcription levels assayed by sequencing of polyadenylated RNA from a variety of cell types. The Layered H3K4Me1 and Layered H3K27Ac tracks show where modification of histone proteins is suggestive of enhancer and, to a lesser extent, other regulatory activity. These histone modifications, particularly H3K4Me1, are quite broad. The actual enhancers are typically just a small portion of the area marked by these histone modifications. The Layered H3K4Me3 track shows a histone mark associated with promoters. The DNase I Hypersensitivity tracks indicate where chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types. Regulatory regions, in general, tend to be DNase-sensitive, and promoters are particularly DNase-sensitive. 
The Txn Factor ChIP tracks show DNA regions where transcription factors, proteins responsible for modulating gene transcription, bind as assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA (ChIP-seq). These tracks complement each other and together can shed much light on regulatory DNA. The histone marks are informative at a high level, but they have a resolution of just ~200 bases and do not provide much in the way of functional detail. The DNase hypersensitivity assay is higher in resolution at the DNA level and can be done on a large number of cell types since it's just a single assay. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, but provides little information beyond that. The transcription factor ChIP assay has a high resolution at the DNA level and, due to the very specific nature of the transcription factors, is often informative with respect to functional detail. However, since each transcription factor must be assayed separately, the information is only available for a limited number of transcription factors on a limited number of cell lines. Though each assay has its strengths and weaknesses, the fact that all of these assays are relatively independent of each other gives increased confidence when multiple tracks are suggesting a regulatory function for a region. For additional information, please click on the hyperlinks for the individual tracks above. Also note that additional histone marks and transcription information is available in other ENCODE tracks. This integrative supertrack just shows a selection of the most informative data of most general interest. Display Conventions By default, the transcription and histone mark displays use a transparent overlay method of displaying data from a number of cell lines in a single track. 
Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. The colors of the transcription and histone mark tracks match their versions from their lifted source on the hg19 assembly. The DNase tracks, which were not lifted from hg19, are colored differently to reflect similarity of cell types. There are three DNase tracks, starting with a transparent overlay DNase Signal Track to allow viewing signals from all 95 cell types in one track. The individual signals and the same coloring scheme can also be found in the DNase HS Track, where processed peaks and hotspots are also called out as gray boxes with the darkness of each box reflecting the underlying signal value. Lastly, in the DNase Clusters track, all observed hypersensitive regions in the different cell lines at the same location were clustered into a single box, where a number to the left of the box indicates how many cell types showed a hypersensitivity region and the darkness of the gray box is proportional to the maximum value seen from one of the underlying cell lines. Clicking on one of these items takes you to a details page where additional information is displayed, such as the list of cell types that combined to form the cluster in the DNase Clusters track. Data Access The raw data for ENCODE 3 Regulation tracks can be accessed from the Table Browser or combined with other datasets through the Data Integrator. For automated analysis and downloads, the track data files can be downloaded from our downloads server or queried using the JSON API or the Public SQL. Individual regions or the whole genome annotation can be accessed as text using our utility bigBedToBed. Instructions for downloading the utility can be found here. That utility can also be used to obtain features within a given range, e.g. 
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/wgEncodeRegDnase/wgEncodeRegDnaseUwA549Hotspot.broadPeak.bb -chrom=chr21 -start=0 -end=100000000 stdout For sorting transcription factor binding sites by cell type, we recommend you use the following download file for hg38. Credits Specific labs and contributors for these datasets are listed in the Credits section of the individual tracks in this super-track. The integrative view presented here was developed by Jim Kent at UCSC. Data Use Policy Users may freely download, analyze and publish results based on any ENCODE data without restrictions. Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional. Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE production laboratory(s) that generated the datasets used, as described in Citing ENCODE. wgEncodeRegTxnCaltechRnaSeqNhlfR2x75Il200SigPooled NHLF Transcription of NHLF cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqNhekR2x75Il200SigPooled NHEK Transcription of NHEK cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqK562R2x75Il200SigPooled K562 Transcription of K562 cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqHuvecR2x75Il200SigPooled HUVEC Transcription of HUVEC cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqHsmmR2x75Il200SigPooled HSMM Transcription of HSMM cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqHepg2R2x75Il200SigPooled HepG2 Transcription of HepG2 cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqHelas3R2x75Il200SigPooled HeLa-S3 Transcription of HeLa-S3 cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqH1hescR2x75Il200SigPooled H1-hESC Transcription of H1-hESC cells from ENCODE Regulation wgEncodeRegTxnCaltechRnaSeqGm12878R2x75Il200SigPooled GM12878 Transcription of GM12878 cells from ENCODE Regulation wgEncodeRegMarkH3k4me1 Layered H3K4Me1 H3K4Me1 Mark (Often Found Near 
Regulatory Elements) on 7 cell lines from ENCODE Regulation Description Chemical modifications (e.g., methylation and acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me1 histone mark across the genome as determined by a ChIP-seq assay. The H3K4me1 histone mark is the mono-methylation of lysine 4 of the H3 histone protein, and it is associated with enhancers and with DNA regions downstream of transcription starts. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display Conventions and Configuration By default, this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. The color of these tracks match their versions from their lifted source on the hg19 assembly. The colors are consistent with the other hg19 lifted tracks located in the ENCODE Regulation supertrack, with the exception being the DNase tracks, as they were not lifted from hg19 and are colored to reflect similarity of cell types. Credits This track shows data from the Bernstein Lab at the Broad Institute, as part of the ENCODE Consortium. Data Release Policy Primary ENCODE data produced during the 2007-2012 production phase were subject to a restriction period. However, the data here are past those restrictions and are freely available. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkH3k4me1Nhlf NHLF H3K4Me1 Mark (Often Found Near Regulatory Elements) on NHLF Cells from ENCODE Regulation wgEncodeRegMarkH3k4me1Nhek NHEK H3K4Me1 Mark (Often Found Near Regulatory Elements) on NHEK Cells from ENCODE Regulation wgEncodeRegMarkH3k4me1K562 K562 H3K4Me1 Mark (Often Found Near Regulatory Elements) on K562 Cells from ENCODE Regulation wgEncodeRegMarkH3k4me1Huvec HUVEC H3K4Me1 Mark (Often Found Near Regulatory Elements) on HUVEC Cells from ENCODE Regulation wgEncodeRegMarkH3k4me1Hsmm HSMM H3K4Me1 Mark (Often Found Near Regulatory Elements) on HSMM Cells from ENCODE Regulation wgEncodeRegMarkH3k4me1H1hesc H1-hESC H3K4Me1 Mark (Often Found Near Regulatory Elements) on H1-hESC Cells from ENCODE Regulation wgEncodeBroadHistoneGm12878H3k4me1StdSig GM12878 H3K4Me1 Mark (Often Found Near Regulatory Elements) on GM12878 Cells from ENCODE Regulation spliceAIsnvsMasked SpliceAI SNVs (masked) SpliceAI SNVs (masked) Phenotype and Literature Important: The SpliceAI data on the UCSC Genome Browser comes directly from Illumina (see Data Access below). However, since SpliceAI refers to the algorithm, and not the computed dataset, the data on the Broad server or from other sources may differ. Description SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a lookup tool is provided by the Broad Institute. Why are some variants not scored by SpliceAI? SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5 kb on either side), are deletions longer than twice the input parameter -D, or are inconsistent with the reference FASTA file. What are the differences between masked and unmasked tracks? 
The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative splicing analysis and masked tracks for variant interpretation. Display Conventions and Interpretation Variants are colored according to the splicing impact categories of Walker et al. 2023: Predicted impact on splicing: Score >= 0.2; Not informative: Score < 0.2 and > 0.1; No impact on splicing: Score <= 0.1. Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor gain/loss), location of affected cryptic splice, and SpliceAI score. Clicking on any item brings up a table with this information. The scores range from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the SpliceAI paper (Jaganathan et al. 2019), a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Methods The data were downloaded from Illumina. The SpliceAI scores are represented in the VCF INFO field as SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31 Here, the pipe-separated fields contain, in order: ALT allele; gene name; acceptor gain score; acceptor loss score; donor gain score; donor loss score; relative location of affected cryptic acceptor; relative location of affected acceptor; relative location of affected cryptic donor; relative location of affected donor. Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. The complete processing of this track can be found in the makedoc. Data Access These data are not available for download from the Genome Browser. The raw data can be found directly on Illumina. See below for a copy of the license restrictions pertaining to these data. 
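As an illustration of the pipe-separated SpliceAI INFO value described in the Methods above, the following Python sketch splits one annotation into its ten fields. The class name, field names, and the max_score helper are hypothetical choices made for this example; it also assumes a single annotation per value, whereas a real VCF record may carry several.

```python
from typing import NamedTuple


class SpliceAIAnnotation(NamedTuple):
    """One SpliceAI annotation; field names are illustrative, not official."""
    alt: str
    gene: str
    acceptor_gain: float
    acceptor_loss: float
    donor_gain: float
    donor_loss: float
    pos_cryptic_acceptor: int
    pos_acceptor: int
    pos_cryptic_donor: int
    pos_donor: int

    @property
    def max_score(self) -> float:
        # Assumption: the largest of the four scores summarizes the variant.
        return max(self.acceptor_gain, self.acceptor_loss,
                   self.donor_gain, self.donor_loss)


def parse_spliceai(value: str) -> SpliceAIAnnotation:
    """Parse 'SpliceAI=ALT|GENE|s1|s2|s3|s4|p1|p2|p3|p4' (prefix optional)."""
    prefix = "SpliceAI="
    if value.startswith(prefix):
        value = value[len(prefix):]
    fields = value.split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 pipe-separated fields, got {len(fields)}")
    scores = [float(x) for x in fields[2:6]]      # four scores
    positions = [int(x) for x in fields[6:10]]    # four relative locations
    return SpliceAIAnnotation(fields[0], fields[1], *scores, *positions)


record = parse_spliceai("SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31")
print(record.gene, record.max_score)
```

For the example value shown in the text, the parsed gene is OR4F5 and the only non-zero score is the acceptor gain score of 0.01.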
License FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. References Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023 Jul 6;110(7):1046-1067. PMID: 37352859; PMC: PMC10357475 robustPeaks TSS peaks FANTOM5: DPI peak, robust set Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. 
The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). hCAGE LQhCAGE Samples Transcription start sites (TSSs) were mapped, and their usage in human and mouse primary cells, cell lines, and tissues was profiled to produce a comprehensive overview of mammalian gene expression across the human body. The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. Individual samples shown in "TSS activity" tracks are grouped as below. Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. They are named "p#@GENE_SYMBOL" if associated with the 5'-end of known genes, or "p@CHROM:START..END,STRAND" otherwise. The summary tracks consist of the TSS (CAGE) peaks and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks. TSS (CAGE) peaks: the robust peaks. TSS summary profiles: total counts and TPM (tags per million) in all the samples; maximum counts and TPM among the samples. TSS activity The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. 
The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - Fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. References FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. 
PMID: 25723102; PMC: PMC4310165 fantom5 FANTOM5 FANTOM5: Mapped transcription start sites (TSS) and their usage Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). hCAGE LQhCAGE Samples Transcription start sites (TSSs) were mapped, and their usage in human and mouse primary cells, cell lines, and tissues was profiled to produce a comprehensive overview of mammalian gene expression across the human body. The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. Individual samples shown in "TSS activity" tracks are grouped as below. Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. 
Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. They are named "p#@GENE_SYMBOL" if associated with the 5'-end of known genes, or "p@CHROM:START..END,STRAND" otherwise. The summary tracks consist of the TSS (CAGE) peaks and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks. TSS (CAGE) peaks: the robust peaks. TSS summary profiles: total counts and TPM (tags per million) in all the samples; maximum counts and TPM among the samples. TSS activity The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - Fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. 
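The two peak naming patterns described in the Methods above, "p#@GENE_SYMBOL" and "p@CHROM:START..END,STRAND", can be told apart with a pair of regular expressions. The Python sketch below is only an illustration: the function name, the returned dictionary keys, and the example names are hypothetical, and composite names that join several gene associations are not handled.

```python
import re

# "p#@GENE_SYMBOL": a peak number followed by a gene symbol.
GENE_PEAK = re.compile(r"^p(\d+)@(.+)$")
# "p@CHROM:START..END,STRAND": a peak with no associated known gene.
LOCUS_PEAK = re.compile(r"^p@([^:]+):(\d+)\.\.(\d+),([+-])$")


def parse_peak_name(name: str) -> dict:
    """Classify a TSS peak name and extract its parts."""
    match = LOCUS_PEAK.match(name)
    if match:
        chrom, start, end, strand = match.groups()
        return {"kind": "locus", "chrom": chrom,
                "start": int(start), "end": int(end), "strand": strand}
    match = GENE_PEAK.match(name)
    if match:
        return {"kind": "gene", "number": int(match.group(1)),
                "gene": match.group(2)}
    raise ValueError(f"unrecognized peak name: {name!r}")


# Illustrative names only; they are not taken from the track.
print(parse_peak_name("p1@GAPDH"))
print(parse_peak_name("p@chr1:1000..1010,+"))
```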
References FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. PMID: 25723102; PMC: PMC4310165 wgEncodeRegMarkH3k4me3 Layered H3K4Me3 H3K4Me3 Mark (Often Found Near Promoters) on 7 cell lines from ENCODE Regulation Description Chemical modifications (e.g., methylation and acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me3 histone mark across the genome as determined by a ChIP-seq assay. The H3K4Me3 histone mark is the tri-methylation of lysine 4 of the H3 histone protein, and it is associated with promoters that are active or poised to be activated. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display Conventions and Configuration By default, this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. 
The colors of these tracks match their versions from their lifted source on the hg19 assembly. The colors are consistent with the other hg19 lifted tracks located in the ENCODE Regulation supertrack, with the exception of the DNase tracks, which were not lifted from hg19 and are colored to reflect similarity of cell types. Credits This track shows data from the Bernstein Lab at the Broad Institute, as part of the ENCODE Consortium. Data Release Policy Primary ENCODE data produced during the 2007-2012 production phase were subject to a restriction period. However, the data here are past those restrictions and are freely available. The full data release policy for ENCODE is available here. wgEncodeRegMarkH3k4me3Nhlf NHLF H3K4Me3 Mark (Often Found Near Promoters) on NHLF Cells from ENCODE Regulation wgEncodeRegMarkH3k4me3Nhek NHEK H3K4Me3 Mark (Often Found Near Promoters) on NHEK Cells from ENCODE Regulation wgEncodeRegMarkH3k4me3K562 K562 H3K4Me3 Mark (Often Found Near Promoters) on K562 Cells from ENCODE Regulation wgEncodeRegMarkH3k4me3Huvec HUVEC H3K4Me3 Mark (Often Found Near Promoters) on HUVEC Cells from ENCODE Regulation wgEncodeRegMarkH3k4me3Hsmm HSMM H3K4Me3 Mark (Often Found Near Promoters) on HSMM Cells from ENCODE Regulation wgEncodeRegMarkH3k4me3H1hesc H1-hESC H3K4Me3 Mark (Often Found Near Promoters) on H1-hESC Cells from ENCODE Regulation wgEncodeBroadHistoneGm12878H3k4me3StdSig GM12878 H3K4Me3 Mark (Often Found Near Regulatory Elements) on GM12878 Cells from ENCODE Regulation spliceAIindelsMasked SpliceAI indels (masked) SpliceAI Indels (masked) Phenotype and Literature Important: The SpliceAI data on the UCSC Genome Browser come directly from Illumina (see Data Access below). However, since SpliceAI refers to the algorithm, and not the computed dataset, the data on the Broad server or from other sources may differ. 
Description SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a lookup tool is provided by the Broad Institute. Why are some variants not scored by SpliceAI? SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5kb on either side), are deletions of length greater than twice the input parameter -D, or are inconsistent with the reference fasta file. What are the differences between masked and unmasked tracks? The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative splicing analysis and masked tracks for variant interpretation. Display Conventions and Interpretation Variants are colored according to the splicing impact categories of Walker et al. 2023: Predicted impact on splicing: Score >= 0.2; Not informative: Score < 0.2 and > 0.1; No impact on splicing: Score <= 0.1. Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor gain/loss), location of affected cryptic splice, and SpliceAI score. Clicking on any item brings up a table with this information. The scores range from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the SpliceAI paper (Jaganathan et al. 2019), a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Methods The data were downloaded from Illumina. 
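The three score categories listed under Display Conventions and Interpretation above translate directly into a small lookup. The Python sketch below is a minimal illustration; the function name is hypothetical, and the category labels are copied from the text.

```python
def classify_splicing_impact(score: float) -> str:
    """Map a SpliceAI score in [0, 1] to the category used for coloring."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SpliceAI scores range from 0 to 1")
    if score >= 0.2:
        return "Predicted impact on splicing"
    if score > 0.1:
        return "Not informative"
    return "No impact on splicing"


for value in (0.02, 0.15, 0.5):
    print(value, classify_splicing_impact(value))
```

Note that the boundary values fall as the text specifies: exactly 0.2 counts as a predicted impact, and exactly 0.1 counts as no impact.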
The SpliceAI scores are represented in the VCF INFO field as SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31 Here, the pipe-separated fields contain, in order: ALT allele; gene name; acceptor gain score; acceptor loss score; donor gain score; donor loss score; relative location of affected cryptic acceptor; relative location of affected acceptor; relative location of affected cryptic donor; relative location of affected donor. Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. The complete processing of this track can be found in the makedoc. Data Access These data are not available for download from the Genome Browser. The raw data can be found directly on Illumina. See below for a copy of the license restrictions pertaining to these data. License FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. References Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023 Jul 6;110(7):1046-1067. 
PMID: 37352859; PMC: PMC10357475 Total_counts_multiwig Total counts of CAGE reads FANTOM5: Total counts of CAGE reads Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). hCAGE LQhCAGE Samples Transcription start sites (TSSs) were mapped, and their usage in human and mouse primary cells, cell lines, and tissues was profiled to produce a comprehensive overview of mammalian gene expression across the human body. The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. Individual samples shown in "TSS activity" tracks are grouped as below. Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. 
Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. They are named "p#@GENE_SYMBOL" if associated with the 5'-end of known genes, or "p@CHROM:START..END,STRAND" otherwise. The summary tracks consist of the TSS (CAGE) peaks and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks. TSS (CAGE) peaks: the robust peaks. TSS summary profiles: total counts and TPM (tags per million) in all the samples; maximum counts and TPM among the samples. TSS activity The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - Fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. 
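For the programmatic access mentioned under Data Access above, a request to the Genome Browser's REST API might be assembled as in the Python sketch below. The getData/track endpoint and its genome, track, chrom, start, and end parameters belong to the public UCSC API; the genome (hg38), the track name (robustPeaks), and the region used here are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str,
                    start: int, end: int) -> str:
    """Build a getData/track URL for one genomic region."""
    query = urlencode({"genome": genome, "track": track,
                       "chrom": chrom, "start": start, "end": end})
    return f"{API_BASE}?{query}"


def fetch_track(genome: str, track: str, chrom: str,
                start: int, end: int) -> dict:
    """Download the region and decode the JSON reply (needs network access)."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Assumed genome, track name, and region; adjust to the track of interest.
print(build_track_url("hg38", "robustPeaks", "chr1", 1_000_000, 1_100_000))
```

Keeping URL construction separate from the download makes it easy to inspect the request before sending it, which is useful because large regions can return large replies.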
References FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. PMID: 25723102; PMC: PMC4310165 TotalCounts_Rev Total counts of CAGE reads (rev) Total counts of CAGE reads reverse Regulation TotalCounts_Fwd Total counts of CAGE reads (fwd) Total counts of CAGE reads forward Regulation wgEncodeRegMarkH3k27ac Layered H3K27Ac H3K27Ac Mark (Often Found Near Regulatory Elements) on 7 cell lines from ENCODE Regulation Description Chemical modifications (e.g., methylation and acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K27Ac histone mark across the genome as determined by a ChIP-seq assay. The H3K27Ac histone mark is the acetylation of lysine 27 of the H3 histone protein, and it is thought to enhance transcription possibly by blocking the spread of the repressive histone mark H3K27Me3. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. 
Display Conventions and Configuration By default, this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. The colors of these tracks match their versions from their lifted source on the hg19 assembly. The colors are consistent with the other hg19 lifted tracks located in the ENCODE Regulation supertrack, with the exception of the DNase tracks, which were not lifted from hg19 and are colored to reflect similarity of cell types. Credits This track shows data from the Bernstein Lab at the Broad Institute, as part of the ENCODE Consortium. Data Release Policy Primary ENCODE data produced during the 2007-2012 production phase were subject to a restriction period. However, the data here are past those restrictions and are freely available. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkH3k27acNhlf NHLF H3K27Ac Mark (Often Found Near Regulatory Elements) on NHLF Cells from ENCODE Regulation wgEncodeRegMarkH3k27acNhek NHEK H3K27Ac Mark (Often Found Near Regulatory Elements) on NHEK Cells from ENCODE Regulation wgEncodeRegMarkH3k27acK562 K562 H3K27Ac Mark (Often Found Near Regulatory Elements) on K562 Cells from ENCODE Regulation wgEncodeRegMarkH3k27acHuvec HUVEC H3K27Ac Mark (Often Found Near Regulatory Elements) on HUVEC Cells from ENCODE Regulation wgEncodeRegMarkH3k27acHsmm HSMM H3K27Ac Mark (Often Found Near Regulatory Elements) on HSMM Cells from ENCODE Regulation wgEncodeRegMarkH3k27acH1hesc H1-hESC H3K27Ac Mark (Often Found Near Regulatory Elements) on H1-hESC Cells from ENCODE Regulation wgEncodeRegMarkH3k27acGm12878 GM12878 H3K27Ac Mark (Often Found Near Regulatory Elements) on GM12878 Cells from ENCODE Regulation Max_counts_multiwig Max counts of CAGE reads FANTOM5: Max counts of CAGE reads Regulation Description The FANTOM5 track shows mapped transcription start sites (TSS) and their usage in primary cells, cell lines, and tissues to produce a comprehensive overview of gene expression across the human body by using single molecule sequencing. Display Conventions and Configuration Items in this track are colored according to their strand orientation. Blue indicates alignment to the negative strand, and red indicates alignment to the positive strand. Methods Protocol Individual biological states are profiled by HeliScopeCAGE, which is a variation of the CAGE (Cap Analysis Gene Expression) protocol based on a single molecule sequencer. The standard protocol requiring 5 µg of total RNA as a starting material is referred to as hCAGE, and an optimized version for a lower quantity (~ 100 ng) is referred to as LQhCAGE (Kanamori-Katayama et al. 2011). 
hCAGE LQhCAGE Samples Transcription start sites (TSSs) were mapped, and their usage in human and mouse primary cells, cell lines, and tissues was profiled to produce a comprehensive overview of mammalian gene expression across the human body. The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. Individual samples shown in "TSS activity" tracks are grouped as below. Primary cell, Tissue, Cell Line, Time course, Fractionation. TSS peaks TSS (CAGE) peaks across the panel of the biological states (samples) are identified by DPI (decomposition based peak identification, Forrest et al. 2014), where each of the peaks consists of neighboring and related TSSs. The peaks are used as anchors to define promoters and units of promoter-level expression analysis. Two subsets of the peaks are defined based on evidence of read counts, depending on the scope of subsequent analyses, and the first subset (referred to as the robust set of peaks, thresholded for expression analysis) is shown as TSS peaks. They are named "p#@GENE_SYMBOL" if associated with the 5'-end of known genes, or "p@CHROM:START..END,STRAND" otherwise. The summary tracks consist of the TSS (CAGE) peaks and summary profiles of TSS activities (total and maximum values). The summary track consists of the following tracks. TSS (CAGE) peaks: the robust peaks. TSS summary profiles: total counts and TPM (tags per million) in all the samples; maximum counts and TPM among the samples. TSS activity The 5′-ends of the mapped CAGE reads are counted at single base pair resolution (CTSS, CAGE tag starting sites) on the genomic coordinates, which represent TSS activities in the sample. The read counts tracks indicate raw counts of CAGE reads, and the TPM tracks indicate normalized counts as TPM (tags per million). 
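To make the relationship between raw counts and TPM concrete, the Python sketch below scales per-position read counts so that one sample sums to one million tags. This is plain library-size scaling, offered as an assumption: the text does not say whether the TPM tracks apply any further normalization, and the function name is hypothetical.

```python
from typing import List, Sequence


def counts_to_tpm(counts: Sequence[float]) -> List[float]:
    """Scale raw CAGE read counts to tags per million for one sample."""
    total = sum(counts)
    if total == 0:
        # A sample with no mapped tags has no signal to scale.
        return [0.0 for _ in counts]
    return [count * 1_000_000 / total for count in counts]


# Toy sample with four tags spread over two TSS positions.
print(counts_to_tpm([1, 3]))  # [250000.0, 750000.0]
```

Because each sample is divided by its own total, TPM values can be compared across samples sequenced to different depths, which raw counts cannot.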
Categories of individual samples - Cell Line hCAGE - Cell Line LQhCAGE - Fractionation hCAGE - Primary cell hCAGE - Primary cell LQhCAGE - Time course hCAGE - Tissue hCAGE Data Access FANTOM5 data can be explored interactively with the Table Browser and cross-referenced with the Data Integrator. For programmatic access, the track can be accessed using the Genome Browser's REST API. FANTOM5 annotations can be downloaded from the Genome Browser's download server as a bigBed file. This compressed binary format can be remotely queried through command line utilities. Please note that some of the download files can be quite large. The FANTOM5 reprocessed data can be found and downloaded on the FANTOM website. Credits Thanks to the FANTOM5 consortium, the Large Scale Data Managing Unit and Preventive Medicine and Applied Genomics Unit, the Center for Integrative Medical Sciences (IMS), and RIKEN for providing this data and its analysis. References FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M et al. A promoter-level mammalian expression atlas. Nature. 2014 Mar 27;507(7493):462-70. PMID: 24670764; PMC: PMC4529748 Kanamori-Katayama M, Itoh M, Kawaji H, Lassmann T, Katayama S, Kojima M, Bertin N, Kaiho A, Ninomiya N, Daub CO et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 2011 Jul;21(7):1150-9. PMID: 21596820; PMC: PMC3129257 Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015 Jan 5;16(1):22. 
PMID: 25723102; PMC: PMC4310165 MaxCounts_Rev Max counts of CAGE reads (rev) Max counts of CAGE reads reverse Regulation MaxCounts_Fwd Max counts of CAGE reads (fwd) Max counts of CAGE reads forward Regulation wgEncodeRegDnaseClustered DNase Clusters DNase I Hypersensitivity Peak Clusters from ENCODE (95 cell types) Regulation Description This track shows clusters of DNaseI hypersensitivity derived from assays in 95 cell types by the John Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011, as part of the first production phase of the ENCODE project. Regulatory regions in general, and promoters in particular, tend to be DNase-sensitive. Additional views of these data are available from the DNaseI HS track. The peaks in that track are the basis for the clusters shown here, which combine data from peaks from the different cell lines. Please note that track colors for the DNase tracks are based on similarity of cell types, while there is different coloring for cell types on the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track, which match the coloring used in their previous versions lifted from the hg19 assembly. Display Conventions and Configuration A gray box indicates the extent of the hypersensitive region. The darkness is proportional to the maximum signal strength observed in any cell line. The number to the left of the box shows how many cell lines are hypersensitive in the region. The track can be configured to restrict the display to elements above a specified score in the range 1-1000 (where score is based on signal strength). Methods Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline (July 2014 specification), diagrammed here: Credit: Qian Alvin Qin, X. Liu lab Briefly, sequence files were aligned to the hg38 (GRCh38) genome assembly augmented with 'sponge' sequence (ref). 
Multi-mapped reads were removed, as were reads that aligned to 'sponge' or mitochondrial sequence. Results from all replicates were pooled, and further processed by the Hotspot program to call peaks. Peaks of DNaseI hypersensitivity from the ENCODE DNase Analysis Pipeline at UCSC were assigned normalized scores (by UCSC regClusterMakeTableOfTables) in the range 0-1000 based on the narrowPeak signalValue and then clustered on score (by UCSC regCluster) to generate singly-linked clusters. Additional documentation on the methods used to identify hypersensitive sites is available from the DNaseI HS track. Credits This track is based on sequence data from the University of Washington ENCODE group, with subsequent processing by UCSC. For additional credits and references, see the DNaseI HS track. wgEncodeRegDnaseWig DNase Signal DNase I Hypersensitivity Signal Colored by Similarity from ENCODE Regulation Description This track provides an integrated display of DNase hypersensitivity in multiple cell types using overlapping colored graphs of signal density, with graph colors assigned to cell types based on similarity of signal. The track is based on results of experiments performed by the John Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011 as part of the ENCODE project first production phase. The signal graphs displayed here are also included in the comprehensive DNaseI HS track, which also provides peak and region calls and uses the same coloring based on similarity of cell types (please note that the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track use a different coloring, which matches their previous versions lifted from the hg19 assembly). Methods Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline described in the DNaseI HS track description. Signal graphs were normalized so the average value genome-wide is 1. 
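
The normalization step mentioned above (rescaling a signal graph so that its average value is 1) can be illustrated with a minimal sketch. This is not the UCSC pipeline code: the function names, the bedGraph-like tuple layout, and the choice to average only over covered bases are assumptions made for illustration.

```python
from typing import List, Tuple

# One bedGraph-like record: (chrom, start, end, value), half-open coordinates.
# This tuple layout is an assumption for the sketch, not a UCSC data structure.
Interval = Tuple[str, int, int, float]


def weighted_mean(intervals: List[Interval]) -> float:
    """Length-weighted mean signal over all bases covered by the intervals."""
    total_bases = sum(end - start for _, start, end, _ in intervals)
    if total_bases == 0:
        raise ValueError("no covered bases")
    total_signal = sum((end - start) * value for _, start, end, value in intervals)
    return total_signal / total_bases


def normalize_to_unit_mean(intervals: List[Interval]) -> List[Interval]:
    """Rescale values so the length-weighted mean over covered bases is 1.

    Assumption: only covered bases count toward the average. Averaging over
    the whole genome (with uncovered bases as zero) would use the genome size
    as the denominator instead.
    """
    mean = weighted_mean(intervals)
    if mean == 0:
        raise ValueError("signal is zero everywhere; cannot normalize")
    return [(chrom, start, end, value / mean)
            for chrom, start, end, value in intervals]


if __name__ == "__main__":
    raw = [("chr1", 0, 10, 2.0), ("chr1", 10, 30, 5.0)]
    for record in normalize_to_unit_mean(raw):
        print(record)
```

Dividing every value by a single genome-wide factor keeps the shape of each graph intact while putting different cell types on a comparable scale, which is what makes overlaying them in one display meaningful.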
Colors for the signal graphs were assigned by the UCSC BigWigCluster tool. The cell types were clustered into a binary tree, and a rainbow was cast across the leaf nodes, providing coloring based on similarity. (Figure credit: Chris Eisenhart, J. Kent lab.) Credits The processed data for this track were generated at UCSC. Credits for the primary data underlying this track are included in the DNaseI HS track description. References Miga KH, Eisenhart C, Kent WJ. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res. 2015 Nov 16;43(20):e133. PMID: 26163063 Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PMID: 22955617; PMC: PMC3721348 See also the references in the DNaseI HS track. wgEncodeRegDnaseUwBe2cWig BE2_C Sg BE2_C neuroblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWerirb1Wig WERI-Rb-1 Sg WERI-Rb-1 retinoblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiol100nm1hrWig MCF-7 estr 1h Sg MCF-7 mammary adenocarcinoma cell line (estradi 1h) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiolctrl0hrWig MCF-7 estr 0h Sg MCF-7 mammary adenocarcinoma cell line (estradi 0h) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Wig MCF-7 Sg MCF-7 mammary adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSknmcWig SK-N-MC Sg SK-N-MC neuroepithelioma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHelas3Wig HeLa-S3 Sg HeLa-S3 cervical epithelial adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyadWig HMVEC-dLy-Ad Sg HMVEC-dLy-Ad dermal MV endothelial cell, lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrpepicWig HRPEpiC Sg HRPEpiC retinal pigment epithelium DNaseI Signal 
from ENCODE Regulation wgEncodeRegDnaseUwRptecWig RPTEC Sg RPTEC renal proximal tubule epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota14dWig H7-ES diff 14d Sg H7-hESC embryonic stem cell (diff 14d) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota5dWig H7-ES diff 5d Sg H7-hESC embryonic stem cell (diff 5d) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwH7hescWig H7-ES Sg H7-hESC embryonic stem cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNb4Wig NB4 Sg NB4 acute promyelocytic leukemia (APL) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHl60Wig HL-60 Sg HL-60 acute promyelocytic leukemia (APL) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMonocytescd14ro01746Wig Monocyte-CD14+ Sg Monocytes-CD14+_RO01746 monocyte, CD14+ DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm12865Wig GM12865 Sg GM12865 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm12878Wig GM12878 Sg GM12878 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwJurkatWig Jurkat Sg Jurkat T-lymphocyte acute leukemia cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh1wb54553204Wig Th1_Wb54553204 Sg Th1_Wb54553204 T-lymphocyte, helper type 1 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh2Wig Th2 Sg Th2 T-lymphocyte, helper type 2 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh1Wig Th1 Sg Th1 T-lymphocyte, helper type 1 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwCd20ro01778Wig CD20+_RO01778 Sg CD20+_RO01778 B-lymphocyte, CD20+ DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSknshraWig SK-N-SH_RA Sg SK-N-SH_RA neuroblastoma cell line, RA treated DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwCaco2Wig Caco-2 Sg Caco-2 colon adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHepg2Wig HepG2 Sg HepG2 hepatocellular 
carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm06990Wig GM06990 Sg GM06990 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHeepicWig HEEpiC Sg HEEpiC esophageal epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwPrecWig PrEC Sg PrEC prostate epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSaecWig SAEC Sg SAEC small airway epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhekWig NHEK Sg NHEK epidermal keratinocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHreWig HRE Sg HRE renal epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrcepicWig HRCEpiC Sg HRCEpiC renal cortical epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdadWig HMVEC-dAd Sg HMVEC-dAd dermal microvascular endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdneoWig HMVEC-dNeo Sg HMVEC-dNeo dermal MV endothelial cell, neonate DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecllyWig HMVEC-LLy Sg HMVEC-LLy lung microvascular endothelial cell, lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrgecWig HRGEC Sg HRGEC renal glomerular endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdblneoWig HMVEC-dBl-Neo Sg HMVEC-dBl-Neo dermal MV endo cell, neonate blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyneoWig HMVEC-dLy-Neo Sg HMVEC-dLy-Neo dermal MV endo cell, neonate lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdbladWig HMVEC-dBl-Ad Sg HMVEC-dBl-Ad dermal MV endothelial cell, blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmveclblWig HMVEC-LBl Sg HMVEC-LBl lung microvascular epithelium. 
blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHuvecWig HUVEC Sg HUVEC umbilical vein endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHsmmtubeWig HSMMtube Sg HSMMtube skeletal muscle myotube DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Diff4dWig LHCN-M2 diff4d Sg LHCN-M2 skeletal myoblast (diff 4d) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Wig LHCN-M2 Sg LHCN-M2 skeletal myoblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHsmmWig HSMM Sg HSMM skeletal muscle myoblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhdfadWig NHDF-Ad Sg NHDF-Ad dermal fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWi384ohtam20nm72hrWig WI-38 40HTAM Sg WI-38 embryonic lung fibroblast cell line (40HTAM) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcfaaWig HCFaa Sg HCFaa cardiac fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHaspWig HA-sp Sg HA-sp spinal cord astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwRpmi7951Wig RPMI-7951 Sg RPMI-7951 melanoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwM059jWig M059J Sg M059J glioblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHahWig HA-h Sg HA-h hippocampal astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg04450Wig AG04450 Sg AG04450 fetal lung fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg04449Wig AG04449 Sg AG04449 fetal skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHbvsmcWig HBVSMC Sg HBVSMC brain vascular smooth muscle DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSkmcWig SKMC Sg SKMC skeletal muscle cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHaepicWig HAEpiC Sg HAEpiC amniotic epithelium (AEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhdfneoWig NHDF-neo Sg NHDF-neo dermal fibroblast, neonate DNaseI Signal from ENCODE Regulation 
wgEncodeRegDnaseUwHgfWig HGF Sg HGF gingival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmfWig HMF Sg HMF mammary fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg10803Wig AG10803 Sg AG10803 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwBonemarrowmscWig bonemarrow_MSC Sg bone_marrow_MSC bone marrow fibroblastoid DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHipepicWig HIPEpiC Sg HIPEpiC iris pigment epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHvmfWig HVMF Sg HVMF villous mesenchymal fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHacWig HAc Sg HAc cerebellar astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHconfWig HConF Sg HConF conjunctival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpfWig HPF Sg HPF pulmonary fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcpepicWig HCPEpiC Sg HCPEpiC choroid plexus epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAoafWig AoAF Sg AoAF aorta fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpafWig HPAF Sg HPAF pulmonary artery fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcmWig HCM Sg HCM cardiac myocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcfWig HCF Sg HCF cardiac fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpdlfWig HPdLF Sg HPdLF periodontal ligament fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg09319Wig AG09319 Sg AG09319 gingival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHbmecWig HBMEC Sg HBMEC brain microvascular endothelial cell (MEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhaWig NH-A Sg NH-A astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhlfWig NHLF Sg NHLF lung fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm04504Wig GM04504 Sg GM04504 skin fibroblast DNaseI 
Signal from ENCODE Regulation wgEncodeRegDnaseUwGm04503Wig GM04503 Sg GM04503 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWi38Wig WI-38 Sg WI-38 embryonic lung fibroblast cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHnpcepicWig HNPCEpiC Sg HNPCEpiC non-pigmented ciliary epithelium (NPCEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg09309Wig AG09309 Sg AG09309 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwBjWig BJ Sg BJ foreskin fibroblast cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNt2d1Wig NT2-D1 Sg NT2-D1 embryonal carcinoma (NTera2) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHffmycWig HFF-Myc Sg HFF-Myc foreskin fibroblast cell line, cMyc DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHffWig HFF Sg HFF foreskin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhberaWig NHBE_RA Sg NHBE_RA bronchial epithelium, RA treated DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHct116Wig HCT-116 Sg HCT-116 colorectal carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwPanc1Wig PANC-1 Sg PANC-1 pancreatic carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwT47dWig T-47D Sg T-47D mammary ductal carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmecWig HMEC Sg HMEC mammary epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLncapWig LNCaP Sg LNCaP prostate adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwA549Wig A549 Sg A549 lung adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwK562Wig K562 Sg K562 lymphoblast chronic myeloid leukemia cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnase DNase HS DNase I Hypersensitivity in 95 cell types from ENCODE Regulation Description These tracks contain the results of DNase I hypersensitivity experiments performed by the John 
Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011, as part of the ENCODE project first production phase. Colors were assigned to cell types based on similarity of signal. Other views of this data (along with additional documentation) are available from the hg19 ENCODE UW DNaseI HS track. Display Conventions and Configuration This track is a composite annotation track containing multiple subtracks, one for each cell type. The display mode and filtering of each subtrack can be individually controlled. For more information about track configuration, see Configuring Multi-View Tracks. Methods Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline (July 2014 specification; pipeline diagram credit: Qian Alvin Qin, X. Liu lab). Briefly, sequence files were aligned to the hg38 (GRCh38) genome assembly augmented with 'sponge' sequence (Miga et al., 2015). Multi-mapped reads were removed, as were reads that aligned to 'sponge' or mitochondrial sequence. Results from all replicates were pooled, and further processed by the Hotspot program to call peaks as well as broader regions of activity ('hotspots'), and to create signal density graphs. Signal graphs were normalized so the average value genome-wide is 1. The cell types were clustered into a binary tree, and a rainbow was cast across the leaf nodes, providing coloring based on similarity. (Figure credit: Chris Eisenhart, J. Kent lab.) Please note that the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track use a different coloring, which matches their previous versions lifted from the hg19 assembly. Credits The processed data for this track were produced by UCSC. Credits for the primary data underlying this track are included in the ENCODE UW DNaseI HS track description. References Miga KH, Eisenhart C, Kent WJ. 
Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res. 2015 Nov 16;43(20):e133. PMID: 26163063 Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PMID: 22955617; PMC: PMC3721348 See also the references in the ENCODE UW DNaseI HS track. wgEncodeRegDnaseSignal Signal HotSpot5 signal on BWA. Dupe, sponge and mitochondria filtered Regulation Description This track provides an integrated display of DNase hypersensitivity in multiple cell types using overlapping colored graphs of signal density, with graph colors assigned to cell types based on similarity of signal. The track is based on results of experiments performed by the John Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011 as part of the ENCODE project first production phase. The signal graphs displayed here are also included in the comprehensive DNaseI HS track, which also provides peak and region calls and uses the same coloring based on similarity of cell types (please note that the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track use a different coloring, which matches their previous versions lifted from the hg19 assembly). Methods Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline described in the DNaseI HS track description. Signal graphs were normalized so the average value genome-wide is 1. Colors for the signal graphs were assigned by the UCSC BigWigCluster tool. The cell types were clustered into a binary tree, and a rainbow was cast across the leaf nodes, providing coloring based on similarity. (Figure credit: Chris Eisenhart, J. Kent lab.) Credits The processed data for this track were generated at UCSC. 
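
The coloring idea described in the Methods above (cluster cell types into a binary tree, then spread a rainbow across the leaves so that neighbors in the tree get similar colors) can be sketched as follows. This is a hedged illustration, not the BigWigCluster implementation: the nested-tuple tree encoding, the function names, the even hue spacing, and the example grouping of cell types are all assumptions.

```python
import colorsys
from typing import Dict, List, Tuple, Union

# A binary tree is either a leaf name or a (left, right) pair of subtrees.
# This encoding is hypothetical and chosen only to keep the sketch short.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves_in_order(tree: Tree) -> List[str]:
    """Return leaf names left to right, so similar cell types end up adjacent."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves_in_order(left) + leaves_in_order(right)


def rainbow_colors(tree: Tree) -> Dict[str, Tuple[int, int, int]]:
    """Assign each leaf an RGB color by stepping the hue evenly along the leaves."""
    names = leaves_in_order(tree)
    colors = {}
    for index, name in enumerate(names):
        # Hue runs over [0, 1) so the last leaf does not wrap back to red.
        hue = index / len(names)
        red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[name] = (round(red * 255), round(green * 255), round(blue * 255))
    return colors


if __name__ == "__main__":
    # Example grouping is made up; a real tree would come from clustering signal.
    example = (("GM12878", "GM12865"), ("HepG2", ("K562", "HL-60")))
    for cell_type, rgb in rainbow_colors(example).items():
        print(cell_type, rgb)
```

Because hue is assigned by leaf position, two cell types that sit next to each other in the tree receive neighboring hues, which is the property the track relies on when it overlays many signal graphs.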
Credits for the primary data underlying this track are included in the DNaseI HS track description. References Miga KH, Eisenhart C, Kent WJ. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res. 2015 Nov 16;43(20):e133. PMID: 26163063 Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PMID: 22955617; PMC: PMC3721348 See also the references in the DNaseI HS track. wgEncodeRegDnaseUwBe2cSignal BE2_C Sg BE2_C neuroblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWerirb1Signal WERI-Rb-1 Sg WERI-Rb-1 retinoblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiol100nm1hrSignal MCF-7 estr 1h Sg MCF-7 mammary adenocarcinoma cell line (estradi 1h) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiolctrl0hrSignal MCF-7 estr 0h Sg MCF-7 mammary adenocarcinoma cell line (estradi 0h) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMcf7Signal MCF-7 Sg MCF-7 mammary adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSknmcSignal SK-N-MC Sg SK-N-MC neuroepithelioma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHelas3Signal HeLa-S3 Sg HeLa-S3 cervical epithelial adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyadSignal HMVEC-dLy-Ad Sg HMVEC-dLy-Ad dermal MV endothelial cell, lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrpepicSignal HRPEpiC Sg HRPEpiC retinal pigment epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwRptecSignal RPTEC Sg RPTEC renal proximal tubule epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota14dSignal H7-ES diff 14d Sg H7-hESC embryonic stem cell (diff 14d) DNaseI Signal from ENCODE Regulation 
wgEncodeRegDnaseUwH7hescDiffprota5dSignal H7-ES diff 5d Sg H7-hESC embryonic stem cell (diff 5d) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwH7hescSignal H7-ES Sg H7-hESC embryonic stem cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNb4Signal NB4 Sg NB4 acute promyelocytic leukemia (APL) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHl60Signal HL-60 Sg HL-60 acute promyelocytic leukemia (APL) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwMonocytescd14ro01746Signal Monocyte-CD14+ Sg Monocytes-CD14+_RO01746 monocyte, CD14+ DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm12865Signal GM12865 Sg GM12865 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm12878Signal GM12878 Sg GM12878 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwJurkatSignal Jurkat Sg Jurkat T-lymphocyte acute leukemia cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh1wb54553204Signal Th1_Wb54553204 Sg Th1_Wb54553204 T-lymphocyte, helper type 1 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh2Signal Th2 Sg Th2 T-lymphocyte, helper type 2 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwTh1Signal Th1 Sg Th1 T-lymphocyte, helper type 1 DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwCd20ro01778Signal CD20+_RO01778 Sg CD20+_RO01778 B-lymphocyte, CD20+ DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSknshraSignal SK-N-SH_RA Sg SK-N-SH_RA neuroblastoma cell line, RA treated DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwCaco2Signal Caco-2 Sg Caco-2 colon adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHepg2Signal HepG2 Sg HepG2 hepatocellular carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm06990Signal GM06990 Sg GM06990 B-lymphocyte, lymphoblastoid cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHeepicSignal HEEpiC 
Sg HEEpiC esophageal epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwPrecSignal PrEC Sg PrEC prostate epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSaecSignal SAEC Sg SAEC small airway epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhekSignal NHEK Sg NHEK epidermal keratinocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHreSignal HRE Sg HRE renal epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrcepicSignal HRCEpiC Sg HRCEpiC renal cortical epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdadSignal HMVEC-dAd Sg HMVEC-dAd dermal microvascular endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdneoSignal HMVEC-dNeo Sg HMVEC-dNeo dermal MV endothelial cell, neonate DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecllySignal HMVEC-LLy Sg HMVEC-LLy lung microvascular endothelial cell, lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHrgecSignal HRGEC Sg HRGEC renal glomerular endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdblneoSignal HMVEC-dBl-Neo Sg HMVEC-dBl-Neo dermal MV endo cell, neonate blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyneoSignal HMVEC-dLy-Neo Sg HMVEC-dLy-Neo dermal MV end cell, neonate lymph DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmvecdbladSignal HMVEC-dBl-Ad Sg HMVEC-dBl-Ad dermal MV endothelial cell, blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmveclblSignal HMVEC-LBl Sg HMVEC-LBl lung microvascular epithelium. 
blood DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHuvecSignal HUVEC Sg HUVEC umbilical vein endothelial cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHsmmtubeSignal HSMMtube Sg HSMMtube skeletal muscle myotube DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Diff4dSignal LHCN-M2 diff4d Sg LHCN-M2 skeletal myoblast (diff 4d) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Signal LHCN-M2 Sg LHCN-M2 skeletal myoblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHsmmSignal HSMM Sg HSMM skeletal muscle myoblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhdfadSignal NHDF-Ad Sg NHDF-Ad dermal fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWi384ohtam20nm72hrSignal WI-38 40HTAM Sg WI-38 embryonic lung fibroblast cell line (40HTAM) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcfaaSignal HCFaa Sg HCFaa cardiac fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHaspSignal HA-sp Sg HA-sp spinal cord astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwRpmi7951Signal RPMI-7951 Sg RPMI-7951 melanoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwM059jSignal M059J Sg M059J glioblastoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHahSignal HA-h Sg HA-h hippocampal astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg04450Signal AG04450 Sg AG04450 fetal lung fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg04449Signal AG04449 Sg AG04449 fetal skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHbvsmcSignal HBVSMC Sg HBVSMC brain vascular smooth muscle DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwSkmcSignal SKMC Sg SKMC skeletal muscle cell DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHaepicSignal HAEpiC Sg HAEpiC amniotic epithelium (AEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhdfneoSignal NHDF-neo Sg NHDF-neo dermal fibroblast, 
neonate DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHgfSignal HGF Sg HGF gingival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmfSignal HMF Sg HMF mammary fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg10803Signal AG10803 Sg AG10803 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwBonemarrowmscSignal bonemarrow_MSC Sg bone_marrow_MSC bone marrow fibroblastoid DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHipepicSignal HIPEpiC Sg HIPEpiC iris pigment epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHvmfSignal HVMF Sg HVMF villous mesenchymal fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHacSignal HAc Sg HAc cerebellar astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHconfSignal HConF Sg HConF conjunctival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpfSignal HPF Sg HPF pulmonary fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcpepicSignal HCPEpiC Sg HCPEpiC choroid plexus epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAoafSignal AoAF Sg AoAF aorta fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpafSignal HPAF Sg HPAF pulmonary artery fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcmSignal HCM Sg HCM cardiac myocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHcfSignal HCF Sg HCF cardiac fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHpdlfSignal HPdLF Sg HPdLF periodontal ligament fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg09319Signal AG09319 Sg AG09319 gingival fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHbmecSignal HBMEC Sg HBMEC brain microvascular endothelial cell (MEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhaSignal NH-A Sg NH-A astrocyte DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhlfSignal NHLF Sg NHLF lung fibroblast 
DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm04504Signal GM04504 Sg GM04504 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwGm04503Signal GM04503 Sg GM04503 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwWi38Signal WI-38 Sg WI-38 embryonic lung fibroblast cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHnpcepicSignal HNPCEpiC Sg HNPCEpiC non-pigmented ciliary epithelium (NPCEC) DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwAg09309Signal AG09309 Sg AG09309 skin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwBjSignal BJ Sg BJ foreskin fibroblast cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNt2d1Signal NT2-D1 Sg NT2-D1 embryonal carcinoma (NTera2) cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHffmycSignal HFF-Myc Sg HFF-Myc foreskin fibroblast cell line, cMyc DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHffSignal HFF Sg HFF foreskin fibroblast DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwNhberaSignal NHBE_RA Sg NHBE_RA bronchial epithelium, RA treated DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHct116Signal HCT-116 Sg HCT-116 colorectal carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwPanc1Signal PANC-1 Sg PANC-1 pancreatic carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwT47dSignal T-47D Sg T-47D mammary ductal carcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwHmecSignal HMEC Sg HMEC mammary epithelium DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwLncapSignal LNCaP Sg LNCaP prostate adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwA549Signal A549 Sg A549 lung adenocarcinoma cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnaseUwK562Signal K562 Sg K562 lymphoblast chronic myeloid leukemia cell line DNaseI Signal from ENCODE Regulation wgEncodeRegDnasePeak Peaks HotSpot5 
peak calls on BWA. Dupe, sponge and mitochondria filtered Regulation wgEncodeRegDnaseUwBe2cPeak BE2_C Pk BE2_C neuroblastoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwWerirb1Peak WERI-Rb-1 Pk WERI-Rb-1 retinoblastoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiol100nm1hrPeak MCF-7 estr 1h Pk MCF-7 mammary adenocarcinoma cell line (estradi 1h) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiolctrl0hrPeak MCF-7 estr 0h Pk MCF-7 mammary adenocarcinoma cell line (estradi 0h) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwMcf7Peak MCF-7 Pk MCF-7 mammary adenocarcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwSknmcPeak SK-N-MC Pk SK-N-MC neuroepithelioma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHelas3Peak HeLa-S3 Pk HeLa-S3 cervical epithelial adenocarcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyadPeak HMVEC-dLy-Ad Pk HMVEC-dLy-Ad dermal MV endothelial cell, lymph DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHrpepicPeak HRPEpiC Pk HRPEpiC retinal pigment epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwRptecPeak RPTEC Pk RPTEC renal proximal tubule epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota14dPeak H7-ES diff 14d Pk H7-hESC embryonic stem cell (diff 14d) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota5dPeak H7-ES diff 5d Pk H7-hESC embryonic stem cell (diff 5d) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwH7hescPeak H7-ES Pk H7-hESC embryonic stem cell DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNb4Peak NB4 Pk NB4 acute promyelocytic leukemia (APL) cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHl60Peak HL-60 Pk HL-60 acute promyelocytic leukemia (APL) cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwMonocytescd14ro01746Peak Monocyte-CD14+ Pk Monocytes-CD14+_RO01746 monocyte, CD14+ 
DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwGm12865Peak GM12865 Pk GM12865 B-lymphocyte, lymphoblastoid cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwGm12878Peak GM12878 Pk GM12878 B-lymphocyte, lymphoblastoid cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwJurkatPeak Jurkat Pk Jurkat T-lymphocyte acute leukemia cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwTh1wb54553204Peak Th1_Wb54553204 Pk Th1_Wb54553204 T-lymphocyte, helper type 1 DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwTh2Peak Th2 Pk Th2 T-lymphocyte, helper type 2 DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwTh1Peak Th1 Pk Th1 T-lymphocyte, helper type 1 DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwCd20ro01778Peak CD20+_RO01778 Pk CD20+_RO01778 B-lymphocyte, CD20+ DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwSknshraPeak SK-N-SH_RA Pk SK-N-SH_RA neuroblastoma cell line, RA treated DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwCaco2Peak Caco-2 Pk Caco-2 colon adenocarcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHepg2Peak HepG2 Pk HepG2 hepatocellular carcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwGm06990Peak GM06990 Pk GM06990 B-lymphocyte, lymphoblastoid cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHeepicPeak HEEpiC Pk HEEpiC esophageal epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwPrecPeak PrEC Pk PrEC prostate epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwSaecPeak SAEC Pk SAEC small airway epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhekPeak NHEK Pk NHEK epidermal keratinocyte DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHrePeak HRE Pk HRE renal epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHrcepicPeak HRCEpiC Pk HRCEpiC renal cortical epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdadPeak HMVEC-dAd Pk HMVEC-dAd dermal 
microvascular endothelial cell DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdneoPeak HMVEC-dNeo Pk HMVEC-dNeo dermal MV endothelial cell, neonate DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecllyPeak HMVEC-LLy Pk HMVEC-LLy lung microvascular endothelial cell, lymph DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHrgecPeak HRGEC Pk HRGEC renal glomerular endothelial cell DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdblneoPeak HMVEC-dBl-Neo Pk HMVEC-dBl-Neo dermal MV endothelial cell, neonate blood DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyneoPeak HMVEC-dLy-Neo Pk HMVEC-dLy-Neo dermal MV endothelial cell, neonate lymph DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmvecdbladPeak HMVEC-dBl-Ad Pk HMVEC-dBl-Ad dermal MV endothelial cell, blood DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmveclblPeak HMVEC-LBl Pk HMVEC-LBl lung microvascular epithelium. blood DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHuvecPeak HUVEC Pk HUVEC umbilical vein endothelial cell DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHsmmtubePeak HSMMtube Pk HSMMtube skeletal muscle myotube DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Diff4dPeak LHCN-M2 diff4d Pk LHCN-M2 skeletal myoblast (diff 4d) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Peak LHCN-M2 Pk LHCN-M2 skeletal myoblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHsmmPeak HSMM Pk HSMM skeletal muscle myoblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhdfadPeak NHDF-Ad Pk NHDF-Ad dermal fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwWi384ohtam20nm72hrPeak WI-38 40HTAM Pk WI-38 embryonic lung fibroblast cell line (40HTAM) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHcfaaPeak HCFaa Pk HCFaa cardiac fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHaspPeak HA-sp Pk HA-sp spinal cord astrocyte DNaseI Peaks from ENCODE Regulation 
wgEncodeRegDnaseUwRpmi7951Peak RPMI-7951 Pk RPMI-7951 melanoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwM059jPeak M059J Pk M059J glioblastoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHahPeak HA-h Pk HA-h hippocampal astrocyte DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwAg04450Peak AG04450 Pk AG04450 fetal lung fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwAg04449Peak AG04449 Pk AG04449 fetal skin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHbvsmcPeak HBVSMC Pk HBVSMC brain vascular smooth muscle DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwSkmcPeak SKMC Pk SKMC skeletal muscle cell DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHaepicPeak HAEpiC Pk HAEpiC amniotic epithelium (AEC) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhdfneoPeak NHDF-neo Pk NHDF-neo dermal fibroblast, neonate DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHgfPeak HGF Pk HGF gingival fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmfPeak HMF Pk HMF mammary fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwAg10803Peak AG10803 Pk AG10803 skin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwBonemarrowmscPeak bonemarrow_MSC Pk bone_marrow_MSC bone marrow fibroblastoid DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHipepicPeak HIPEpiC Pk HIPEpiC iris pigment epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHvmfPeak HVMF Pk HVMF villous mesenchymal fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHacPeak HAc Pk HAc cerebellar astrocyte DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHconfPeak HConF Pk HConF conjunctival fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHpfPeak HPF Pk HPF pulmonary fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHcpepicPeak HCPEpiC Pk HCPEpiC choroid plexus epithelium DNaseI Peaks from ENCODE Regulation 
wgEncodeRegDnaseUwAoafPeak AoAF Pk AoAF aorta fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHpafPeak HPAF Pk HPAF pulmonary artery fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHcmPeak HCM Pk HCM cardiac myocyte DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHcfPeak HCF Pk HCF cardiac fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHpdlfPeak HPdLF Pk HPdLF periodontal ligament fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwAg09319Peak AG09319 Pk AG09319 gingival fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHbmecPeak HBMEC Pk HBMEC brain microvascular endothelial cell (MEC) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhaPeak NH-A Pk NH-A astrocyte DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhlfPeak NHLF Pk NHLF lung fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwGm04504Peak GM04504 Pk GM04504 skin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwGm04503Peak GM04503 Pk GM04503 skin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwWi38Peak WI-38 Pk WI-38 embryonic lung fibroblast cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHnpcepicPeak HNPCEpiC Pk HNPCEpiC non-pigmented ciliary epithelium (NPCEC) DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwAg09309Peak AG09309 Pk AG09309 skin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwBjPeak BJ Pk BJ foreskin fibroblast cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNt2d1Peak NT2-D1 Pk NT2-D1 embryonal carcinoma (NTera2) cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHffmycPeak HFF-Myc Pk HFF-Myc foreskin fibroblast cell line, cMyc DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHffPeak HFF Pk HFF foreskin fibroblast DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwNhberaPeak NHBE_RA Pk NHBE_RA bronchial epithelium, RA treated DNaseI Peaks from ENCODE Regulation 
wgEncodeRegDnaseUwHct116Peak HCT-116 Pk HCT-116 colorectal carcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwPanc1Peak PANC-1 Pk PANC-1 pancreatic carcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwT47dPeak T-47D Pk T-47D mammary ductal carcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwHmecPeak HMEC Pk HMEC mammary epithelium DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwLncapPeak LNCaP Pk LNCaP prostate adenocarcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwA549Peak A549 Pk A549 lung adenocarcinoma cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseUwK562Peak K562 Pk K562 lymphoblast chronic myeloid leukemia cell line DNaseI Peaks from ENCODE Regulation wgEncodeRegDnaseHotspot Hotspots Hotspot5 hotspot calls on BWA. Dupe, sponge and mitochondria filtered Regulation wgEncodeRegDnaseUwBe2cHotspot BE2_C Ht BE2_C neuroblastoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwWerirb1Hotspot WERI-Rb-1 Ht WERI-Rb-1 retinoblastoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiol100nm1hrHotspot MCF-7 estr 1h Ht MCF-7 mammary adenocarcinoma cell line (estradi 1h) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwMcf7Estradiolctrl0hrHotspot MCF-7 estr 0h Ht MCF-7 mammary adenocarcinoma cell line (estradi 0h) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwMcf7Hotspot MCF-7 Ht MCF-7 mammary adenocarcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwSknmcHotspot SK-N-MC Ht SK-N-MC neuroepithelioma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHelas3Hotspot HeLa-S3 Ht HeLa-S3 cervical epithelial adenocarcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyadHotspot HMVEC-dLy-Ad Ht HMVEC-dLy-Ad dermal MV endothelial cell, lymph DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHrpepicHotspot HRPEpiC Ht HRPEpiC retinal 
pigment epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwRptecHotspot RPTEC Ht RPTEC renal proximal tubule epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota14dHotspot H7-ES diff 14d Ht H7-hESC embryonic stem cell (diff 14d) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwH7hescDiffprota5dHotspot H7-ES diff 5d Ht H7-hESC embryonic stem cell (diff 5d) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwH7hescHotspot H7-ES Ht H7-hESC embryonic stem cell DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNb4Hotspot NB4 Ht NB4 acute promyelocytic leukemia (APL) cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHl60Hotspot HL-60 Ht HL-60 acute promyelocytic leukemia (APL) cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwMonocytescd14ro01746Hotspot Monocyte-CD14+ Ht Monocytes-CD14+_RO01746 monocyte, CD14+ DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwGm12865Hotspot GM12865 Ht GM12865 B-lymphocyte, lymphoblastoid cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwGm12878Hotspot GM12878 Ht GM12878 B-lymphocyte, lymphoblastoid cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwJurkatHotspot Jurkat Ht Jurkat T-lymphocyte acute leukemia cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwTh1wb54553204Hotspot Th1_Wb54553204 Ht Th1_Wb54553204 T-lymphocyte, helper type 1 DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwTh2Hotspot Th2 Ht Th2 T-lymphocyte, helper type 2 DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwTh1Hotspot Th1 Ht Th1 T-lymphocyte, helper type 1 DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwCd20ro01778Hotspot CD20+_RO01778 Ht CD20+_RO01778 B-lymphocyte, CD20+ DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwSknshraHotspot SK-N-SH_RA Ht SK-N-SH_RA neuroblastoma cell line, RA treated DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwCaco2Hotspot Caco-2 Ht 
Caco-2 colon adenocarcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHepg2Hotspot HepG2 Ht HepG2 hepatocellular carcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwGm06990Hotspot GM06990 Ht GM06990 B-lymphocyte, lymphoblastoid cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHeepicHotspot HEEpiC Ht HEEpiC esophageal epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwPrecHotspot PrEC Ht PrEC prostate epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwSaecHotspot SAEC Ht SAEC small airway epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNhekHotspot NHEK Ht NHEK epidermal keratinocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHreHotspot HRE Ht HRE renal epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHrcepicHotspot HRCEpiC Ht HRCEpiC renal cortical epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdadHotspot HMVEC-dAd Ht HMVEC-dAd dermal microvascular endothelial cell DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdneoHotspot HMVEC-dNeo Ht HMVEC-dNeo dermal microvascular endo cell, neonate DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecllyHotspot HMVEC-LLy Ht HMVEC-LLy lung microvascular endothelial cell, lymph DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHrgecHotspot HRGEC Ht HRGEC renal glomerular endothelial cell DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdblneoHotspot HMVEC-dBl-Neo Ht HMVEC-dBl-Neo dermal MV endo cell, neonate blood DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdlyneoHotspot HMVEC-dLy-Neo Ht HMVEC-dLy-Neo dermal MV endo cell, neonate lymph DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmvecdbladHotspot HMVEC-dBl-Ad Ht HMVEC-dBl-Ad dermal MV endothelial cell, blood DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmveclblHotspot HMVEC-LBl Ht HMVEC-LBl lung microvascular 
epithelium. blood DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHuvecHotspot HUVEC Ht HUVEC umbilical vein endothelial cell DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHsmmtubeHotspot HSMMtube Ht HSMMtube skeletal muscle myotube DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Diff4dHotspot LHCN-M2 diff4d Ht LHCN-M2 skeletal myoblast (diff 4d) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwLhcnm2Hotspot LHCN-M2 Ht LHCN-M2 skeletal myoblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHsmmHotspot HSMM Ht HSMM skeletal muscle myoblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNhdfadHotspot NHDF-Ad Ht NHDF-Ad dermal fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwWi384ohtam20nm72hrHotspot WI-38 40HTAM Ht WI-38 embryonic lung fibroblast cell line (40HTAM) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHcfaaHotspot HCFaa Ht HCFaa cardiac fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHaspHotspot HA-sp Ht HA-sp spinal cord astrocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwRpmi7951Hotspot RPMI-7951 Ht RPMI-7951 melanoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwM059jHotspot M059J Ht M059J glioblastoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHahHotspot HA-h Ht HA-h hippocampal astrocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAg04450Hotspot AG04450 Ht AG04450 fetal lung fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAg04449Hotspot AG04449 Ht AG04449 fetal skin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHbvsmcHotspot HBVSMC Ht HBVSMC brain vascular smooth muscle DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwSkmcHotspot SKMC Ht SKMC skeletal muscle cell DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHaepicHotspot HAEpiC Ht HAEpiC amniotic epithelium (AEC) DNaseI Hotspots from ENCODE Regulation 
wgEncodeRegDnaseUwNhdfneoHotspot NHDF-neo Ht NHDF-neo dermal fibroblast, neonate DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHgfHotspot HGF Ht HGF gingival fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmfHotspot HMF Ht HMF mammary fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAg10803Hotspot AG10803 Ht AG10803 skin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwBonemarrowmscHotspot bonemarrow_MSC Ht bone_marrow_MSC bone marrow fibroblastoid DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHipepicHotspot HIPEpiC Ht HIPEpiC iris pigment epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHvmfHotspot HVMF Ht HVMF villous mesenchymal fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHacHotspot HAc Ht HAc cerebellar astrocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHconfHotspot HConF Ht HConF conjunctival fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHpfHotspot HPF Ht HPF pulmonary fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHcpepicHotspot HCPEpiC Ht HCPEpiC choroid plexus epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAoafHotspot AoAF Ht AoAF aorta fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHpafHotspot HPAF Ht HPAF pulmonary artery fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHcmHotspot HCM Ht HCM cardiac myocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHcfHotspot HCF Ht HCF cardiac fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHpdlfHotspot HPdLF Ht HPdLF periodontal ligament fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAg09319Hotspot AG09319 Ht AG09319 gingival fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHbmecHotspot HBMEC Ht HBMEC brain microvascular endothelial cell (MEC) DNaseI Hotspots from ENCODE Regulation 
wgEncodeRegDnaseUwNhaHotspot NH-A Ht NH-A astrocyte DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNhlfHotspot NHLF Ht NHLF lung fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwGm04504Hotspot GM04504 Ht GM04504 skin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwGm04503Hotspot GM04503 Ht GM04503 skin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwWi38Hotspot WI-38 Ht WI-38 embryonic lung fibroblast cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHnpcepicHotspot HNPCEpiC Ht HNPCEpiC non-pigmented ciliary epithelium (NPCEC) DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwAg09309Hotspot AG09309 Ht AG09309 skin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwBjHotspot BJ Ht BJ foreskin fibroblast cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNt2d1Hotspot NT2-D1 Ht NT2-D1 embryonal carcinoma (NTera2) cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHffmycHotspot HFF-Myc Ht HFF-Myc foreskin fibroblast cell line, cMyc DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHffHotspot HFF Ht HFF foreskin fibroblast DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwNhberaHotspot NHBE_RA Ht NHBE_RA bronchial epithelium, RA treated DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHct116Hotspot HCT-116 Ht HCT-116 colorectal carcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwPanc1Hotspot PANC-1 Ht PANC-1 pancreatic carcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwT47dHotspot T-47D Ht T-47D mammary ductal carcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwHmecHotspot HMEC Ht HMEC mammary epithelium DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwLncapHotspot LNCaP Ht LNCaP prostate adenocarcinoma cell line DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwA549Hotspot A549 Ht A549 lung adenocarcinoma cell line 
DNaseI Hotspots from ENCODE Regulation wgEncodeRegDnaseUwK562Hotspot K562 Ht K562 lymphoblast chronic myeloid leukemia cell line DNaseI Hotspots from ENCODE Regulation encRegTfbsClustered TF Clusters Transcription Factor ChIP-seq Clusters (340 factors, 129 cell types) from ENCODE 3 Regulation Description This track shows regions of transcription factor binding derived from a large collection of ChIP-seq experiments performed by the ENCODE project between February 2011 and November 2018, spanning the first production phase of ENCODE ("ENCODE 2") through the second full production phase ("ENCODE 3"). Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to regulate gene expression. Some TFs contain a DNA binding domain and can bind directly to specific short DNA sequences ('motifs'); others bind to DNA indirectly through interactions with TFs containing a DNA binding domain. High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation followed by sequencing, or 'ChIP-seq') can be used to identify regions of TF binding genome-wide. These regions are commonly called ChIP-seq peaks. ENCODE TF ChIP-seq data were processed using the ENCODE Transcription Factor ChIP-seq Processing Pipeline to generate peaks of TF binding. Peaks from 1264 experiments (1256 in hg38) representing 338 transcription factors (340 in hg38) in 130 cell types (129 in hg38) are combined here into clusters to produce a summary display showing occupancy regions for each factor. The underlying ChIP-seq peak data are available from the ENCODE 3 TF ChIP Peaks tracks (hg19, hg38). Display Conventions A gray box encloses each peak cluster of transcription factor occupancy, with the darkness of the box being proportional to the maximum signal strength observed in any cell type contributing to the cluster. The HGNC gene name for the transcription factor is shown to the left of each cluster.
To the right of the cluster, a configurable label can optionally display information about the cell types contributing to the cluster and how many cell types were assayed for the factor (count where detected / count where assayed). For brevity in the display, each cell type is abbreviated to a single letter. The darkness of the letter is proportional to the signal strength observed in the cell line. Abbreviations starting with capital letters designate ENCODE cell types initially identified for intensive study, while those starting with lowercase letters designate cell lines added later in the project. Click on a peak cluster to see more information about the TF/cell assays contributing to the cluster and the cell line abbreviation table. Methods Peaks of transcription factor occupancy ("optimal peak set") from ENCODE ChIP-seq datasets were clustered using the UCSC hgBedsToBedExps tool. Scores were assigned to peaks by multiplying the input signal values by a normalization factor calculated as the ratio of the maximum score value (1000) to the signal value at one standard deviation above the mean, with values exceeding 1000 capped at 1000. This has the effect of distributing scores up to the mean plus one standard deviation across the score range, but assigning all values above that to the maximum score. The cluster score is the highest score for any peak contributing to the cluster. Data Access The raw data for the ENCODE3 TF Clusters track can be accessed from the Table Browser or combined with other datasets through the Data Integrator. These data are stored internally as a BED5+3 MySQL table with additional metadata tables. For automated analysis and download, the encRegTfbsClusteredWithCells.hg38.bed.gz track data file, which has 5 fields of BED data followed by a comma-separated list of cell types, can be downloaded from our downloads server. The data can also be queried using the JSON API or the Public SQL server.
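As a rough illustration of the file layout described under Data Access (5 BED fields followed by a comma-separated list of cell types), the following Python sketch reads such a file. It assumes the file is tab-delimited, that the five BED fields are the standard chrom, chromStart, chromEnd, name and score, and that the name field holds the transcription factor name; none of these details are confirmed here, and the names TfCluster, parse_cluster_line and read_clusters are hypothetical helpers, not part of any UCSC tool.

```python
import gzip
from typing import Iterator, List, NamedTuple


class TfCluster(NamedTuple):
    """One row of the clusters file (field meanings are assumptions)."""
    chrom: str
    start: int             # BED chromStart (0-based)
    end: int               # BED chromEnd (exclusive)
    factor: str            # BED name field, assumed to be the TF name
    score: int             # BED score, expected in the range 0-1000
    cell_types: List[str]  # sixth column, split on commas


def parse_cluster_line(line: str) -> TfCluster:
    """Parse one tab-delimited row: 5 BED fields plus a cell-type list."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name, score = fields[:5]
    # Drop empty strings so a trailing comma does not yield a blank entry.
    cells = [c for c in fields[5].split(",") if c] if len(fields) > 5 else []
    return TfCluster(chrom, int(start), int(end), name, int(score), cells)


def read_clusters(path: str) -> Iterator[TfCluster]:
    """Yield clusters from a gzip-compressed file, skipping blank lines."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_cluster_line(line)
```

For example, calling parse_cluster_line on a made-up row such as "chr1\t100\t250\tCTCF\t1000\tHepG2,K562" (illustrative values, not real track data) gives a record whose cell_types field is the list of the two cell type names. Returning a generator from read_clusters keeps memory use low, which may matter for a genome-wide file.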
Credits Thanks to the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center for generating and processing the TF ChIP-seq datasets used here. The ENCODE accession numbers of the constituent datasets are available from the peak details page. Special thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI, at the ENCODE Data Analysis Center (ZLab at UMass Medical Center) for providing the peak datasets, metadata, and guidance in developing this track. Please check the ZLab ENCODE Public Hubs for the most up-to-date data. The integrative view presented here was developed by Jim Kent at UCSC. References ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585 ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153 Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016 Jan 4;44(D1):D726-32. PMID: 26527727; PMCID: PMC4702836 Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100. PMID: 22955619 Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012 Sep;22(9):1798-812. PMID: 22955990; PMCID: PMC3431495 Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6.
PMID: 23203885; PMCID: PMC3531197 Data Use Policy Users may freely download, analyze and publish results based on any ENCODE data without restrictions. Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional. Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE production laboratory(s) that generated the datasets used, as described in Citing ENCODE. encTfChipPk TF ChIP Transcription Factor ChIP-seq Peaks (340 factors in 129 cell types) from ENCODE 3 Regulation Description This track represents a comprehensive set of human transcription factor binding sites based on ChIP-seq experiments generated by production groups in the ENCODE Consortium between February 2011 and November 2018. Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to regulate gene expression. Some TFs contain a DNA binding domain and can bind directly to specific short DNA sequences ('motifs'); others bind to DNA indirectly through interactions with TFs containing a DNA binding domain. High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation followed by sequencing, or 'ChIP-seq') can be used to identify regions of TF binding genome-wide. These regions are commonly called ChIP-seq peaks. The related Transcription Factor ChIP-seq Clusters tracks (hg19, hg38) provide summary views of these data. Display and File Conventions and Configuration The display for this track shows site location with the point-source of the peak marked with a colored vertical bar and the level of enrichment at the site indicated by the darkness of the item. The subtracks are colored by UCSC ENCODE 2 cell type color conventions on the hg19 assembly, and by similarity of cell types in DNaseI hypersensitivity assays (as in the DNase Signal track) in the hg38 assembly.
The display can be filtered to higher valued items, using the Score range: configuration item. The score values were computed at UCSC based on signal values assigned by the ENCODE pipeline. The input signal values were multiplied by a normalization factor calculated as the ratio of the maximum score value (1000) to the signal value at one standard deviation above the mean, with values exceeding 1000 capped at 1000. This has the effect of distributing scores up to the mean plus one standard deviation across the score range, but assigning all values above that to the maximum score. Methods The ChIP-seq peaks in this track were generated by the ENCODE Transcription Factor ChIP-seq Processing Pipeline. Methods documentation and full metadata for each track can be found at the ENCODE project portal, using the ENCODE file accession (ENCFF*) listed in the track label. Credits Thanks to the ENCODE Consortium, the ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center for generating and processing the datasets used here. Special thanks to Henry Pratt, Jill Moore, Michael Purcaro, and Zhiping Weng, PI, at the ENCODE Data Analysis Center (ZLab at UMass Medical Center) for providing the peak datasets, metadata, and guidance in developing this track. Please check the ZLab ENCODE Public Hubs for the most up-to-date data. References ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. PMID: 21526222; PMCID: PMC3079585 ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMCID: PMC3439153 Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I, Narayanan AK, Ho M, Lee BT et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016 Jan 4;44(D1):D726-32. PMID: 26527727; PMCID: PMC4702836 Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R et al.
Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100. PMID: 22955619 Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012 Sep;22(9):1798-812. PMID: 22955990; PMCID: PMC3431495 Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6. PMID: 23203885; PMCID: PMC3531197 Data Use Policy Users may freely download, analyze and publish results based on any ENCODE data without restrictions. Researchers using unpublished ENCODE data are encouraged to contact the data producers to discuss possible coordinated publications; however, this is optional. Users of ENCODE datasets are requested to cite the ENCODE Consortium and ENCODE production laboratory(s) that generated the datasets used, as described in Citing ENCODE.
encTfChipPkENCFF635MUK vagina POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in vagina from ENCODE 3 (ENCFF635MUK) Regulation encTfChipPkENCFF865QLX vagina POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in vagina from ENCODE 3 (ENCFF865QLX) Regulation encTfChipPkENCFF242HMY vagina EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in vagina from ENCODE 3 (ENCFF242HMY) Regulation encTfChipPkENCFF116VEG vagina EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in vagina from ENCODE 3 (ENCFF116VEG) Regulation encTfChipPkENCFF508LRF vagina CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in vagina from ENCODE 3 (ENCFF508LRF) Regulation encTfChipPkENCFF579GUD vagina CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in vagina from ENCODE 3 (ENCFF579GUD) Regulation encTfChipPkENCFF198EUQ uterus POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in uterus from ENCODE 3 (ENCFF198EUQ) Regulation encTfChipPkENCFF236XBY uterus POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in uterus from ENCODE 3 (ENCFF236XBY) Regulation encTfChipPkENCFF179YWB uterus CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in uterus from ENCODE 3 (ENCFF179YWB) Regulation encTfChipPkENCFF866EIC uterus CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in uterus from ENCODE 3 (ENCFF866EIC) Regulation encTfChipPkENCFF834DID lungUpLb POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in upper_lobe_of_left_lung from ENCODE 3 (ENCFF834DID) Regulation encTfChipPkENCFF626AFW lungUpLb POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in upper_lobe_of_left_lung from ENCODE 3 (ENCFF626AFW) Regulation encTfChipPkENCFF468AEV lungUpLb POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in upper_lobe_of_left_lung from ENCODE 3 (ENCFF468AEV) Regulation encTfChipPkENCFF665TLS lungUpLb POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in upper_lobe_of_left_lung from ENCODE 3 (ENCFF665TLS) Regulation encTfChipPkENCFF567XKZ lungUpLbe EP300 4 Transcription Factor 
ChIP-seq Peaks of EP300 in upper_lobe_of_left_lung from ENCODE 3 (ENCFF567XKZ) Regulation encTfChipPkENCFF833NHM lungUpLbe EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in upper_lobe_of_left_lung from ENCODE 3 (ENCFF833NHM) Regulation encTfChipPkENCFF676WYA lungUpLbe EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in upper_lobe_of_left_lung from ENCODE 3 (ENCFF676WYA) Regulation encTfChipPkENCFF348MWL lungUpLbe EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in upper_lobe_of_left_lung from ENCODE 3 (ENCFF348MWL) Regulation encTfChipPkENCFF716XFO lungUpLobe CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in upper_lobe_of_left_lung from ENCODE 3 (ENCFF716XFO) Regulation encTfChipPkENCFF254NYT lungUpLobe CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in upper_lobe_of_left_lung from ENCODE 3 (ENCFF254NYT) Regulation encTfChipPkENCFF749CMN lungUpLobe CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in upper_lobe_of_left_lung from ENCODE 3 (ENCFF749CMN) Regulation encTfChipPkENCFF869YGK lungUpLobe CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in upper_lobe_of_left_lung from ENCODE 3 (ENCFF869YGK) Regulation encTfChipPkENCFF028RZP trnsvCln POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in transverse_colon from ENCODE 3 (ENCFF028RZP) Regulation encTfChipPkENCFF228NVN trnsvCln POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in transverse_colon from ENCODE 3 (ENCFF228NVN) Regulation encTfChipPkENCFF211VGU trnsvCln POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in transverse_colon from ENCODE 3 (ENCFF211VGU) Regulation encTfChipPkENCFF185LTG trnsvCln POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in transverse_colon from ENCODE 3 (ENCFF185LTG) Regulation encTfChipPkENCFF580NSJ transvCln EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in transverse_colon from ENCODE 3 (ENCFF580NSJ) Regulation encTfChipPkENCFF079CRY transvCln EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in transverse_colon from ENCODE 3 
(ENCFF079CRY) Regulation encTfChipPkENCFF244FQD transvCln EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in transverse_colon from ENCODE 3 (ENCFF244FQD) Regulation encTfChipPkENCFF693TBO trnsvColon CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in transverse_colon from ENCODE 3 (ENCFF693TBO) Regulation encTfChipPkENCFF538QPY trnsvColon CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in transverse_colon from ENCODE 3 (ENCFF538QPY) Regulation encTfChipPkENCFF607VAP trnsvColon CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in transverse_colon from ENCODE 3 (ENCFF607VAP) Regulation encTfChipPkENCFF907KEJ trnsvColon CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in transverse_colon from ENCODE 3 (ENCFF907KEJ) Regulation encTfChipPkENCFF663DIG tblNerve POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in tibial_nerve from ENCODE 3 (ENCFF663DIG) Regulation encTfChipPkENCFF355SDI tblNerve POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in tibial_nerve from ENCODE 3 (ENCFF355SDI) Regulation encTfChipPkENCFF049JNK tbialNerv EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in tibial_nerve from ENCODE 3 (ENCFF049JNK) Regulation encTfChipPkENCFF848EEE tbialNerv EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in tibial_nerve from ENCODE 3 (ENCFF848EEE) Regulation encTfChipPkENCFF691IPU tibialNerve CTCF Transcription Factor ChIP-seq Peaks of CTCF in tibial_nerve from ENCODE 3 (ENCFF691IPU) Regulation encTfChipPkENCFF611MLV tbialNerve CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in tibial_nerve from ENCODE 3 (ENCFF611MLV) Regulation encTfChipPkENCFF237VLQ tbialNerve CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in tibial_nerve from ENCODE 3 (ENCFF237VLQ) Regulation encTfChipPkENCFF960TOX tibialArtery CTCF Transcription Factor ChIP-seq Peaks of CTCF in tibial_artery from ENCODE 3 (ENCFF960TOX) Regulation encTfChipPkENCFF445NPR thyroid POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in thyroid_gland from ENCODE 3 
(ENCFF445NPR) Regulation encTfChipPkENCFF710ZQC thyroid POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in thyroid_gland from ENCODE 3 (ENCFF710ZQC) Regulation encTfChipPkENCFF989JUA thyroid CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in thyroid_gland from ENCODE 3 (ENCFF989JUA) Regulation encTfChipPkENCFF026ZWL thyroid CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in thyroid_gland from ENCODE 3 (ENCFF026ZWL) Regulation encTfChipPkENCFF728IYI thyroid CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in thyroid_gland from ENCODE 3 (ENCFF728IYI) Regulation encTfChipPkENCFF535DHF testis POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in testis from ENCODE 3 (ENCFF535DHF) Regulation encTfChipPkENCFF046VTZ testis EP300 Transcription Factor ChIP-seq Peaks of EP300 in testis from ENCODE 3 (ENCFF046VTZ) Regulation encTfChipPkENCFF644JKD testis CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in testis from ENCODE 3 (ENCFF644JKD) Regulation encTfChipPkENCFF788RFY testis CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in testis from ENCODE 3 (ENCFF788RFY) Regulation encTfChipPkENCFF480OTT sprpSkin POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in suprapubic_skin from ENCODE 3 (ENCFF480OTT) Regulation encTfChipPkENCFF401DJJ sprpSkin POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in suprapubic_skin from ENCODE 3 (ENCFF401DJJ) Regulation encTfChipPkENCFF266KJH suprpSkin EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in suprapubic_skin from ENCODE 3 (ENCFF266KJH) Regulation encTfChipPkENCFF104UOC suprpSkin EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in suprapubic_skin from ENCODE 3 (ENCFF104UOC) Regulation encTfChipPkENCFF079BIZ suprpSkin EP300 Transcription Factor ChIP-seq Peaks of EP300 in suprapubic_skin from ENCODE 3 (ENCFF079BIZ) Regulation encTfChipPkENCFF783HDF suprpbSkin EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in suprapubic_skin from ENCODE 3 (ENCFF783HDF) Regulation encTfChipPkENCFF102XCU 
suprpbSkin CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in suprapubic_skin from ENCODE 3 (ENCFF102XCU) Regulation encTfChipPkENCFF687WWO suprpbSkin CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in suprapubic_skin from ENCODE 3 (ENCFF687WWO) Regulation encTfChipPkENCFF085MWN subcutAdp EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF085MWN) Regulation encTfChipPkENCFF434OJH subcutAdp EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF434OJH) Regulation encTfChipPkENCFF191VCL subcutAdp EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF191VCL) Regulation encTfChipPkENCFF042DNR subcutAdp EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF042DNR) Regulation encTfChipPkENCFF688KFE subcutAdip CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF688KFE) Regulation encTfChipPkENCFF719VDM subcutAdip CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in subcutaneous_adipose_tissue from ENCODE 3 (ENCFF719VDM) Regulation encTfChipPkENCFF280GHS stomach POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in stomach from ENCODE 3 (ENCFF280GHS) Regulation encTfChipPkENCFF905CUU stomach POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in stomach from ENCODE 3 (ENCFF905CUU) Regulation encTfChipPkENCFF880FUR stomach POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in stomach from ENCODE 3 (ENCFF880FUR) Regulation encTfChipPkENCFF827SHP stomach POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in stomach from ENCODE 3 (ENCFF827SHP) Regulation encTfChipPkENCFF469SGL stomach EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in stomach from ENCODE 3 (ENCFF469SGL) Regulation encTfChipPkENCFF904COM stomach EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in stomach from ENCODE 3 (ENCFF904COM) 
Regulation encTfChipPkENCFF856BRS stomach EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in stomach from ENCODE 3 (ENCFF856BRS) Regulation encTfChipPkENCFF831BFL stomach CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in stomach from ENCODE 3 (ENCFF831BFL) Regulation encTfChipPkENCFF220VAH stomach CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in stomach from ENCODE 3 (ENCFF220VAH) Regulation encTfChipPkENCFF825XAC stomach CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in stomach from ENCODE 3 (ENCFF825XAC) Regulation encTfChipPkENCFF481CNC stomach CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in stomach from ENCODE 3 (ENCFF481CNC) Regulation encTfChipPkENCFF379SGB spleen POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in spleen from ENCODE 3 (ENCFF379SGB) Regulation encTfChipPkENCFF323FPP spleen POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in spleen from ENCODE 3 (ENCFF323FPP) Regulation encTfChipPkENCFF128AIK spleen POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in spleen from ENCODE 3 (ENCFF128AIK) Regulation encTfChipPkENCFF290BOD spleen POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in spleen from ENCODE 3 (ENCFF290BOD) Regulation encTfChipPkENCFF068YLN spleen CTCF 5 Transcription Factor ChIP-seq Peaks of CTCF in spleen from ENCODE 3 (ENCFF068YLN) Regulation encTfChipPkENCFF340BQM spleen CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in spleen from ENCODE 3 (ENCFF340BQM) Regulation encTfChipPkENCFF248QUD spleen CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in spleen from ENCODE 3 (ENCFF248QUD) Regulation encTfChipPkENCFF234VTM spleen CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in spleen from ENCODE 3 (ENCFF234VTM) Regulation encTfChipPkENCFF540DVR spleen CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in spleen from ENCODE 3 (ENCFF540DVR) Regulation encTfChipPkENCFF141MTA smoothMuscle CTCF Transcription Factor ChIP-seq Peaks of CTCF in smooth_muscle_cell from ENCODE 3 (ENCFF141MTA) 
Regulation encTfChipPkENCFF928ZSB sigmdCln POLR2A 5 Transcription Factor ChIP-seq Peaks of POLR2A in sigmoid_colon from ENCODE 3 (ENCFF928ZSB) Regulation encTfChipPkENCFF191BTJ sigmdCln POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in sigmoid_colon from ENCODE 3 (ENCFF191BTJ) Regulation encTfChipPkENCFF680JEG sigmdCln POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in sigmoid_colon from ENCODE 3 (ENCFF680JEG) Regulation encTfChipPkENCFF182ETN sigmdCln POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in sigmoid_colon from ENCODE 3 (ENCFF182ETN) Regulation encTfChipPkENCFF328BTO sigmdCln POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in sigmoid_colon from ENCODE 3 (ENCFF328BTO) Regulation encTfChipPkENCFF091KSY sigmdCln EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in sigmoid_colon from ENCODE 3 (ENCFF091KSY) Regulation encTfChipPkENCFF169FFA sigmdCln EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in sigmoid_colon from ENCODE 3 (ENCFF169FFA) Regulation encTfChipPkENCFF231LOU sigmdCln EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in sigmoid_colon from ENCODE 3 (ENCFF231LOU) Regulation encTfChipPkENCFF616YFR sigmdCln EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in sigmoid_colon from ENCODE 3 (ENCFF616YFR) Regulation encTfChipPkENCFF615AFS sigmdColon CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in sigmoid_colon from ENCODE 3 (ENCFF615AFS) Regulation encTfChipPkENCFF782FTD sigmdColon CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in sigmoid_colon from ENCODE 3 (ENCFF782FTD) Regulation encTfChipPkENCFF668SIT sigmdColon CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in sigmoid_colon from ENCODE 3 (ENCFF668SIT) Regulation encTfChipPkENCFF070ILT sigmdColon CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in sigmoid_colon from ENCODE 3 (ENCFF070ILT) Regulation encTfChipPkENCFF113NNM liverRLobe CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in right_lobe_of_liver from ENCODE 3 (ENCFF113NNM) 
Regulation encTfChipPkENCFF136LAP liverRLobe CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in right_lobe_of_liver from ENCODE 3 (ENCFF136LAP) Regulation encTfChipPkENCFF409DTL retinPgmtEpi CTCF Transcription Factor ChIP-seq Peaks of CTCF in retinal_pigment_epithelial_cell from ENCODE 3 (ENCFF409DTL) Regulation encTfChipPkENCFF674MDG prostate POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in prostate_gland from ENCODE 3 (ENCFF674MDG) Regulation encTfChipPkENCFF160SYU prostate POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in prostate_gland from ENCODE 3 (ENCFF160SYU) Regulation encTfChipPkENCFF341UHT prostate CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in prostate_gland from ENCODE 3 (ENCFF341UHT) Regulation encTfChipPkENCFF142JXX prostate CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in prostate_gland from ENCODE 3 (ENCFF142JXX) Regulation encTfChipPkENCFF016APK ovary POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in ovary from ENCODE 3 (ENCFF016APK) Regulation encTfChipPkENCFF353CLB ovary EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in ovary from ENCODE 3 (ENCFF353CLB) Regulation encTfChipPkENCFF970XBE ovary EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in ovary from ENCODE 3 (ENCFF970XBE) Regulation encTfChipPkENCFF886WWT ovary CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in ovary from ENCODE 3 (ENCFF886WWT) Regulation encTfChipPkENCFF006YGI ovary CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in ovary from ENCODE 3 (ENCFF006YGI) Regulation encTfChipPkENCFF454DZP omntalFat EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in omental_fat_pad from ENCODE 3 (ENCFF454DZP) Regulation encTfChipPkENCFF102IIP omntalFat EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in omental_fat_pad from ENCODE 3 (ENCFF102IIP) Regulation encTfChipPkENCFF895RTD omntalFat EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in omental_fat_pad from ENCODE 3 (ENCFF895RTD) Regulation encTfChipPkENCFF199FCD omntalFat 
EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in omental_fat_pad from ENCODE 3 (ENCFF199FCD) Regulation encTfChipPkENCFF157OEN omentalFat CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in omental_fat_pad from ENCODE 3 (ENCFF157OEN) Regulation encTfChipPkENCFF399NTP omentalFat CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in omental_fat_pad from ENCODE 3 (ENCFF399NTP) Regulation encTfChipPkENCFF668UDC omentalFat CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in omental_fat_pad from ENCODE 3 (ENCFF668UDC) Regulation encTfChipPkENCFF122IMV neutrophil CTCF Transcription Factor ChIP-seq Peaks of CTCF in neutrophil from ENCODE 3 (ENCFF122IMV) Regulation encTfChipPkENCFF295HQJ neurlProgntr EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in neural_progenitor_cell from ENCODE 3 (ENCFF295HQJ) Regulation encTfChipPkENCFF560GGY neurlProgntr CTCF Transcription Factor ChIP-seq Peaks of CTCF in neural_progenitor_cell from ENCODE 3 (ENCFF560GGY) Regulation encTfChipPkENCFF944KJO neuralCell SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in neural_cell from ENCODE 3 (ENCFF944KJO) Regulation encTfChipPkENCFF454TRL neuralCell RAD21 Transcription Factor ChIP-seq Peaks of RAD21 in neural_cell from ENCODE 3 (ENCFF454TRL) Regulation encTfChipPkENCFF255WJM neuralCell MXI1 Transcription Factor ChIP-seq Peaks of MXI1 in neural_cell from ENCODE 3 (ENCFF255WJM) Regulation encTfChipPkENCFF108BSU neuralCell EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in neural_cell from ENCODE 3 (ENCFF108BSU) Regulation encTfChipPkENCFF459ARL neuralCell EP300 Transcription Factor ChIP-seq Peaks of EP300 in neural_cell from ENCODE 3 (ENCFF459ARL) Regulation encTfChipPkENCFF372JOV neuralCell CTCF Transcription Factor ChIP-seq Peaks of CTCF in neural_cell from ENCODE 3 (ENCFF372JOV) Regulation encTfChipPkENCFF719TNH myotube CTCF Transcription Factor ChIP-seq Peaks of CTCF in myotube from ENCODE 3 (ENCFF719TNH) Regulation encTfChipPkENCFF845NAG medlblastoma CTCF Transcription Factor 
ChIP-seq Peaks of CTCF in medulloblastoma from ENCODE 3 (ENCFF845NAG) Regulation encTfChipPkENCFF493HJH mammaryEpith CTCF Transcription Factor ChIP-seq Peaks of CTCF in mammary_epithelial_cell from ENCODE 3 (ENCFF493HJH) Regulation encTfChipPkENCFF072MPX lwrLgSkn POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in lower_leg_skin from ENCODE 3 (ENCFF072MPX) Regulation encTfChipPkENCFF818GNJ lwrLgSkn POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in lower_leg_skin from ENCODE 3 (ENCFF818GNJ) Regulation encTfChipPkENCFF916FGF lwrLegSkin CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in lower_leg_skin from ENCODE 3 (ENCFF916FGF) Regulation encTfChipPkENCFF992DNN lwrLegSkin CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in lower_leg_skin from ENCODE 3 (ENCFF992DNN) Regulation encTfChipPkENCFF846VQK lwrLegSkin CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in lower_leg_skin from ENCODE 3 (ENCFF846VQK) Regulation encTfChipPkENCFF912XIE lwrLegSkin CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in lower_leg_skin from ENCODE 3 (ENCFF912XIE) Regulation encTfChipPkENCFF727ZIT liver ZBTB33 2 Transcription Factor ChIP-seq Peaks of ZBTB33 in liver from ENCODE 3 (ENCFF727ZIT) Regulation encTfChipPkENCFF882UHR liver ZBTB33 1 Transcription Factor ChIP-seq Peaks of ZBTB33 in liver from ENCODE 3 (ENCFF882UHR) Regulation encTfChipPkENCFF459TWF liver YY1 2 Transcription Factor ChIP-seq Peaks of YY1 in liver from ENCODE 3 (ENCFF459TWF) Regulation encTfChipPkENCFF838VFX liver YY1 1 Transcription Factor ChIP-seq Peaks of YY1 in liver from ENCODE 3 (ENCFF838VFX) Regulation encTfChipPkENCFF214OJW liver TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in liver from ENCODE 3 (ENCFF214OJW) Regulation encTfChipPkENCFF978TMH liver SP1 2 Transcription Factor ChIP-seq Peaks of SP1 in liver from ENCODE 3 (ENCFF978TMH) Regulation encTfChipPkENCFF433EFF liver SP1 1 Transcription Factor ChIP-seq Peaks of SP1 in liver from ENCODE 3 (ENCFF433EFF) Regulation 
encTfChipPkENCFF572MCI liver RXRA 2 Transcription Factor ChIP-seq Peaks of RXRA in liver from ENCODE 3 (ENCFF572MCI) Regulation encTfChipPkENCFF201KGJ liver RXRA 1 Transcription Factor ChIP-seq Peaks of RXRA in liver from ENCODE 3 (ENCFF201KGJ) Regulation encTfChipPkENCFF288XHG liver REST 2 Transcription Factor ChIP-seq Peaks of REST in liver from ENCODE 3 (ENCFF288XHG) Regulation encTfChipPkENCFF178WRO liver REST 1 Transcription Factor ChIP-seq Peaks of REST in liver from ENCODE 3 (ENCFF178WRO) Regulation encTfChipPkENCFF315BSV liver RAD21 3 Transcription Factor ChIP-seq Peaks of RAD21 in liver from ENCODE 3 (ENCFF315BSV) Regulation encTfChipPkENCFF295GOD liver RAD21 2 Transcription Factor ChIP-seq Peaks of RAD21 in liver from ENCODE 3 (ENCFF295GOD) Regulation encTfChipPkENCFF229WFR liver RAD21 1 Transcription Factor ChIP-seq Peaks of RAD21 in liver from ENCODE 3 (ENCFF229WFR) Regulation encTfChipPkENCFF819WNB liver NR2F2 2 Transcription Factor ChIP-seq Peaks of NR2F2 in liver from ENCODE 3 (ENCFF819WNB) Regulation encTfChipPkENCFF379TVQ liver NR2F2 1 Transcription Factor ChIP-seq Peaks of NR2F2 in liver from ENCODE 3 (ENCFF379TVQ) Regulation encTfChipPkENCFF669BQN liver MAX 2 Transcription Factor ChIP-seq Peaks of MAX in liver from ENCODE 3 (ENCFF669BQN) Regulation encTfChipPkENCFF493ZMX liver MAX 1 Transcription Factor ChIP-seq Peaks of MAX in liver from ENCODE 3 (ENCFF493ZMX) Regulation encTfChipPkENCFF229COM liver JUND 2 Transcription Factor ChIP-seq Peaks of JUND in liver from ENCODE 3 (ENCFF229COM) Regulation encTfChipPkENCFF420PED liver JUND 1 Transcription Factor ChIP-seq Peaks of JUND in liver from ENCODE 3 (ENCFF420PED) Regulation encTfChipPkENCFF497MUF liver HNF4G Transcription Factor ChIP-seq Peaks of HNF4G in liver from ENCODE 3 (ENCFF497MUF) Regulation encTfChipPkENCFF905JAC liver HNF4A 2 Transcription Factor ChIP-seq Peaks of HNF4A in liver from ENCODE 3 (ENCFF905JAC) Regulation encTfChipPkENCFF837QHJ liver HNF4A 1 Transcription Factor ChIP-seq 
Peaks of HNF4A in liver from ENCODE 3 (ENCFF837QHJ) Regulation encTfChipPkENCFF280YAF liver GABPA 2 Transcription Factor ChIP-seq Peaks of GABPA in liver from ENCODE 3 (ENCFF280YAF) Regulation encTfChipPkENCFF344XWK liver GABPA 1 Transcription Factor ChIP-seq Peaks of GABPA in liver from ENCODE 3 (ENCFF344XWK) Regulation encTfChipPkENCFF293LRQ liver FOXA2 2 Transcription Factor ChIP-seq Peaks of FOXA2 in liver from ENCODE 3 (ENCFF293LRQ) Regulation encTfChipPkENCFF168JLI liver FOXA2 1 Transcription Factor ChIP-seq Peaks of FOXA2 in liver from ENCODE 3 (ENCFF168JLI) Regulation encTfChipPkENCFF324QGE liver FOXA1 2 Transcription Factor ChIP-seq Peaks of FOXA1 in liver from ENCODE 3 (ENCFF324QGE) Regulation encTfChipPkENCFF951VPZ liver FOXA1 1 Transcription Factor ChIP-seq Peaks of FOXA1 in liver from ENCODE 3 (ENCFF951VPZ) Regulation encTfChipPkENCFF617JQS liver EGR1 2 Transcription Factor ChIP-seq Peaks of EGR1 in liver from ENCODE 3 (ENCFF617JQS) Regulation encTfChipPkENCFF808WST liver EGR1 1 Transcription Factor ChIP-seq Peaks of EGR1 in liver from ENCODE 3 (ENCFF808WST) Regulation encTfChipPkENCFF143HEE liver CTCF Transcription Factor ChIP-seq Peaks of CTCF in liver from ENCODE 3 (ENCFF143HEE) Regulation encTfChipPkENCFF146URA liver ATF3 2 Transcription Factor ChIP-seq Peaks of ATF3 in liver from ENCODE 3 (ENCFF146URA) Regulation encTfChipPkENCFF782SGI liver ATF3 1 Transcription Factor ChIP-seq Peaks of ATF3 in liver from ENCODE 3 (ENCFF782SGI) Regulation encTfChipPkENCFF674KUN kidneyEpith CTCF Transcription Factor ChIP-seq Peaks of CTCF in kidney_epithelial_cell from ENCODE 3 (ENCFF674KUN) Regulation encTfChipPkENCFF028IIR keratinocyte CTCF Transcription Factor ChIP-seq Peaks of CTCF in keratinocyte from ENCODE 3 (ENCFF028IIR) Regulation encTfChipPkENCFF324UNA hepatocyte EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in hepatocyte from ENCODE 3 (ENCFF324UNA) Regulation encTfChipPkENCFF846FYU hepatocyte CTCF Transcription Factor ChIP-seq Peaks of CTCF in 
hepatocyte from ENCODE 3 (ENCFF846FYU) Regulation encTfChipPkENCFF226GKH hrtLfVnt POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in heart_left_ventricle from ENCODE 3 (ENCFF226GKH) Regulation encTfChipPkENCFF156SPI hrtLfVnt POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in heart_left_ventricle from ENCODE 3 (ENCFF156SPI) Regulation encTfChipPkENCFF552XDP heartLftVent CTCF Transcription Factor ChIP-seq Peaks of CTCF in heart_left_ventricle from ENCODE 3 (ENCFF552XDP) Regulation encTfChipPkENCFF530FGP gEsphSph POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in gastroesophageal_sphincter from ENCODE 3 (ENCFF530FGP) Regulation encTfChipPkENCFF128UUT gEsphSph POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in gastroesophageal_sphincter from ENCODE 3 (ENCFF128UUT) Regulation encTfChipPkENCFF835VAP gEsphSph POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in gastroesophageal_sphincter from ENCODE 3 (ENCFF835VAP) Regulation encTfChipPkENCFF291RDN gsEsphSph EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in gastroesophageal_sphincter from ENCODE 3 (ENCFF291RDN) Regulation encTfChipPkENCFF481USU gsEsphSph EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in gastroesophageal_sphincter from ENCODE 3 (ENCFF481USU) Regulation encTfChipPkENCFF992XPI gsEsphSph EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in gastroesophageal_sphincter from ENCODE 3 (ENCFF992XPI) Regulation encTfChipPkENCFF951SRP gstEsphSph CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in gastroesophageal_sphincter from ENCODE 3 (ENCFF951SRP) Regulation encTfChipPkENCFF973KKY gstEsphSph CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in gastroesophageal_sphincter from ENCODE 3 (ENCFF973KKY) Regulation encTfChipPkENCFF227YCI gstrcMed POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in gastrocnemius_medialis from ENCODE 3 (ENCFF227YCI) Regulation encTfChipPkENCFF089XKW gstrcMed POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in 
gastrocnemius_medialis from ENCODE 3 (ENCFF089XKW) Regulation encTfChipPkENCFF100SKI gastrocMed CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in gastrocnemius_medialis from ENCODE 3 (ENCFF100SKI) Regulation encTfChipPkENCFF016OGE gastrocMed CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in gastrocnemius_medialis from ENCODE 3 (ENCFF016OGE) Regulation encTfChipPkENCFF281XHU gastrocMed CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in gastrocnemius_medialis from ENCODE 3 (ENCFF281XHU) Regulation encTfChipPkENCFF060WTK frskinKrtn CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in foreskin_keratinocyte from ENCODE 3 (ENCFF060WTK) Regulation encTfChipPkENCFF236RJT frskinKrtn CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in foreskin_keratinocyte from ENCODE 3 (ENCFF236RJT) Regulation encTfChipPkENCFF349RNE frskinKrtn CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in foreskin_keratinocyte from ENCODE 3 (ENCFF349RNE) Regulation encTfChipPkENCFF273NIW frsknFibro CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in foreskin_fibroblast from ENCODE 3 (ENCFF273NIW) Regulation encTfChipPkENCFF178FRI frsknFibro CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in foreskin_fibroblast from ENCODE 3 (ENCFF178FRI) Regulation encTfChipPkENCFF032BJW vlMesenFibro CTCF Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_villous_mesenchyme from ENCODE 3 (ENCFF032BJW) Regulation encTfChipPkENCFF322FBH aortaAdFibro CTCF Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_the_aortic_adventitia from ENCODE 3 (ENCFF322FBH) Regulation encTfChipPkENCFF093QTY plArtryFibro CTCF Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_pulmonary_artery from ENCODE 3 (ENCFF093QTY) Regulation encTfChipPkENCFF196CRQ mamryGlFibro CTCF Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_mammary_gland from ENCODE 3 (ENCFF196CRQ) Regulation encTfChipPkENCFF218LOB lungFibro CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_lung from 
ENCODE 3 (ENCFF218LOB) Regulation encTfChipPkENCFF777ODE lungFibro CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in fibroblast_of_lung from ENCODE 3 (ENCFF777ODE) Regulation encTfChipPkENCFF930NQQ esphSqEp POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in esophagus_squamous_epithelium from ENCODE 3 (ENCFF930NQQ) Regulation encTfChipPkENCFF691ARB esphSqEp POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in esophagus_squamous_epithelium from ENCODE 3 (ENCFF691ARB) Regulation encTfChipPkENCFF542QLV esphSqEp POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in esophagus_squamous_epithelium from ENCODE 3 (ENCFF542QLV) Regulation encTfChipPkENCFF157FXA esphSqEp POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in esophagus_squamous_epithelium from ENCODE 3 (ENCFF157FXA) Regulation encTfChipPkENCFF505VMB esphSquEpi CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_squamous_epithelium from ENCODE 3 (ENCFF505VMB) Regulation encTfChipPkENCFF661IIS esphSquEpi CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_squamous_epithelium from ENCODE 3 (ENCFF661IIS) Regulation encTfChipPkENCFF350AMQ esphSquEpi CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_squamous_epithelium from ENCODE 3 (ENCFF350AMQ) Regulation encTfChipPkENCFF898JJD esphSquEpi CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_squamous_epithelium from ENCODE 3 (ENCFF898JJD) Regulation encTfChipPkENCFF906CSG esophMscMc POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF906CSG) Regulation encTfChipPkENCFF087RBS esphMscMc EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF087RBS) Regulation encTfChipPkENCFF287SLI esphMscMc EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF287SLI) Regulation encTfChipPkENCFF261OWX esphMscMc EP300 2 Transcription Factor ChIP-seq Peaks of EP300 
in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF261OWX) Regulation encTfChipPkENCFF081YBG esphMscMc EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF081YBG) Regulation encTfChipPkENCFF725FJK esphMscMuc CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF725FJK) Regulation encTfChipPkENCFF373DVN esphMscMuc CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF373DVN) Regulation encTfChipPkENCFF897UFD esphMscMuc CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in esophagus_muscularis_mucosa from ENCODE 3 (ENCFF897UFD) Regulation encTfChipPkENCFF180BYN erythblst GATA1 2 Transcription Factor ChIP-seq Peaks of GATA1 in erythroblast from ENCODE 3 (ENCFF180BYN) Regulation encTfChipPkENCFF789ZAT erythblst GATA1 1 Transcription Factor ChIP-seq Peaks of GATA1 in erythroblast from ENCODE 3 (ENCFF789ZAT) Regulation encTfChipPkENCFF712LFQ prostateEpi CTCF Transcription Factor ChIP-seq Peaks of CTCF in epithelial_cell_of_prostate from ENCODE 3 (ENCFF712LFQ) Regulation encTfChipPkENCFF796AAX esophagEpi CTCF Transcription Factor ChIP-seq Peaks of CTCF in epithelial_cell_of_esophagus from ENCODE 3 (ENCFF796AAX) Regulation encTfChipPkENCFF387VGY umbilVein POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in endothelial_cell_of_umbilical_vein from ENCODE 3 (ENCFF387VGY) Regulation encTfChipPkENCFF987YIJ umbilVein GATA2 Transcription Factor ChIP-seq Peaks of GATA2 in endothelial_cell_of_umbilical_vein from ENCODE 3 (ENCFF987YIJ) Regulation encTfChipPkENCFF327GZX umbilVeinEndo FOS Transcription Factor ChIP-seq Peaks of FOS in endothelial_cell_of_umbilical_vein from ENCODE 3 (ENCFF327GZX) Regulation encTfChipPkENCFF522JCV umbilVenEndo CTCF Transcription Factor ChIP-seq Peaks of CTCF in endothelial_cell_of_umbilical_vein from ENCODE 3 (ENCFF522JCV) Regulation encTfChipPkENCFF136ZAK chorPlexEpi CTCF Transcription Factor ChIP-seq 
Peaks of CTCF in choroid_plexus_epithelial_cell from ENCODE 3 (ENCFF136ZAK) Regulation encTfChipPkENCFF863ZIN heartMuscl CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in cardiac_muscle_cell from ENCODE 3 (ENCFF863ZIN) Regulation encTfChipPkENCFF301YXM heartMuscl CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in cardiac_muscle_cell from ENCODE 3 (ENCFF301YXM) Regulation encTfChipPkENCFF243AGG heartFibro CTCF Transcription Factor ChIP-seq Peaks of CTCF in cardiac_fibroblast from ENCODE 3 (ENCFF243AGG) Regulation encTfChipPkENCFF607YLT brestEpi POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in breast_epithelium from ENCODE 3 (ENCFF607YLT) Regulation encTfChipPkENCFF294TAI brestEpi POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in breast_epithelium from ENCODE 3 (ENCFF294TAI) Regulation encTfChipPkENCFF906VTL breastEpi EP300 4 Transcription Factor ChIP-seq Peaks of EP300 in breast_epithelium from ENCODE 3 (ENCFF906VTL) Regulation encTfChipPkENCFF614VFU breastEpi EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in breast_epithelium from ENCODE 3 (ENCFF614VFU) Regulation encTfChipPkENCFF757KZD breastEpi EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in breast_epithelium from ENCODE 3 (ENCFF757KZD) Regulation encTfChipPkENCFF978RPI breastEpi EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in breast_epithelium from ENCODE 3 (ENCFF978RPI) Regulation encTfChipPkENCFF113XGW breastEpi CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in breast_epithelium from ENCODE 3 (ENCFF113XGW) Regulation encTfChipPkENCFF167SCX breastEpi CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in breast_epithelium from ENCODE 3 (ENCFF167SCX) Regulation encTfChipPkENCFF338TGS breastEpi CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in breast_epithelium from ENCODE 3 (ENCFF338TGS) Regulation encTfChipPkENCFF427RYJ brainMicEndo CTCF Transcription Factor ChIP-seq Peaks of CTCF in brain_microvascular_endothelial_cell from ENCODE 3 (ENCFF427RYJ) Regulation 
encTfChipPkENCFF371GSC pancreas POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in body_of_pancreas from ENCODE 3 (ENCFF371GSC) Regulation encTfChipPkENCFF296AFJ pancreas POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in body_of_pancreas from ENCODE 3 (ENCFF296AFJ) Regulation encTfChipPkENCFF306CZZ pancreas POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in body_of_pancreas from ENCODE 3 (ENCFF306CZZ) Regulation encTfChipPkENCFF389ULP pancreas POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in body_of_pancreas from ENCODE 3 (ENCFF389ULP) Regulation encTfChipPkENCFF900GKE pancreas CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in body_of_pancreas from ENCODE 3 (ENCFF900GKE) Regulation encTfChipPkENCFF153EBU pancreas CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in body_of_pancreas from ENCODE 3 (ENCFF153EBU) Regulation encTfChipPkENCFF610UCL pancreas CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in body_of_pancreas from ENCODE 3 (ENCFF610UCL) Regulation encTfChipPkENCFF872XQU pancreas CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in body_of_pancreas from ENCODE 3 (ENCFF872XQU) Regulation encTfChipPkENCFF984VPB biplNeuron ZEB1 Transcription Factor ChIP-seq Peaks of ZEB1 in bipolar_neuron from ENCODE 3 (ENCFF984VPB) Regulation encTfChipPkENCFF482JUI bipNeuron SMARCA4 Transcription Factor ChIP-seq Peaks of SMARCA4 in bipolar_neuron from ENCODE 3 (ENCFF482JUI) Regulation encTfChipPkENCFF203ZIS biplNeuron CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in bipolar_neuron from ENCODE 3 (ENCFF203ZIS) Regulation encTfChipPkENCFF904CNB biplNeuron CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in bipolar_neuron from ENCODE 3 (ENCFF904CNB) Regulation encTfChipPkENCFF600CYD spinlAstrcyt CTCF Transcription Factor ChIP-seq Peaks of CTCF in astrocyte_of_the_spinal_cord from ENCODE 3 (ENCFF600CYD) Regulation encTfChipPkENCFF515KNI cerebAstrcyt CTCF Transcription Factor ChIP-seq Peaks of CTCF in astrocyte_of_the_cerebellum from 
ENCODE 3 (ENCFF515KNI) Regulation encTfChipPkENCFF148BSH astrocyte CTCF Transcription Factor ChIP-seq Peaks of CTCF in astrocyte from ENCODE 3 (ENCFF148BSH) Regulation encTfChipPkENCFF374MIO ascendAorta CTCF Transcription Factor ChIP-seq Peaks of CTCF in ascending_aorta from ENCODE 3 (ENCFF374MIO) Regulation encTfChipPkENCFF967EOL adrnlGld POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in adrenal_gland from ENCODE 3 (ENCFF967EOL) Regulation encTfChipPkENCFF363GNR adrnlGld POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in adrenal_gland from ENCODE 3 (ENCFF363GNR) Regulation encTfChipPkENCFF412TMX adrenlGlnd CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in adrenal_gland from ENCODE 3 (ENCFF412TMX) Regulation encTfChipPkENCFF574FIL adrenlGlnd CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in adrenal_gland from ENCODE 3 (ENCFF574FIL) Regulation encTfChipPkENCFF174CEI adrenlGlnd CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in adrenal_gland from ENCODE 3 (ENCFF174CEI) Regulation encTfChipPkENCFF114FNT adrenlGlnd CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in adrenal_gland from ENCODE 3 (ENCFF114FNT) Regulation encTfChipPkENCFF447UZC 22Rv1 ZFX Transcription Factor ChIP-seq Peaks of ZFX in 22Rv1 from ENCODE 3 (ENCFF447UZC) Regulation encTfChipPkENCFF147YCW 22Rv1 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in 22Rv1 from ENCODE 3 (ENCFF147YCW) Regulation encTfChipPkENCFF730MQM 22Rv1 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in 22Rv1 from ENCODE 3 (ENCFF730MQM) Regulation encTfChipPkENCFF695MEK WI38 CTCF Transcription Factor ChIP-seq Peaks of CTCF in WI38 from ENCODE 3 (ENCFF695MEK) Regulation encTfChipPkENCFF262ZOT WERI-Rb-1 CTCF Transcription Factor ChIP-seq Peaks of CTCF in WERI-Rb-1 from ENCODE 3 (ENCFF262ZOT) Regulation encTfChipPkENCFF078XBU VCaP CTCF Transcription Factor ChIP-seq Peaks of CTCF in VCaP from ENCODE 3 (ENCFF078XBU) Regulation encTfChipPkENCFF946YUA T47D JUND Transcription Factor ChIP-seq Peaks of JUND 
in T47D from ENCODE 3 (ENCFF946YUA) Regulation encTfChipPkENCFF574HSR T47D GATA3 Transcription Factor ChIP-seq Peaks of GATA3 in T47D from ENCODE 3 (ENCFF574HSR) Regulation encTfChipPkENCFF420MLJ T47D FOXA1 Transcription Factor ChIP-seq Peaks of FOXA1 in T47D from ENCODE 3 (ENCFF420MLJ) Regulation encTfChipPkENCFF396TFS T47D ESR1 3 Transcription Factor ChIP-seq Peaks of ESR1 in T47D from ENCODE 3 (ENCFF396TFS) Regulation encTfChipPkENCFF637WCT T47D ESR1 2 Transcription Factor ChIP-seq Peaks of ESR1 in T47D from ENCODE 3 (ENCFF637WCT) Regulation encTfChipPkENCFF433NIE T47D ESR1 1 Transcription Factor ChIP-seq Peaks of ESR1 in T47D from ENCODE 3 (ENCFF433NIE) Regulation encTfChipPkENCFF938CRS SU-DHL-6 CTCF Transcription Factor ChIP-seq Peaks of CTCF in SU-DHL-6 from ENCODE 3 (ENCFF938CRS) Regulation encTfChipPkENCFF363UWP SK-N-SH YY1 Transcription Factor ChIP-seq Peaks of YY1 in SK-N-SH from ENCODE 3 (ENCFF363UWP) Regulation encTfChipPkENCFF261PAC SK-N-SH USF2 Transcription Factor ChIP-seq Peaks of USF2 in SK-N-SH from ENCODE 3 (ENCFF261PAC) Regulation encTfChipPkENCFF452RZW SK-N-SH USF1 Transcription Factor ChIP-seq Peaks of USF1 in SK-N-SH from ENCODE 3 (ENCFF452RZW) Regulation encTfChipPkENCFF423CTO SK-N-SH TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in SK-N-SH from ENCODE 3 (ENCFF423CTO) Regulation encTfChipPkENCFF663RUS SK-N-SH SIN3A Transcription Factor ChIP-seq Peaks of SIN3A in SK-N-SH from ENCODE 3 (ENCFF663RUS) Regulation encTfChipPkENCFF502JJJ SK-N-SH RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in SK-N-SH from ENCODE 3 (ENCFF502JJJ) Regulation encTfChipPkENCFF796YFZ SK-N-SH REST 2 Transcription Factor ChIP-seq Peaks of REST in SK-N-SH from ENCODE 3 (ENCFF796YFZ) Regulation encTfChipPkENCFF540FXB SK-N-SH REST 1 Transcription Factor ChIP-seq Peaks of REST in SK-N-SH from ENCODE 3 (ENCFF540FXB) Regulation encTfChipPkENCFF073ADA SK-N-SH RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in SK-N-SH from ENCODE 3 (ENCFF073ADA) Regulation 
encTfChipPkENCFF557OCR SK-N-SH RAD21 Transcription Factor ChIP-seq Peaks of RAD21 in SK-N-SH from ENCODE 3 (ENCFF557OCR) Regulation encTfChipPkENCFF116RCK SK-N-SH MXI1 Transcription Factor ChIP-seq Peaks of MXI1 in SK-N-SH from ENCODE 3 (ENCFF116RCK) Regulation encTfChipPkENCFF187QQB SK-N-SH JUND 2 Transcription Factor ChIP-seq Peaks of JUND in SK-N-SH from ENCODE 3 (ENCFF187QQB) Regulation encTfChipPkENCFF246HKM SK-N-SH JUND 1 Transcription Factor ChIP-seq Peaks of JUND in SK-N-SH from ENCODE 3 (ENCFF246HKM) Regulation encTfChipPkENCFF917TPE SK-N-SH IRF3 Transcription Factor ChIP-seq Peaks of IRF3 in SK-N-SH from ENCODE 3 (ENCFF917TPE) Regulation encTfChipPkENCFF540DWT SK-N-SH CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in SK-N-SH from ENCODE 3 (ENCFF540DWT) Regulation encTfChipPkENCFF685KTA SK-N-SH CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in SK-N-SH from ENCODE 3 (ENCFF685KTA) Regulation encTfChipPkENCFF049UCF SK-N-SH CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in SK-N-SH from ENCODE 3 (ENCFF049UCF) Regulation encTfChipPkENCFF035WFT SK-N-MC EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in SK-N-MC from ENCODE 3 (ENCFF035WFT) Regulation encTfChipPkENCFF626MUS SH-SY5Y GATA3 Transcription Factor ChIP-seq Peaks of GATA3 in SH-SY5Y from ENCODE 3 (ENCFF626MUS) Regulation encTfChipPkENCFF064YWN SH-SY5Y GATA2 Transcription Factor ChIP-seq Peaks of GATA2 in SH-SY5Y from ENCODE 3 (ENCFF064YWN) Regulation encTfChipPkENCFF798HCA Raji POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in Raji from ENCODE 3 (ENCFF798HCA) Regulation encTfChipPkENCFF855KNL RWPE2 CTCF Transcription Factor ChIP-seq Peaks of CTCF in RWPE2 from ENCODE 3 (ENCFF855KNL) Regulation encTfChipPkENCFF273HTX RWPE1 CTCF Transcription Factor ChIP-seq Peaks of CTCF in RWPE1 from ENCODE 3 (ENCFF273HTX) Regulation encTfChipPkENCFF563GSK PeyrPtch POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in Peyer's_patch from ENCODE 3 (ENCFF563GSK) Regulation encTfChipPkENCFF797OLU 
PeyrPtch POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in Peyer's_patch from ENCODE 3 (ENCFF797OLU) Regulation encTfChipPkENCFF486UBE PeyerPatch CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in Peyer's_patch from ENCODE 3 (ENCFF486UBE) Regulation encTfChipPkENCFF072UWP PeyerPatch CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in Peyer's_patch from ENCODE 3 (ENCFF072UWP) Regulation encTfChipPkENCFF579XTC PeyerPatch CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in Peyer's_patch from ENCODE 3 (ENCFF579XTC) Regulation encTfChipPkENCFF805FIF PeyerPatch CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in Peyer's_patch from ENCODE 3 (ENCFF805FIF) Regulation encTfChipPkENCFF177UJN parathyAdn CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in Parathyroid_adenoma from ENCODE 3 (ENCFF177UJN) Regulation encTfChipPkENCFF509NRY parathyAdn CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in Parathyroid_adenoma from ENCODE 3 (ENCFF509NRY) Regulation encTfChipPkENCFF171XUS Panc1 TCF7L2 Transcription Factor ChIP-seq Peaks of TCF7L2 in Panc1 from ENCODE 3 (ENCFF171XUS) Regulation encTfChipPkENCFF713ZPE Panc1 REST Transcription Factor ChIP-seq Peaks of REST in Panc1 from ENCODE 3 (ENCFF713ZPE) Regulation encTfChipPkENCFF753HNR Panc1 CTCF Transcription Factor ChIP-seq Peaks of CTCF in Panc1 from ENCODE 3 (ENCFF753HNR) Regulation encTfChipPkENCFF213CYP PFSK-1 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in PFSK-1 from ENCODE 3 (ENCFF213CYP) Regulation encTfChipPkENCFF896RCP PFSK-1 REST Transcription Factor ChIP-seq Peaks of REST in PFSK-1 from ENCODE 3 (ENCFF896RCP) Regulation encTfChipPkENCFF476NAK PC-9 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in PC-9 from ENCODE 3 (ENCFF476NAK) Regulation encTfChipPkENCFF616KNI PC-9 CTCF Transcription Factor ChIP-seq Peaks of CTCF in PC-9 from ENCODE 3 (ENCFF616KNI) Regulation encTfChipPkENCFF702LEL PC-3 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in PC-3 from ENCODE 3 (ENCFF702LEL) Regulation 
encTfChipPkENCFF232FXZ PC-3 CTCF Transcription Factor ChIP-seq Peaks of CTCF in PC-3 from ENCODE 3 (ENCFF232FXZ) Regulation encTfChipPkENCFF186NOM OCI-LY7 CTCF Transcription Factor ChIP-seq Peaks of CTCF in OCI-LY7 from ENCODE 3 (ENCFF186NOM) Regulation encTfChipPkENCFF588MSD OCI-LY3 CTCF Transcription Factor ChIP-seq Peaks of CTCF in OCI-LY3 from ENCODE 3 (ENCFF588MSD) Regulation encTfChipPkENCFF520VKN OCI-LY1 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in OCI-LY1 from ENCODE 3 (ENCFF520VKN) Regulation encTfChipPkENCFF713PIC OCI-LY1 CTCF Transcription Factor ChIP-seq Peaks of CTCF in OCI-LY1 from ENCODE 3 (ENCFF713PIC) Regulation encTfChipPkENCFF597KMH NT2/D1 ZNF274 Transcription Factor ChIP-seq Peaks of ZNF274 in NT2/D1 from ENCODE 3 (ENCFF597KMH) Regulation encTfChipPkENCFF226OCL NT2/D1 YY1 Transcription Factor ChIP-seq Peaks of YY1 in NT2/D1 from ENCODE 3 (ENCFF226OCL) Regulation encTfChipPkENCFF259KAD NCI-H929 CTCF Transcription Factor ChIP-seq Peaks of CTCF in NCI-H929 from ENCODE 3 (ENCFF259KAD) Regulation encTfChipPkENCFF456PDQ NB4 CTCF Transcription Factor ChIP-seq Peaks of CTCF in NB4 from ENCODE 3 (ENCFF456PDQ) Regulation encTfChipPkENCFF253WCQ MM.1S EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in MM.1S from ENCODE 3 (ENCFF253WCQ) Regulation encTfChipPkENCFF825ZYC MM.1S CTCF Transcription Factor ChIP-seq Peaks of CTCF in MM.1S from ENCODE 3 (ENCFF825ZYC) Regulation encTfChipPkENCFF014OJI MCF_10A STAT3 3 Transcription Factor ChIP-seq Peaks of STAT3 in MCF_10A from ENCODE 3 (ENCFF014OJI) Regulation encTfChipPkENCFF199CQN MCF_10A STAT3 2 Transcription Factor ChIP-seq Peaks of STAT3 in MCF_10A from ENCODE 3 (ENCFF199CQN) Regulation encTfChipPkENCFF854RVF MCF_10A STAT3 1 Transcription Factor ChIP-seq Peaks of STAT3 in MCF_10A from ENCODE 3 (ENCFF854RVF) Regulation encTfChipPkENCFF875HHT MCF_10A POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in MCF_10A from ENCODE 3 (ENCFF875HHT) Regulation encTfChipPkENCFF326DTU MCF_10A POLR2A 1 Transcription 
Factor ChIP-seq Peaks of POLR2A in MCF_10A from ENCODE 3 (ENCFF326DTU) Regulation encTfChipPkENCFF443CEL MCF_10A MYC Transcription Factor ChIP-seq Peaks of MYC in MCF_10A from ENCODE 3 (ENCFF443CEL) Regulation encTfChipPkENCFF558PJH MCF_10A FOS 4 Transcription Factor ChIP-seq Peaks of FOS in MCF_10A from ENCODE 3 (ENCFF558PJH) Regulation encTfChipPkENCFF222ZHH MCF_10A FOS 3 Transcription Factor ChIP-seq Peaks of FOS in MCF_10A from ENCODE 3 (ENCFF222ZHH) Regulation encTfChipPkENCFF353OBA MCF_10A FOS 2 Transcription Factor ChIP-seq Peaks of FOS in MCF_10A from ENCODE 3 (ENCFF353OBA) Regulation encTfChipPkENCFF436WHK MCF_10A FOS 1 Transcription Factor ChIP-seq Peaks of FOS in MCF_10A from ENCODE 3 (ENCFF436WHK) Regulation encTfChipPkENCFF525RRP MCF-7 ZNF8 Transcription Factor ChIP-seq Peaks of ZNF8 in MCF-7 from ENCODE 3 (ENCFF525RRP) Regulation encTfChipPkENCFF329QYZ MCF-7 ZNF687 Transcription Factor ChIP-seq Peaks of ZNF687 in MCF-7 from ENCODE 3 (ENCFF329QYZ) Regulation encTfChipPkENCFF541HRT MCF-7 ZNF592 2 Transcription Factor ChIP-seq Peaks of ZNF592 in MCF-7 from ENCODE 3 (ENCFF541HRT) Regulation encTfChipPkENCFF720PZA MCF-7 ZNF592 1 Transcription Factor ChIP-seq Peaks of ZNF592 in MCF-7 from ENCODE 3 (ENCFF720PZA) Regulation encTfChipPkENCFF306PBX MCF-7 ZNF579 Transcription Factor ChIP-seq Peaks of ZNF579 in MCF-7 from ENCODE 3 (ENCFF306PBX) Regulation encTfChipPkENCFF290LSS MCF-7 ZNF574 Transcription Factor ChIP-seq Peaks of ZNF574 in MCF-7 from ENCODE 3 (ENCFF290LSS) Regulation encTfChipPkENCFF414EYO MCF-7 ZNF512B 2 Transcription Factor ChIP-seq Peaks of ZNF512B in MCF-7 from ENCODE 3 (ENCFF414EYO) Regulation encTfChipPkENCFF209TEF MCF-7 ZNF512B 1 Transcription Factor ChIP-seq Peaks of ZNF512B in MCF-7 from ENCODE 3 (ENCFF209TEF) Regulation encTfChipPkENCFF675SAG MCF-7 ZNF507 Transcription Factor ChIP-seq Peaks of ZNF507 in MCF-7 from ENCODE 3 (ENCFF675SAG) Regulation encTfChipPkENCFF786XJV MCF-7 ZNF444 Transcription Factor ChIP-seq Peaks of ZNF444 in MCF-7 
from ENCODE 3 (ENCFF786XJV) Regulation encTfChipPkENCFF619BFO MCF-7 ZNF24 Transcription Factor ChIP-seq Peaks of ZNF24 in MCF-7 from ENCODE 3 (ENCFF619BFO) Regulation encTfChipPkENCFF246ZMG MCF-7 ZNF217 2 Transcription Factor ChIP-seq Peaks of ZNF217 in MCF-7 from ENCODE 3 (ENCFF246ZMG) Regulation encTfChipPkENCFF620RPM MCF-7 ZNF217 1 Transcription Factor ChIP-seq Peaks of ZNF217 in MCF-7 from ENCODE 3 (ENCFF620RPM) Regulation encTfChipPkENCFF621ZSK MCF-7 ZNF207 Transcription Factor ChIP-seq Peaks of ZNF207 in MCF-7 from ENCODE 3 (ENCFF621ZSK) Regulation encTfChipPkENCFF687REM MCF-7 ZKSCAN1 Transcription Factor ChIP-seq Peaks of ZKSCAN1 in MCF-7 from ENCODE 3 (ENCFF687REM) Regulation encTfChipPkENCFF694ZRC MCF-7 ZHX2 Transcription Factor ChIP-seq Peaks of ZHX2 in MCF-7 from ENCODE 3 (ENCFF694ZRC) Regulation encTfChipPkENCFF775BWJ MCF-7 ZFX Transcription Factor ChIP-seq Peaks of ZFX in MCF-7 from ENCODE 3 (ENCFF775BWJ) Regulation encTfChipPkENCFF794UEM MCF-7 ZBTB7B Transcription Factor ChIP-seq Peaks of ZBTB7B in MCF-7 from ENCODE 3 (ENCFF794UEM) Regulation encTfChipPkENCFF932XEU MCF-7 ZBTB40 Transcription Factor ChIP-seq Peaks of ZBTB40 in MCF-7 from ENCODE 3 (ENCFF932XEU) Regulation encTfChipPkENCFF780WLS MCF-7 ZBTB33 Transcription Factor ChIP-seq Peaks of ZBTB33 in MCF-7 from ENCODE 3 (ENCFF780WLS) Regulation encTfChipPkENCFF496RVC MCF-7 ZBTB11 Transcription Factor ChIP-seq Peaks of ZBTB11 in MCF-7 from ENCODE 3 (ENCFF496RVC) Regulation encTfChipPkENCFF589MVU MCF-7 ZBTB1 Transcription Factor ChIP-seq Peaks of ZBTB1 in MCF-7 from ENCODE 3 (ENCFF589MVU) Regulation encTfChipPkENCFF452VLA MCF-7 TRIM22 Transcription Factor ChIP-seq Peaks of TRIM22 in MCF-7 from ENCODE 3 (ENCFF452VLA) Regulation encTfChipPkENCFF762MGC MCF-7 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in MCF-7 from ENCODE 3 (ENCFF762MGC) Regulation encTfChipPkENCFF258ZVN MCF-7 SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in MCF-7 from ENCODE 3 (ENCFF258ZVN) Regulation encTfChipPkENCFF275WAD 
MCF-7 SREBF1 Transcription Factor ChIP-seq Peaks of SREBF1 in MCF-7 from ENCODE 3 (ENCFF275WAD) Regulation encTfChipPkENCFF577EMC MCF-7 SP1 Transcription Factor ChIP-seq Peaks of SP1 in MCF-7 from ENCODE 3 (ENCFF577EMC) Regulation encTfChipPkENCFF761NKP MCF-7 SMARCE1 Transcription Factor ChIP-seq Peaks of SMARCE1 in MCF-7 from ENCODE 3 (ENCFF761NKP) Regulation encTfChipPkENCFF618JNX MCF-7 SMARCA5 Transcription Factor ChIP-seq Peaks of SMARCA5 in MCF-7 from ENCODE 3 (ENCFF618JNX) Regulation encTfChipPkENCFF441UHA MCF-7 SIX4 Transcription Factor ChIP-seq Peaks of SIX4 in MCF-7 from ENCODE 3 (ENCFF441UHA) Regulation encTfChipPkENCFF220RUS MCF-7 SIN3A Transcription Factor ChIP-seq Peaks of SIN3A in MCF-7 from ENCODE 3 (ENCFF220RUS) Regulation encTfChipPkENCFF103MPW MCF-7 RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in MCF-7 from ENCODE 3 (ENCFF103MPW) Regulation encTfChipPkENCFF150PTQ MCF-7 RFX1 2 Transcription Factor ChIP-seq Peaks of RFX1 in MCF-7 from ENCODE 3 (ENCFF150PTQ) Regulation encTfChipPkENCFF928YTD MCF-7 RFX1 1 Transcription Factor ChIP-seq Peaks of RFX1 in MCF-7 from ENCODE 3 (ENCFF928YTD) Regulation encTfChipPkENCFF838LXI MCF-7 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in MCF-7 from ENCODE 3 (ENCFF838LXI) Regulation encTfChipPkENCFF091AYX MCF-7 RAD51 Transcription Factor ChIP-seq Peaks of RAD51 in MCF-7 from ENCODE 3 (ENCFF091AYX) Regulation encTfChipPkENCFF964EVA MCF-7 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in MCF-7 from ENCODE 3 (ENCFF964EVA) Regulation encTfChipPkENCFF105PFS MCF-7 PKNOX1 Transcription Factor ChIP-seq Peaks of PKNOX1 in MCF-7 from ENCODE 3 (ENCFF105PFS) Regulation encTfChipPkENCFF473UHQ MCF-7 PAX8 Transcription Factor ChIP-seq Peaks of PAX8 in MCF-7 from ENCODE 3 (ENCFF473UHQ) Regulation encTfChipPkENCFF269RME MCF-7 NRF1 Transcription Factor ChIP-seq Peaks of NRF1 in MCF-7 from ENCODE 3 (ENCFF269RME) Regulation encTfChipPkENCFF927DIO MCF-7 NFXL1 Transcription Factor ChIP-seq Peaks of NFXL1 in MCF-7 from ENCODE 
3 (ENCFF927DIO) Regulation encTfChipPkENCFF895MJB MCF-7 NFRKB Transcription Factor ChIP-seq Peaks of NFRKB in MCF-7 from ENCODE 3 (ENCFF895MJB) Regulation encTfChipPkENCFF385WUL MCF-7 NFIB 2 Transcription Factor ChIP-seq Peaks of NFIB in MCF-7 from ENCODE 3 (ENCFF385WUL) Regulation encTfChipPkENCFF519XTN MCF-7 NFIB 1 Transcription Factor ChIP-seq Peaks of NFIB in MCF-7 from ENCODE 3 (ENCFF519XTN) Regulation encTfChipPkENCFF059LJD MCF-7 NEUROD1 Transcription Factor ChIP-seq Peaks of NEUROD1 in MCF-7 from ENCODE 3 (ENCFF059LJD) Regulation encTfChipPkENCFF510UNI MCF-7 NCOA3 2 Transcription Factor ChIP-seq Peaks of NCOA3 in MCF-7 from ENCODE 3 (ENCFF510UNI) Regulation encTfChipPkENCFF320TAN MCF-7 NCOA3 1 Transcription Factor ChIP-seq Peaks of NCOA3 in MCF-7 from ENCODE 3 (ENCFF320TAN) Regulation encTfChipPkENCFF209WRW MCF-7 NBN Transcription Factor ChIP-seq Peaks of NBN in MCF-7 from ENCODE 3 (ENCFF209WRW) Regulation encTfChipPkENCFF370EQJ MCF-7 MYC 3 Transcription Factor ChIP-seq Peaks of MYC in MCF-7 from ENCODE 3 (ENCFF370EQJ) Regulation encTfChipPkENCFF658XME MCF-7 MYC 2 Transcription Factor ChIP-seq Peaks of MYC in MCF-7 from ENCODE 3 (ENCFF658XME) Regulation encTfChipPkENCFF300OKR MCF-7 MYC 1 Transcription Factor ChIP-seq Peaks of MYC in MCF-7 from ENCODE 3 (ENCFF300OKR) Regulation encTfChipPkENCFF083AZM MCF-7 MTA3 Transcription Factor ChIP-seq Peaks of MTA3 in MCF-7 from ENCODE 3 (ENCFF083AZM) Regulation encTfChipPkENCFF180XXZ MCF-7 MTA2 Transcription Factor ChIP-seq Peaks of MTA2 in MCF-7 from ENCODE 3 (ENCFF180XXZ) Regulation encTfChipPkENCFF225VFR MCF-7 MTA1 Transcription Factor ChIP-seq Peaks of MTA1 in MCF-7 from ENCODE 3 (ENCFF225VFR) Regulation encTfChipPkENCFF432GSK MCF-7 MNT 2 Transcription Factor ChIP-seq Peaks of MNT in MCF-7 from ENCODE 3 (ENCFF432GSK) Regulation encTfChipPkENCFF403BWK MCF-7 MNT 1 Transcription Factor ChIP-seq Peaks of MNT in MCF-7 from ENCODE 3 (ENCFF403BWK) Regulation encTfChipPkENCFF578NMN MCF-7 MLLT1 Transcription Factor ChIP-seq 
Peaks of MLLT1 in MCF-7 from ENCODE 3 (ENCFF578NMN) Regulation encTfChipPkENCFF464QAL MCF-7 MBD2 Transcription Factor ChIP-seq Peaks of MBD2 in MCF-7 from ENCODE 3 (ENCFF464QAL) Regulation encTfChipPkENCFF873SVI MCF-7 MAFK Transcription Factor ChIP-seq Peaks of MAFK in MCF-7 from ENCODE 3 (ENCFF873SVI) Regulation encTfChipPkENCFF569ZCY MCF-7 JUND Transcription Factor ChIP-seq Peaks of JUND in MCF-7 from ENCODE 3 (ENCFF569ZCY) Regulation encTfChipPkENCFF907UNK MCF-7 JUN Transcription Factor ChIP-seq Peaks of JUN in MCF-7 from ENCODE 3 (ENCFF907UNK) Regulation encTfChipPkENCFF708ACK MCF-7 HSF1 Transcription Factor ChIP-seq Peaks of HSF1 in MCF-7 from ENCODE 3 (ENCFF708ACK) Regulation encTfChipPkENCFF144OPN MCF-7 HES1 Transcription Factor ChIP-seq Peaks of HES1 in MCF-7 from ENCODE 3 (ENCFF144OPN) Regulation encTfChipPkENCFF401IAI MCF-7 HCFC1 Transcription Factor ChIP-seq Peaks of HCFC1 in MCF-7 from ENCODE 3 (ENCFF401IAI) Regulation encTfChipPkENCFF046BRP MCF-7 GATAD2B 2 Transcription Factor ChIP-seq Peaks of GATAD2B in MCF-7 from ENCODE 3 (ENCFF046BRP) Regulation encTfChipPkENCFF191SBE MCF-7 GATAD2B 1 Transcription Factor ChIP-seq Peaks of GATAD2B in MCF-7 from ENCODE 3 (ENCFF191SBE) Regulation encTfChipPkENCFF625IUE MCF-7 GATA3 Transcription Factor ChIP-seq Peaks of GATA3 in MCF-7 from ENCODE 3 (ENCFF625IUE) Regulation encTfChipPkENCFF899MQW MCF-7 FOXK2 Transcription Factor ChIP-seq Peaks of FOXK2 in MCF-7 from ENCODE 3 (ENCFF899MQW) Regulation encTfChipPkENCFF160RLI MCF-7 FOXA1 Transcription Factor ChIP-seq Peaks of FOXA1 in MCF-7 from ENCODE 3 (ENCFF160RLI) Regulation encTfChipPkENCFF170POB MCF-7 FOS Transcription Factor ChIP-seq Peaks of FOS in MCF-7 from ENCODE 3 (ENCFF170POB) Regulation encTfChipPkENCFF541DRZ MCF-7 ESRRA 2 Transcription Factor ChIP-seq Peaks of ESRRA in MCF-7 from ENCODE 3 (ENCFF541DRZ) Regulation encTfChipPkENCFF519TRJ MCF-7 ESRRA 1 Transcription Factor ChIP-seq Peaks of ESRRA in MCF-7 from ENCODE 3 (ENCFF519TRJ) Regulation 
encTfChipPkENCFF408TWV MCF-7 ELK1 Transcription Factor ChIP-seq Peaks of ELK1 in MCF-7 from ENCODE 3 (ENCFF408TWV) Regulation encTfChipPkENCFF020UCD MCF-7 ELF1 Transcription Factor ChIP-seq Peaks of ELF1 in MCF-7 from ENCODE 3 (ENCFF020UCD) Regulation encTfChipPkENCFF347USC MCF-7 E4F1 Transcription Factor ChIP-seq Peaks of E4F1 in MCF-7 from ENCODE 3 (ENCFF347USC) Regulation encTfChipPkENCFF072VGV MCF-7 E2F8 Transcription Factor ChIP-seq Peaks of E2F8 in MCF-7 from ENCODE 3 (ENCFF072VGV) Regulation encTfChipPkENCFF042AWM MCF-7 DPF2 Transcription Factor ChIP-seq Peaks of DPF2 in MCF-7 from ENCODE 3 (ENCFF042AWM) Regulation encTfChipPkENCFF762CDY MCF-7 CUX1 Transcription Factor ChIP-seq Peaks of CUX1 in MCF-7 from ENCODE 3 (ENCFF762CDY) Regulation encTfChipPkENCFF785NTC MCF-7 CTCF 6 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF785NTC) Regulation encTfChipPkENCFF628EUU MCF-7 CTCF 5 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF628EUU) Regulation encTfChipPkENCFF685HMV MCF-7 CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF685HMV) Regulation encTfChipPkENCFF942TCG MCF-7 CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF942TCG) Regulation encTfChipPkENCFF867BUQ MCF-7 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF867BUQ) Regulation encTfChipPkENCFF476DVJ MCF-7 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in MCF-7 from ENCODE 3 (ENCFF476DVJ) Regulation encTfChipPkENCFF456MGR MCF-7 CTBP1 Transcription Factor ChIP-seq Peaks of CTBP1 in MCF-7 from ENCODE 3 (ENCFF456MGR) Regulation encTfChipPkENCFF883LRJ MCF-7 CREB1 2 Transcription Factor ChIP-seq Peaks of CREB1 in MCF-7 from ENCODE 3 (ENCFF883LRJ) Regulation encTfChipPkENCFF495PCJ MCF-7 CREB1 1 Transcription Factor ChIP-seq Peaks of CREB1 in MCF-7 from ENCODE 3 (ENCFF495PCJ) Regulation encTfChipPkENCFF682WFF MCF-7 COPS2 Transcription Factor ChIP-seq Peaks of COPS2 in 
MCF-7 from ENCODE 3 (ENCFF682WFF) Regulation encTfChipPkENCFF305CRL MCF-7 CLOCK 2 Transcription Factor ChIP-seq Peaks of CLOCK in MCF-7 from ENCODE 3 (ENCFF305CRL) Regulation encTfChipPkENCFF025SMR MCF-7 CLOCK 1 Transcription Factor ChIP-seq Peaks of CLOCK in MCF-7 from ENCODE 3 (ENCFF025SMR) Regulation encTfChipPkENCFF730UAD MCF-7 CHD1 Transcription Factor ChIP-seq Peaks of CHD1 in MCF-7 from ENCODE 3 (ENCFF730UAD) Regulation encTfChipPkENCFF414LXZ MCF-7 BMI1 Transcription Factor ChIP-seq Peaks of BMI1 in MCF-7 from ENCODE 3 (ENCFF414LXZ) Regulation encTfChipPkENCFF760ZVI MCF-7 ATF7 Transcription Factor ChIP-seq Peaks of ATF7 in MCF-7 from ENCODE 3 (ENCFF760ZVI) Regulation encTfChipPkENCFF618NVV MCF-7 ARID3A Transcription Factor ChIP-seq Peaks of ARID3A in MCF-7 from ENCODE 3 (ENCFF618NVV) Regulation encTfChipPkENCFF707BQD Loucy CTCF Transcription Factor ChIP-seq Peaks of CTCF in Loucy from ENCODE 3 (ENCFF707BQD) Regulation encTfChipPkENCFF670NSE LNCaP_FGC CTCF Transcription Factor ChIP-seq Peaks of CTCF in LNCaP_clone_FGC from ENCODE 3 (ENCFF670NSE) Regulation encTfChipPkENCFF501SHB LNCAP CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in LNCAP from ENCODE 3 (ENCFF501SHB) Regulation encTfChipPkENCFF850DQJ LNCAP CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in LNCAP from ENCODE 3 (ENCFF850DQJ) Regulation encTfChipPkENCFF649QKE KMS-11 CTCF Transcription Factor ChIP-seq Peaks of CTCF in KMS-11 from ENCODE 3 (ENCFF649QKE) Regulation encTfChipPkENCFF945HJR K562 ZZZ3 Transcription Factor ChIP-seq Peaks of ZZZ3 in K562 from ENCODE 3 (ENCFF945HJR) Regulation encTfChipPkENCFF979GFF K562 ZSCAN29 2 Transcription Factor ChIP-seq Peaks of ZSCAN29 in K562 from ENCODE 3 (ENCFF979GFF) Regulation encTfChipPkENCFF908ZLN K562 ZSCAN29 1 Transcription Factor ChIP-seq Peaks of ZSCAN29 in K562 from ENCODE 3 (ENCFF908ZLN) Regulation encTfChipPkENCFF979NKM K562 ZNF830 2 Transcription Factor ChIP-seq Peaks of ZNF830 in K562 from ENCODE 3 (ENCFF979NKM) Regulation 
encTfChipPkENCFF951OSW K562 ZNF830 1 Transcription Factor ChIP-seq Peaks of ZNF830 in K562 from ENCODE 3 (ENCFF951OSW) Regulation encTfChipPkENCFF008JJE K562 ZNF639 2 Transcription Factor ChIP-seq Peaks of ZNF639 in K562 from ENCODE 3 (ENCFF008JJE) Regulation encTfChipPkENCFF404EVY K562 ZNF639 1 Transcription Factor ChIP-seq Peaks of ZNF639 in K562 from ENCODE 3 (ENCFF404EVY) Regulation encTfChipPkENCFF972UGK K562 ZNF592 Transcription Factor ChIP-seq Peaks of ZNF592 in K562 from ENCODE 3 (ENCFF972UGK) Regulation encTfChipPkENCFF538GSS K562 ZNF407 2 Transcription Factor ChIP-seq Peaks of ZNF407 in K562 from ENCODE 3 (ENCFF538GSS) Regulation encTfChipPkENCFF644XES K562 ZNF407 1 Transcription Factor ChIP-seq Peaks of ZNF407 in K562 from ENCODE 3 (ENCFF644XES) Regulation encTfChipPkENCFF106YXG K562 ZNF384 Transcription Factor ChIP-seq Peaks of ZNF384 in K562 from ENCODE 3 (ENCFF106YXG) Regulation encTfChipPkENCFF082RIZ K562 ZNF318 2 Transcription Factor ChIP-seq Peaks of ZNF318 in K562 from ENCODE 3 (ENCFF082RIZ) Regulation encTfChipPkENCFF577LQR K562 ZNF318 1 Transcription Factor ChIP-seq Peaks of ZNF318 in K562 from ENCODE 3 (ENCFF577LQR) Regulation encTfChipPkENCFF056SEM K562 ZNF316 2 Transcription Factor ChIP-seq Peaks of ZNF316 in K562 from ENCODE 3 (ENCFF056SEM) Regulation encTfChipPkENCFF806GUF K562 ZNF316 1 Transcription Factor ChIP-seq Peaks of ZNF316 in K562 from ENCODE 3 (ENCFF806GUF) Regulation encTfChipPkENCFF596JDS K562 ZNF282 Transcription Factor ChIP-seq Peaks of ZNF282 in K562 from ENCODE 3 (ENCFF596JDS) Regulation encTfChipPkENCFF074WRG K562 ZNF280A Transcription Factor ChIP-seq Peaks of ZNF280A in K562 from ENCODE 3 (ENCFF074WRG) Regulation encTfChipPkENCFF498VQZ K562 ZNF274 2 Transcription Factor ChIP-seq Peaks of ZNF274 in K562 from ENCODE 3 (ENCFF498VQZ) Regulation encTfChipPkENCFF323AWS K562 ZNF274 1 Transcription Factor ChIP-seq Peaks of ZNF274 in K562 from ENCODE 3 (ENCFF323AWS) Regulation encTfChipPkENCFF260CBQ K562 ZNF24 3 Transcription 
Factor ChIP-seq Peaks of ZNF24 in K562 from ENCODE 3 (ENCFF260CBQ) Regulation encTfChipPkENCFF723JDW K562 ZNF24 2 Transcription Factor ChIP-seq Peaks of ZNF24 in K562 from ENCODE 3 (ENCFF723JDW) Regulation encTfChipPkENCFF007EEV K562 ZNF24 1 Transcription Factor ChIP-seq Peaks of ZNF24 in K562 from ENCODE 3 (ENCFF007EEV) Regulation encTfChipPkENCFF760EPB K562 ZNF184 2 Transcription Factor ChIP-seq Peaks of ZNF184 in K562 from ENCODE 3 (ENCFF760EPB) Regulation encTfChipPkENCFF855CUN K562 ZNF184 1 Transcription Factor ChIP-seq Peaks of ZNF184 in K562 from ENCODE 3 (ENCFF855CUN) Regulation encTfChipPkENCFF700GZI K562 ZNF143 Transcription Factor ChIP-seq Peaks of ZNF143 in K562 from ENCODE 3 (ENCFF700GZI) Regulation encTfChipPkENCFF195IFB K562 ZMYM3 Transcription Factor ChIP-seq Peaks of ZMYM3 in K562 from ENCODE 3 (ENCFF195IFB) Regulation encTfChipPkENCFF526PMI K562 ZMIZ1 Transcription Factor ChIP-seq Peaks of ZMIZ1 in K562 from ENCODE 3 (ENCFF526PMI) Regulation encTfChipPkENCFF704VDI K562 ZKSCAN1 Transcription Factor ChIP-seq Peaks of ZKSCAN1 in K562 from ENCODE 3 (ENCFF704VDI) Regulation encTfChipPkENCFF495BPY K562 ZHX1 Transcription Factor ChIP-seq Peaks of ZHX1 in K562 from ENCODE 3 (ENCFF495BPY) Regulation encTfChipPkENCFF150ZBH K562 ZFP91 Transcription Factor ChIP-seq Peaks of ZFP91 in K562 from ENCODE 3 (ENCFF150ZBH) Regulation encTfChipPkENCFF553KIK K562 ZEB2 2 Transcription Factor ChIP-seq Peaks of ZEB2 in K562 from ENCODE 3 (ENCFF553KIK) Regulation encTfChipPkENCFF808NWU K562 ZEB2 1 Transcription Factor ChIP-seq Peaks of ZEB2 in K562 from ENCODE 3 (ENCFF808NWU) Regulation encTfChipPkENCFF328SSL K562 ZBTB8A Transcription Factor ChIP-seq Peaks of ZBTB8A in K562 from ENCODE 3 (ENCFF328SSL) Regulation encTfChipPkENCFF245LRG K562 ZBTB7A Transcription Factor ChIP-seq Peaks of ZBTB7A in K562 from ENCODE 3 (ENCFF245LRG) Regulation encTfChipPkENCFF813GMP K562 ZBTB5 2 Transcription Factor ChIP-seq Peaks of ZBTB5 in K562 from ENCODE 3 (ENCFF813GMP) Regulation 
encTfChipPkENCFF014KUI K562 ZBTB5 1 Transcription Factor ChIP-seq Peaks of ZBTB5 in K562 from ENCODE 3 (ENCFF014KUI) Regulation encTfChipPkENCFF088LZZ K562 ZBTB40 Transcription Factor ChIP-seq Peaks of ZBTB40 in K562 from ENCODE 3 (ENCFF088LZZ) Regulation encTfChipPkENCFF556STK K562 ZBTB33 Transcription Factor ChIP-seq Peaks of ZBTB33 in K562 from ENCODE 3 (ENCFF556STK) Regulation encTfChipPkENCFF189WAO K562 ZBTB2 Transcription Factor ChIP-seq Peaks of ZBTB2 in K562 from ENCODE 3 (ENCFF189WAO) Regulation encTfChipPkENCFF913HCQ K562 ZBTB11 Transcription Factor ChIP-seq Peaks of ZBTB11 in K562 from ENCODE 3 (ENCFF913HCQ) Regulation encTfChipPkENCFF388TYU K562 ZBED1 Transcription Factor ChIP-seq Peaks of ZBED1 in K562 from ENCODE 3 (ENCFF388TYU) Regulation encTfChipPkENCFF635XCI K562 YY1 2 Transcription Factor ChIP-seq Peaks of YY1 in K562 from ENCODE 3 (ENCFF635XCI) Regulation encTfChipPkENCFF024TJO K562 YY1 1 Transcription Factor ChIP-seq Peaks of YY1 in K562 from ENCODE 3 (ENCFF024TJO) Regulation encTfChipPkENCFF929TWP K562 XRCC5 Transcription Factor ChIP-seq Peaks of XRCC5 in K562 from ENCODE 3 (ENCFF929TWP) Regulation encTfChipPkENCFF115PGE K562 XRCC3 Transcription Factor ChIP-seq Peaks of XRCC3 in K562 from ENCODE 3 (ENCFF115PGE) Regulation encTfChipPkENCFF157ZQI K562 WHSC1 Transcription Factor ChIP-seq Peaks of WHSC1 in K562 from ENCODE 3 (ENCFF157ZQI) Regulation encTfChipPkENCFF425FVY K562 USF2 Transcription Factor ChIP-seq Peaks of USF2 in K562 from ENCODE 3 (ENCFF425FVY) Regulation encTfChipPkENCFF403TAF K562 UBTF 2 Transcription Factor ChIP-seq Peaks of UBTF in K562 from ENCODE 3 (ENCFF403TAF) Regulation encTfChipPkENCFF345RRM K562 UBTF 1 Transcription Factor ChIP-seq Peaks of UBTF in K562 from ENCODE 3 (ENCFF345RRM) Regulation encTfChipPkENCFF134HBP K562 U2AF2 Transcription Factor ChIP-seq Peaks of U2AF2 in K562 from ENCODE 3 (ENCFF134HBP) Regulation encTfChipPkENCFF482DRO K562 U2AF1 Transcription Factor ChIP-seq Peaks of U2AF1 in K562 from ENCODE 3 
(ENCFF482DRO) Regulation encTfChipPkENCFF534VQL K562 TRIP13 Transcription Factor ChIP-seq Peaks of TRIP13 in K562 from ENCODE 3 (ENCFF534VQL) Regulation encTfChipPkENCFF623ELO K562 TRIM28 3 Transcription Factor ChIP-seq Peaks of TRIM28 in K562 from ENCODE 3 (ENCFF623ELO) Regulation encTfChipPkENCFF996AMX K562 TRIM28 2 Transcription Factor ChIP-seq Peaks of TRIM28 in K562 from ENCODE 3 (ENCFF996AMX) Regulation encTfChipPkENCFF168KHS K562 TRIM28 1 Transcription Factor ChIP-seq Peaks of TRIM28 in K562 from ENCODE 3 (ENCFF168KHS) Regulation encTfChipPkENCFF950TOJ K562 TRIM24 2 Transcription Factor ChIP-seq Peaks of TRIM24 in K562 from ENCODE 3 (ENCFF950TOJ) Regulation encTfChipPkENCFF063NXI K562 TRIM24 1 Transcription Factor ChIP-seq Peaks of TRIM24 in K562 from ENCODE 3 (ENCFF063NXI) Regulation encTfChipPkENCFF309DMZ K562 THRA Transcription Factor ChIP-seq Peaks of THRA in K562 from ENCODE 3 (ENCFF309DMZ) Regulation encTfChipPkENCFF130TPD K562 THAP1 Transcription Factor ChIP-seq Peaks of THAP1 in K562 from ENCODE 3 (ENCFF130TPD) Regulation encTfChipPkENCFF547MLB K562 TEAD4 Transcription Factor ChIP-seq Peaks of TEAD4 in K562 from ENCODE 3 (ENCFF547MLB) Regulation encTfChipPkENCFF512IAI K562 TCF7 Transcription Factor ChIP-seq Peaks of TCF7 in K562 from ENCODE 3 (ENCFF512IAI) Regulation encTfChipPkENCFF912LXU K562 TCF12 2 Transcription Factor ChIP-seq Peaks of TCF12 in K562 from ENCODE 3 (ENCFF912LXU) Regulation encTfChipPkENCFF952JIK K562 TCF12 1 Transcription Factor ChIP-seq Peaks of TCF12 in K562 from ENCODE 3 (ENCFF952JIK) Regulation encTfChipPkENCFF370YGS K562 TBP Transcription Factor ChIP-seq Peaks of TBP in K562 from ENCODE 3 (ENCFF370YGS) Regulation encTfChipPkENCFF239WFN K562 TBL1XR1 2 Transcription Factor ChIP-seq Peaks of TBL1XR1 in K562 from ENCODE 3 (ENCFF239WFN) Regulation encTfChipPkENCFF868SWL K562 TBL1XR1 1 Transcription Factor ChIP-seq Peaks of TBL1XR1 in K562 from ENCODE 3 (ENCFF868SWL) Regulation encTfChipPkENCFF475LFH K562 TAL1 2 Transcription 
Factor ChIP-seq Peaks of TAL1 in K562 from ENCODE 3 (ENCFF475LFH) Regulation encTfChipPkENCFF078OUD K562 TAL1 1 Transcription Factor ChIP-seq Peaks of TAL1 in K562 from ENCODE 3 (ENCFF078OUD) Regulation encTfChipPkENCFF223HDM K562 TAF9B Transcription Factor ChIP-seq Peaks of TAF9B in K562 from ENCODE 3 (ENCFF223HDM) Regulation encTfChipPkENCFF852NOL K562 TAF7 Transcription Factor ChIP-seq Peaks of TAF7 in K562 from ENCODE 3 (ENCFF852NOL) Regulation encTfChipPkENCFF710LLF K562 TAF15 Transcription Factor ChIP-seq Peaks of TAF15 in K562 from ENCODE 3 (ENCFF710LLF) Regulation encTfChipPkENCFF856HYC K562 SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in K562 from ENCODE 3 (ENCFF856HYC) Regulation encTfChipPkENCFF517IXK K562 STAT5A Transcription Factor ChIP-seq Peaks of STAT5A in K562 from ENCODE 3 (ENCFF517IXK) Regulation encTfChipPkENCFF204VQS K562 STAT2 Transcription Factor ChIP-seq Peaks of STAT2 in K562 from ENCODE 3 (ENCFF204VQS) Regulation encTfChipPkENCFF431NLF K562 STAT1 3 Transcription Factor ChIP-seq Peaks of STAT1 in K562 from ENCODE 3 (ENCFF431NLF) Regulation encTfChipPkENCFF747ICD K562 STAT1 2 Transcription Factor ChIP-seq Peaks of STAT1 in K562 from ENCODE 3 (ENCFF747ICD) Regulation encTfChipPkENCFF646MXG K562 STAT1 1 Transcription Factor ChIP-seq Peaks of STAT1 in K562 from ENCODE 3 (ENCFF646MXG) Regulation encTfChipPkENCFF217HAW K562 SRSF9 Transcription Factor ChIP-seq Peaks of SRSF9 in K562 from ENCODE 3 (ENCFF217HAW) Regulation encTfChipPkENCFF550VUN K562 SRSF7 Transcription Factor ChIP-seq Peaks of SRSF7 in K562 from ENCODE 3 (ENCFF550VUN) Regulation encTfChipPkENCFF777MYW K562 SREBF1 Transcription Factor ChIP-seq Peaks of SREBF1 in K562 from ENCODE 3 (ENCFF777MYW) Regulation encTfChipPkENCFF452LDK K562 SP1 Transcription Factor ChIP-seq Peaks of SP1 in K562 from ENCODE 3 (ENCFF452LDK) Regulation encTfChipPkENCFF431STY K562 SOX6 Transcription Factor ChIP-seq Peaks of SOX6 in K562 from ENCODE 3 (ENCFF431STY) Regulation encTfChipPkENCFF206MJS K562 
SNRNP70 Transcription Factor ChIP-seq Peaks of SNRNP70 in K562 from ENCODE 3 (ENCFF206MJS) Regulation encTfChipPkENCFF175UEE K562 SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in K562 from ENCODE 3 (ENCFF175UEE) Regulation encTfChipPkENCFF435SZS K562 SMARCE1 Transcription Factor ChIP-seq Peaks of SMARCE1 in K562 from ENCODE 3 (ENCFF435SZS) Regulation encTfChipPkENCFF751ZVX K562 SMARCC2 Transcription Factor ChIP-seq Peaks of SMARCC2 in K562 from ENCODE 3 (ENCFF751ZVX) Regulation encTfChipPkENCFF308QHX K562 SMARCB1 Transcription Factor ChIP-seq Peaks of SMARCB1 in K562 from ENCODE 3 (ENCFF308QHX) Regulation encTfChipPkENCFF481TNF K562 SMARCA5 Transcription Factor ChIP-seq Peaks of SMARCA5 in K562 from ENCODE 3 (ENCFF481TNF) Regulation encTfChipPkENCFF361RWX K562 SMARCA4 3 Transcription Factor ChIP-seq Peaks of SMARCA4 in K562 from ENCODE 3 (ENCFF361RWX) Regulation encTfChipPkENCFF868UOJ K562 SMARCA4 2 Transcription Factor ChIP-seq Peaks of SMARCA4 in K562 from ENCODE 3 (ENCFF868UOJ) Regulation encTfChipPkENCFF703NAE K562 SMARCA4 1 Transcription Factor ChIP-seq Peaks of SMARCA4 in K562 from ENCODE 3 (ENCFF703NAE) Regulation encTfChipPkENCFF069AAY K562 SMAD5 Transcription Factor ChIP-seq Peaks of SMAD5 in K562 from ENCODE 3 (ENCFF069AAY) Regulation encTfChipPkENCFF186MFI K562 SMAD2 Transcription Factor ChIP-seq Peaks of SMAD2 in K562 from ENCODE 3 (ENCFF186MFI) Regulation encTfChipPkENCFF084BUP K562 SMAD1 Transcription Factor ChIP-seq Peaks of SMAD1 in K562 from ENCODE 3 (ENCFF084BUP) Regulation encTfChipPkENCFF254QDM K562 SKIL Transcription Factor ChIP-seq Peaks of SKIL in K562 from ENCODE 3 (ENCFF254QDM) Regulation encTfChipPkENCFF247LOF K562 SIX5 Transcription Factor ChIP-seq Peaks of SIX5 in K562 from ENCODE 3 (ENCFF247LOF) Regulation encTfChipPkENCFF747XDN K562 SIRT6 Transcription Factor ChIP-seq Peaks of SIRT6 in K562 from ENCODE 3 (ENCFF747XDN) Regulation encTfChipPkENCFF543INR K562 SIN3B Transcription Factor ChIP-seq Peaks of SIN3B in K562 from ENCODE 3 
(ENCFF543INR) Regulation encTfChipPkENCFF802JAN K562 SIN3A Transcription Factor ChIP-seq Peaks of SIN3A in K562 from ENCODE 3 (ENCFF802JAN) Regulation encTfChipPkENCFF690WNQ K562 SETDB1 Transcription Factor ChIP-seq Peaks of SETDB1 in K562 from ENCODE 3 (ENCFF690WNQ) Regulation encTfChipPkENCFF103RHL K562 SAP30 Transcription Factor ChIP-seq Peaks of SAP30 in K562 from ENCODE 3 (ENCFF103RHL) Regulation encTfChipPkENCFF087DKT K562 SAFB2 Transcription Factor ChIP-seq Peaks of SAFB2 in K562 from ENCODE 3 (ENCFF087DKT) Regulation encTfChipPkENCFF411YVY K562 SAFB Transcription Factor ChIP-seq Peaks of SAFB in K562 from ENCODE 3 (ENCFF411YVY) Regulation encTfChipPkENCFF091MQJ K562 RUNX1 2 Transcription Factor ChIP-seq Peaks of RUNX1 in K562 from ENCODE 3 (ENCFF091MQJ) Regulation encTfChipPkENCFF545WXN K562 RUNX1 1 Transcription Factor ChIP-seq Peaks of RUNX1 in K562 from ENCODE 3 (ENCFF545WXN) Regulation encTfChipPkENCFF462AZY K562 RNF2 4 Transcription Factor ChIP-seq Peaks of RNF2 in K562 from ENCODE 3 (ENCFF462AZY) Regulation encTfChipPkENCFF741CLJ K562 RNF2 3 Transcription Factor ChIP-seq Peaks of RNF2 in K562 from ENCODE 3 (ENCFF741CLJ) Regulation encTfChipPkENCFF820LKT K562 RNF2 2 Transcription Factor ChIP-seq Peaks of RNF2 in K562 from ENCODE 3 (ENCFF820LKT) Regulation encTfChipPkENCFF349MSP K562 RNF2 1 Transcription Factor ChIP-seq Peaks of RNF2 in K562 from ENCODE 3 (ENCFF349MSP) Regulation encTfChipPkENCFF599CBB K562 RLF Transcription Factor ChIP-seq Peaks of RLF in K562 from ENCODE 3 (ENCFF599CBB) Regulation encTfChipPkENCFF201YKU K562 RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in K562 from ENCODE 3 (ENCFF201YKU) Regulation encTfChipPkENCFF193PVX K562 RFX1 2 Transcription Factor ChIP-seq Peaks of RFX1 in K562 from ENCODE 3 (ENCFF193PVX) Regulation encTfChipPkENCFF905GXS K562 RFX1 1 Transcription Factor ChIP-seq Peaks of RFX1 in K562 from ENCODE 3 (ENCFF905GXS) Regulation encTfChipPkENCFF023ZUW K562 REST 2 Transcription Factor ChIP-seq Peaks of REST in K562 
from ENCODE 3 (ENCFF023ZUW) Regulation encTfChipPkENCFF290ESJ K562 REST 1 Transcription Factor ChIP-seq Peaks of REST in K562 from ENCODE 3 (ENCFF290ESJ) Regulation encTfChipPkENCFF968SUH K562 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in K562 from ENCODE 3 (ENCFF968SUH) Regulation encTfChipPkENCFF503DIK K562 RBM39 Transcription Factor ChIP-seq Peaks of RBM39 in K562 from ENCODE 3 (ENCFF503DIK) Regulation encTfChipPkENCFF670ILH K562 RBM34 Transcription Factor ChIP-seq Peaks of RBM34 in K562 from ENCODE 3 (ENCFF670ILH) Regulation encTfChipPkENCFF102XVH K562 RBM25 Transcription Factor ChIP-seq Peaks of RBM25 in K562 from ENCODE 3 (ENCFF102XVH) Regulation encTfChipPkENCFF420IBN K562 RBM22 Transcription Factor ChIP-seq Peaks of RBM22 in K562 from ENCODE 3 (ENCFF420IBN) Regulation encTfChipPkENCFF056OIG K562 RBM17 Transcription Factor ChIP-seq Peaks of RBM17 in K562 from ENCODE 3 (ENCFF056OIG) Regulation encTfChipPkENCFF563WDZ K562 RBM15 Transcription Factor ChIP-seq Peaks of RBM15 in K562 from ENCODE 3 (ENCFF563WDZ) Regulation encTfChipPkENCFF320YOI K562 RBM14 Transcription Factor ChIP-seq Peaks of RBM14 in K562 from ENCODE 3 (ENCFF320YOI) Regulation encTfChipPkENCFF232ASB K562 RBFOX2 Transcription Factor ChIP-seq Peaks of RBFOX2 in K562 from ENCODE 3 (ENCFF232ASB) Regulation encTfChipPkENCFF328QZM K562 RB1 Transcription Factor ChIP-seq Peaks of RB1 in K562 from ENCODE 3 (ENCFF328QZM) Regulation encTfChipPkENCFF740OPF K562 RAD51 Transcription Factor ChIP-seq Peaks of RAD51 in K562 from ENCODE 3 (ENCFF740OPF) Regulation encTfChipPkENCFF442XXV K562 PYGO2 Transcription Factor ChIP-seq Peaks of PYGO2 in K562 from ENCODE 3 (ENCFF442XXV) Regulation encTfChipPkENCFF917HXV K562 PTBP1 Transcription Factor ChIP-seq Peaks of PTBP1 in K562 from ENCODE 3 (ENCFF917HXV) Regulation encTfChipPkENCFF417RQZ K562 PRPF4 Transcription Factor ChIP-seq Peaks of PRPF4 in K562 from ENCODE 3 (ENCFF417RQZ) Regulation encTfChipPkENCFF600HPZ K562 PRDM10 Transcription Factor ChIP-seq Peaks 
of PRDM10 in K562 from ENCODE 3 (ENCFF600HPZ) Regulation encTfChipPkENCFF283CUY K562 POLR2G Transcription Factor ChIP-seq Peaks of POLR2G in K562 from ENCODE 3 (ENCFF283CUY) Regulation encTfChipPkENCFF285MBX K562 POLR2A 7 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF285MBX) Regulation encTfChipPkENCFF668VIK K562 POLR2A 6 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF668VIK) Regulation encTfChipPkENCFF881ONC K562 POLR2A 5 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF881ONC) Regulation encTfChipPkENCFF099NYA K562 POLR2A 4 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF099NYA) Regulation encTfChipPkENCFF730DLS K562 POLR2A 3 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF730DLS) Regulation encTfChipPkENCFF741JES K562 POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF741JES) Regulation encTfChipPkENCFF182YZG K562 POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in K562 from ENCODE 3 (ENCFF182YZG) Regulation encTfChipPkENCFF800QDU K562 PML Transcription Factor ChIP-seq Peaks of PML in K562 from ENCODE 3 (ENCFF800QDU) Regulation encTfChipPkENCFF062VBB K562 PKNOX1 Transcription Factor ChIP-seq Peaks of PKNOX1 in K562 from ENCODE 3 (ENCFF062VBB) Regulation encTfChipPkENCFF952YDR K562 PHF8 Transcription Factor ChIP-seq Peaks of PHF8 in K562 from ENCODE 3 (ENCFF952YDR) Regulation encTfChipPkENCFF657UVA K562 PHF21A Transcription Factor ChIP-seq Peaks of PHF21A in K562 from ENCODE 3 (ENCFF657UVA) Regulation encTfChipPkENCFF259HUS K562 PHF20 Transcription Factor ChIP-seq Peaks of PHF20 in K562 from ENCODE 3 (ENCFF259HUS) Regulation encTfChipPkENCFF988OXX K562 PHB2 Transcription Factor ChIP-seq Peaks of PHB2 in K562 from ENCODE 3 (ENCFF988OXX) Regulation encTfChipPkENCFF941XZW K562 PCBP2 Transcription Factor ChIP-seq Peaks of PCBP2 in K562 from ENCODE 3 (ENCFF941XZW) Regulation encTfChipPkENCFF467RYH 
K562 PCBP1 Transcription Factor ChIP-seq Peaks of PCBP1 in K562 from ENCODE 3 (ENCFF467RYH) Regulation encTfChipPkENCFF885JMZ K562 NUFIP1 Transcription Factor ChIP-seq Peaks of NUFIP1 in K562 from ENCODE 3 (ENCFF885JMZ) Regulation encTfChipPkENCFF782YFS K562 NRF1 3 Transcription Factor ChIP-seq Peaks of NRF1 in K562 from ENCODE 3 (ENCFF782YFS) Regulation encTfChipPkENCFF626VDA K562 NRF1 2 Transcription Factor ChIP-seq Peaks of NRF1 in K562 from ENCODE 3 (ENCFF626VDA) Regulation encTfChipPkENCFF543STN K562 NRF1 1 Transcription Factor ChIP-seq Peaks of NRF1 in K562 from ENCODE 3 (ENCFF543STN) Regulation encTfChipPkENCFF315MUH K562 NR3C1 2 Transcription Factor ChIP-seq Peaks of NR3C1 in K562 from ENCODE 3 (ENCFF315MUH) Regulation encTfChipPkENCFF821YMC K562 NR3C1 1 Transcription Factor ChIP-seq Peaks of NR3C1 in K562 from ENCODE 3 (ENCFF821YMC) Regulation encTfChipPkENCFF194VBK K562 NR2F6 Transcription Factor ChIP-seq Peaks of NR2F6 in K562 from ENCODE 3 (ENCFF194VBK) Regulation encTfChipPkENCFF118HUH K562 NR2F2 Transcription Factor ChIP-seq Peaks of NR2F2 in K562 from ENCODE 3 (ENCFF118HUH) Regulation encTfChipPkENCFF363IQN K562 NR2F1 Transcription Factor ChIP-seq Peaks of NR2F1 in K562 from ENCODE 3 (ENCFF363IQN) Regulation encTfChipPkENCFF791ZPU K562 NR2C2 Transcription Factor ChIP-seq Peaks of NR2C2 in K562 from ENCODE 3 (ENCFF791ZPU) Regulation encTfChipPkENCFF023XHV K562 NR2C1 Transcription Factor ChIP-seq Peaks of NR2C1 in K562 from ENCODE 3 (ENCFF023XHV) Regulation encTfChipPkENCFF305OOU K562 NR0B1 Transcription Factor ChIP-seq Peaks of NR0B1 in K562 from ENCODE 3 (ENCFF305OOU) Regulation encTfChipPkENCFF329STX K562 NFXL1 Transcription Factor ChIP-seq Peaks of NFXL1 in K562 from ENCODE 3 (ENCFF329STX) Regulation encTfChipPkENCFF779KIS K562 NFRKB 2 Transcription Factor ChIP-seq Peaks of NFRKB in K562 from ENCODE 3 (ENCFF779KIS) Regulation encTfChipPkENCFF158FUG K562 NFRKB 1 Transcription Factor ChIP-seq Peaks of NFRKB in K562 from ENCODE 3 (ENCFF158FUG) 
Regulation encTfChipPkENCFF092TVM K562 NFIC Transcription Factor ChIP-seq Peaks of NFIC in K562 from ENCODE 3 (ENCFF092TVM) Regulation encTfChipPkENCFF312XHI K562 NFE2 Transcription Factor ChIP-seq Peaks of NFE2 in K562 from ENCODE 3 (ENCFF312XHI) Regulation encTfChipPkENCFF430JFH K562 NFATC3 2 Transcription Factor ChIP-seq Peaks of NFATC3 in K562 from ENCODE 3 (ENCFF430JFH) Regulation encTfChipPkENCFF082EPO K562 NFATC3 1 Transcription Factor ChIP-seq Peaks of NFATC3 in K562 from ENCODE 3 (ENCFF082EPO) Regulation encTfChipPkENCFF755APC K562 NEUROD1 Transcription Factor ChIP-seq Peaks of NEUROD1 in K562 from ENCODE 3 (ENCFF755APC) Regulation encTfChipPkENCFF638IIC K562 NCOR1 3 Transcription Factor ChIP-seq Peaks of NCOR1 in K562 from ENCODE 3 (ENCFF638IIC) Regulation encTfChipPkENCFF007ZUL K562 NCOR1 2 Transcription Factor ChIP-seq Peaks of NCOR1 in K562 from ENCODE 3 (ENCFF007ZUL) Regulation encTfChipPkENCFF856HUK K562 NCOR1 1 Transcription Factor ChIP-seq Peaks of NCOR1 in K562 from ENCODE 3 (ENCFF856HUK) Regulation encTfChipPkENCFF438BWN K562 NCOA6 Transcription Factor ChIP-seq Peaks of NCOA6 in K562 from ENCODE 3 (ENCFF438BWN) Regulation encTfChipPkENCFF749HKV K562 NCOA4 Transcription Factor ChIP-seq Peaks of NCOA4 in K562 from ENCODE 3 (ENCFF749HKV) Regulation encTfChipPkENCFF584SNZ K562 NCOA2 2 Transcription Factor ChIP-seq Peaks of NCOA2 in K562 from ENCODE 3 (ENCFF584SNZ) Regulation encTfChipPkENCFF071SOH K562 NCOA2 1 Transcription Factor ChIP-seq Peaks of NCOA2 in K562 from ENCODE 3 (ENCFF071SOH) Regulation encTfChipPkENCFF382RFJ K562 NCOA1 3 Transcription Factor ChIP-seq Peaks of NCOA1 in K562 from ENCODE 3 (ENCFF382RFJ) Regulation encTfChipPkENCFF474QDS K562 NCOA1 2 Transcription Factor ChIP-seq Peaks of NCOA1 in K562 from ENCODE 3 (ENCFF474QDS) Regulation encTfChipPkENCFF589OOF K562 NCOA1 1 Transcription Factor ChIP-seq Peaks of NCOA1 in K562 from ENCODE 3 (ENCFF589OOF) Regulation encTfChipPkENCFF728KKP K562 NBN Transcription Factor ChIP-seq Peaks of NBN 
in K562 from ENCODE 3 (ENCFF728KKP) Regulation encTfChipPkENCFF272LLG K562 MYNN Transcription Factor ChIP-seq Peaks of MYNN in K562 from ENCODE 3 (ENCFF272LLG) Regulation encTfChipPkENCFF605WXD K562 MYC 5 Transcription Factor ChIP-seq Peaks of MYC in K562 from ENCODE 3 (ENCFF605WXD) Regulation encTfChipPkENCFF527EGF K562 MYC 4 Transcription Factor ChIP-seq Peaks of MYC in K562 from ENCODE 3 (ENCFF527EGF) Regulation encTfChipPkENCFF492XUU K562 MYC 3 Transcription Factor ChIP-seq Peaks of MYC in K562 from ENCODE 3 (ENCFF492XUU) Regulation encTfChipPkENCFF339AQP K562 MYC 2 Transcription Factor ChIP-seq Peaks of MYC in K562 from ENCODE 3 (ENCFF339AQP) Regulation encTfChipPkENCFF700TLG K562 MYC 1 Transcription Factor ChIP-seq Peaks of MYC in K562 from ENCODE 3 (ENCFF700TLG) Regulation encTfChipPkENCFF905KOD K562 MYBL2 Transcription Factor ChIP-seq Peaks of MYBL2 in K562 from ENCODE 3 (ENCFF905KOD) Regulation encTfChipPkENCFF243QTL K562 MXI1 Transcription Factor ChIP-seq Peaks of MXI1 in K562 from ENCODE 3 (ENCFF243QTL) Regulation encTfChipPkENCFF459XLR K562 MTA3 Transcription Factor ChIP-seq Peaks of MTA3 in K562 from ENCODE 3 (ENCFF459XLR) Regulation encTfChipPkENCFF713ZVD K562 MTA2 2 Transcription Factor ChIP-seq Peaks of MTA2 in K562 from ENCODE 3 (ENCFF713ZVD) Regulation encTfChipPkENCFF558XIL K562 MTA2 1 Transcription Factor ChIP-seq Peaks of MTA2 in K562 from ENCODE 3 (ENCFF558XIL) Regulation encTfChipPkENCFF801KEW K562 MTA1 Transcription Factor ChIP-seq Peaks of MTA1 in K562 from ENCODE 3 (ENCFF801KEW) Regulation encTfChipPkENCFF459DYU K562 MNT 3 Transcription Factor ChIP-seq Peaks of MNT in K562 from ENCODE 3 (ENCFF459DYU) Regulation encTfChipPkENCFF454QQD K562 MNT 2 Transcription Factor ChIP-seq Peaks of MNT in K562 from ENCODE 3 (ENCFF454QQD) Regulation encTfChipPkENCFF926CRV K562 MNT 1 Transcription Factor ChIP-seq Peaks of MNT in K562 from ENCODE 3 (ENCFF926CRV) Regulation encTfChipPkENCFF388LUX K562 MLLT1 2 Transcription Factor ChIP-seq Peaks of MLLT1 in 
K562 from ENCODE 3 (ENCFF388LUX) Regulation encTfChipPkENCFF010AIG K562 MLLT1 1 Transcription Factor ChIP-seq Peaks of MLLT1 in K562 from ENCODE 3 (ENCFF010AIG) Regulation encTfChipPkENCFF071NYD K562 MITF 2 Transcription Factor ChIP-seq Peaks of MITF in K562 from ENCODE 3 (ENCFF071NYD) Regulation encTfChipPkENCFF262TMM K562 MITF 1 Transcription Factor ChIP-seq Peaks of MITF in K562 from ENCODE 3 (ENCFF262TMM) Regulation encTfChipPkENCFF163YZB K562 MIER1 Transcription Factor ChIP-seq Peaks of MIER1 in K562 from ENCODE 3 (ENCFF163YZB) Regulation encTfChipPkENCFF525MPI K562 MGA Transcription Factor ChIP-seq Peaks of MGA in K562 from ENCODE 3 (ENCFF525MPI) Regulation encTfChipPkENCFF937UEE K562 MEIS2 Transcription Factor ChIP-seq Peaks of MEIS2 in K562 from ENCODE 3 (ENCFF937UEE) Regulation encTfChipPkENCFF310SMW K562 MEF2A Transcription Factor ChIP-seq Peaks of MEF2A in K562 from ENCODE 3 (ENCFF310SMW) Regulation encTfChipPkENCFF288ZRD K562 MCM7 3 Transcription Factor ChIP-seq Peaks of MCM7 in K562 from ENCODE 3 (ENCFF288ZRD) Regulation encTfChipPkENCFF914ELA K562 MCM7 2 Transcription Factor ChIP-seq Peaks of MCM7 in K562 from ENCODE 3 (ENCFF914ELA) Regulation encTfChipPkENCFF159MQI K562 MCM7 1 Transcription Factor ChIP-seq Peaks of MCM7 in K562 from ENCODE 3 (ENCFF159MQI) Regulation encTfChipPkENCFF658SJY K562 MCM5 2 Transcription Factor ChIP-seq Peaks of MCM5 in K562 from ENCODE 3 (ENCFF658SJY) Regulation encTfChipPkENCFF603SXI K562 MCM5 1 Transcription Factor ChIP-seq Peaks of MCM5 in K562 from ENCODE 3 (ENCFF603SXI) Regulation encTfChipPkENCFF672PYP K562 MCM3 Transcription Factor ChIP-seq Peaks of MCM3 in K562 from ENCODE 3 (ENCFF672PYP) Regulation encTfChipPkENCFF571REC K562 MCM2 2 Transcription Factor ChIP-seq Peaks of MCM2 in K562 from ENCODE 3 (ENCFF571REC) Regulation encTfChipPkENCFF043HHG K562 MCM2 1 Transcription Factor ChIP-seq Peaks of MCM2 in K562 from ENCODE 3 (ENCFF043HHG) Regulation encTfChipPkENCFF617QSK K562 MBD2 Transcription Factor ChIP-seq Peaks 
of MBD2 in K562 from ENCODE 3 (ENCFF617QSK) Regulation encTfChipPkENCFF900NVQ K562 MAX 2 Transcription Factor ChIP-seq Peaks of MAX in K562 from ENCODE 3 (ENCFF900NVQ) Regulation encTfChipPkENCFF618VMC K562 MAX 1 Transcription Factor ChIP-seq Peaks of MAX in K562 from ENCODE 3 (ENCFF618VMC) Regulation encTfChipPkENCFF893SCL K562 MAFK Transcription Factor ChIP-seq Peaks of MAFK in K562 from ENCODE 3 (ENCFF893SCL) Regulation encTfChipPkENCFF498MGH K562 MAFF Transcription Factor ChIP-seq Peaks of MAFF in K562 from ENCODE 3 (ENCFF498MGH) Regulation encTfChipPkENCFF697VRJ K562 LEF1 2 Transcription Factor ChIP-seq Peaks of LEF1 in K562 from ENCODE 3 (ENCFF697VRJ) Regulation encTfChipPkENCFF134HQP K562 LEF1 1 Transcription Factor ChIP-seq Peaks of LEF1 in K562 from ENCODE 3 (ENCFF134HQP) Regulation encTfChipPkENCFF423LPW K562 L3MBTL2 Transcription Factor ChIP-seq Peaks of L3MBTL2 in K562 from ENCODE 3 (ENCFF423LPW) Regulation encTfChipPkENCFF379LKE K562 KLF16 Transcription Factor ChIP-seq Peaks of KLF16 in K562 from ENCODE 3 (ENCFF379LKE) Regulation encTfChipPkENCFF668XLN K562 KDM5B Transcription Factor ChIP-seq Peaks of KDM5B in K562 from ENCODE 3 (ENCFF668XLN) Regulation encTfChipPkENCFF955AOD K562 KDM4B 2 Transcription Factor ChIP-seq Peaks of KDM4B in K562 from ENCODE 3 (ENCFF955AOD) Regulation encTfChipPkENCFF470RHZ K562 KDM4B 1 Transcription Factor ChIP-seq Peaks of KDM4B in K562 from ENCODE 3 (ENCFF470RHZ) Regulation encTfChipPkENCFF483BRD K562 KDM1A 2 Transcription Factor ChIP-seq Peaks of KDM1A in K562 from ENCODE 3 (ENCFF483BRD) Regulation encTfChipPkENCFF796VMI K562 KDM1A 1 Transcription Factor ChIP-seq Peaks of KDM1A in K562 from ENCODE 3 (ENCFF796VMI) Regulation encTfChipPkENCFF207ZEK K562 KAT8 Transcription Factor ChIP-seq Peaks of KAT8 in K562 from ENCODE 3 (ENCFF207ZEK) Regulation encTfChipPkENCFF556XQQ K562 KAT2B Transcription Factor ChIP-seq Peaks of KAT2B in K562 from ENCODE 3 (ENCFF556XQQ) Regulation encTfChipPkENCFF213EYD K562 JUND Transcription 
Factor ChIP-seq Peaks of JUND in K562 from ENCODE 3 (ENCFF213EYD) Regulation encTfChipPkENCFF739XTO K562 JUNB Transcription Factor ChIP-seq Peaks of JUNB in K562 from ENCODE 3 (ENCFF739XTO) Regulation encTfChipPkENCFF032UMW K562 JUN 5 Transcription Factor ChIP-seq Peaks of JUN in K562 from ENCODE 3 (ENCFF032UMW) Regulation encTfChipPkENCFF394CEC K562 JUN 4 Transcription Factor ChIP-seq Peaks of JUN in K562 from ENCODE 3 (ENCFF394CEC) Regulation encTfChipPkENCFF167WUZ K562 JUN 3 Transcription Factor ChIP-seq Peaks of JUN in K562 from ENCODE 3 (ENCFF167WUZ) Regulation encTfChipPkENCFF672LKE K562 JUN 2 Transcription Factor ChIP-seq Peaks of JUN in K562 from ENCODE 3 (ENCFF672LKE) Regulation encTfChipPkENCFF881AVX K562 JUN 1 Transcription Factor ChIP-seq Peaks of JUN in K562 from ENCODE 3 (ENCFF881AVX) Regulation encTfChipPkENCFF886EVL K562 IRF2 Transcription Factor ChIP-seq Peaks of IRF2 in K562 from ENCODE 3 (ENCFF886EVL) Regulation encTfChipPkENCFF346LMY K562 IRF1 4 Transcription Factor ChIP-seq Peaks of IRF1 in K562 from ENCODE 3 (ENCFF346LMY) Regulation encTfChipPkENCFF938NBD K562 IRF1 3 Transcription Factor ChIP-seq Peaks of IRF1 in K562 from ENCODE 3 (ENCFF938NBD) Regulation encTfChipPkENCFF688XON K562 IRF1 2 Transcription Factor ChIP-seq Peaks of IRF1 in K562 from ENCODE 3 (ENCFF688XON) Regulation encTfChipPkENCFF978BBL K562 IRF1 1 Transcription Factor ChIP-seq Peaks of IRF1 in K562 from ENCODE 3 (ENCFF978BBL) Regulation encTfChipPkENCFF994OQH K562 IKZF1 2 Transcription Factor ChIP-seq Peaks of IKZF1 in K562 from ENCODE 3 (ENCFF994OQH) Regulation encTfChipPkENCFF785BTP K562 IKZF1 1 Transcription Factor ChIP-seq Peaks of IKZF1 in K562 from ENCODE 3 (ENCFF785BTP) Regulation encTfChipPkENCFF991ZSC K562 HNRNPUL1 Transcription Factor ChIP-seq Peaks of HNRNPUL1 in K562 from ENCODE 3 (ENCFF991ZSC) Regulation encTfChipPkENCFF662WPN K562 HNRNPLL Transcription Factor ChIP-seq Peaks of HNRNPLL in K562 from ENCODE 3 (ENCFF662WPN) Regulation encTfChipPkENCFF984ESZ K562 
HNRNPL Transcription Factor ChIP-seq Peaks of HNRNPL in K562 from ENCODE 3 (ENCFF984ESZ) Regulation encTfChipPkENCFF984QUV K562 HNRNPK Transcription Factor ChIP-seq Peaks of HNRNPK in K562 from ENCODE 3 (ENCFF984QUV) Regulation encTfChipPkENCFF844QFF K562 HNRNPH1 Transcription Factor ChIP-seq Peaks of HNRNPH1 in K562 from ENCODE 3 (ENCFF844QFF) Regulation encTfChipPkENCFF718DFX K562 HMBOX1 Transcription Factor ChIP-seq Peaks of HMBOX1 in K562 from ENCODE 3 (ENCFF718DFX) Regulation encTfChipPkENCFF010OOE K562 HES1 Transcription Factor ChIP-seq Peaks of HES1 in K562 from ENCODE 3 (ENCFF010OOE) Regulation encTfChipPkENCFF295GBP K562 HDAC6 Transcription Factor ChIP-seq Peaks of HDAC6 in K562 from ENCODE 3 (ENCFF295GBP) Regulation encTfChipPkENCFF742LSD K562 HDAC3 Transcription Factor ChIP-seq Peaks of HDAC3 in K562 from ENCODE 3 (ENCFF742LSD) Regulation encTfChipPkENCFF618YRQ K562 HDAC2 3 Transcription Factor ChIP-seq Peaks of HDAC2 in K562 from ENCODE 3 (ENCFF618YRQ) Regulation encTfChipPkENCFF519RWJ K562 HDAC2 2 Transcription Factor ChIP-seq Peaks of HDAC2 in K562 from ENCODE 3 (ENCFF519RWJ) Regulation encTfChipPkENCFF363GSV K562 HDAC2 1 Transcription Factor ChIP-seq Peaks of HDAC2 in K562 from ENCODE 3 (ENCFF363GSV) Regulation encTfChipPkENCFF557WXK K562 HDAC1 4 Transcription Factor ChIP-seq Peaks of HDAC1 in K562 from ENCODE 3 (ENCFF557WXK) Regulation encTfChipPkENCFF188TBM K562 HDAC1 3 Transcription Factor ChIP-seq Peaks of HDAC1 in K562 from ENCODE 3 (ENCFF188TBM) Regulation encTfChipPkENCFF661VOO K562 HDAC1 2 Transcription Factor ChIP-seq Peaks of HDAC1 in K562 from ENCODE 3 (ENCFF661VOO) Regulation encTfChipPkENCFF758PGF K562 HDAC1 1 Transcription Factor ChIP-seq Peaks of HDAC1 in K562 from ENCODE 3 (ENCFF758PGF) Regulation encTfChipPkENCFF167RXK K562 HCFC1 Transcription Factor ChIP-seq Peaks of HCFC1 in K562 from ENCODE 3 (ENCFF167RXK) Regulation encTfChipPkENCFF678VPQ K562 GMEB1 Transcription Factor ChIP-seq Peaks of GMEB1 in K562 from ENCODE 3 (ENCFF678VPQ) 
Regulation encTfChipPkENCFF569CMJ K562 GATAD2B Transcription Factor ChIP-seq Peaks of GATAD2B in K562 from ENCODE 3 (ENCFF569CMJ) Regulation encTfChipPkENCFF950ZWP K562 GATAD2A Transcription Factor ChIP-seq Peaks of GATAD2A in K562 from ENCODE 3 (ENCFF950ZWP) Regulation encTfChipPkENCFF173TXA K562 GATA2 Transcription Factor ChIP-seq Peaks of GATA2 in K562 from ENCODE 3 (ENCFF173TXA) Regulation encTfChipPkENCFF148JKK K562 GATA1 Transcription Factor ChIP-seq Peaks of GATA1 in K562 from ENCODE 3 (ENCFF148JKK) Regulation encTfChipPkENCFF700DXR K562 GABPB1 Transcription Factor ChIP-seq Peaks of GABPB1 in K562 from ENCODE 3 (ENCFF700DXR) Regulation encTfChipPkENCFF124HAC K562 GABPA Transcription Factor ChIP-seq Peaks of GABPA in K562 from ENCODE 3 (ENCFF124HAC) Regulation encTfChipPkENCFF688ARM K562 FUS Transcription Factor ChIP-seq Peaks of FUS in K562 from ENCODE 3 (ENCFF688ARM) Regulation encTfChipPkENCFF778PWE K562 FOXM1 Transcription Factor ChIP-seq Peaks of FOXM1 in K562 from ENCODE 3 (ENCFF778PWE) Regulation encTfChipPkENCFF490EQR K562 FOXK2 2 Transcription Factor ChIP-seq Peaks of FOXK2 in K562 from ENCODE 3 (ENCFF490EQR) Regulation encTfChipPkENCFF066CWG K562 FOXK2 1 Transcription Factor ChIP-seq Peaks of FOXK2 in K562 from ENCODE 3 (ENCFF066CWG) Regulation encTfChipPkENCFF765NAN K562 FOXA1 Transcription Factor ChIP-seq Peaks of FOXA1 in K562 from ENCODE 3 (ENCFF765NAN) Regulation encTfChipPkENCFF087MFG K562 FOSL1 Transcription Factor ChIP-seq Peaks of FOSL1 in K562 from ENCODE 3 (ENCFF087MFG) Regulation encTfChipPkENCFF084DTV K562 FIP1L1 Transcription Factor ChIP-seq Peaks of FIP1L1 in K562 from ENCODE 3 (ENCFF084DTV) Regulation encTfChipPkENCFF560CYG K562 EWSR1 Transcription Factor ChIP-seq Peaks of EWSR1 in K562 from ENCODE 3 (ENCFF560CYG) Regulation encTfChipPkENCFF658SGJ K562 ETV6 2 Transcription Factor ChIP-seq Peaks of ETV6 in K562 from ENCODE 3 (ENCFF658SGJ) Regulation encTfChipPkENCFF426GSY K562 ETV6 1 Transcription Factor ChIP-seq Peaks of ETV6 in K562 
from ENCODE 3 (ENCFF426GSY) Regulation encTfChipPkENCFF461PRP K562 ETS1 Transcription Factor ChIP-seq Peaks of ETS1 in K562 from ENCODE 3 (ENCFF461PRP) Regulation encTfChipPkENCFF592GWM K562 ESRRA Transcription Factor ChIP-seq Peaks of ESRRA in K562 from ENCODE 3 (ENCFF592GWM) Regulation encTfChipPkENCFF225BXA K562 EP400 Transcription Factor ChIP-seq Peaks of EP400 in K562 from ENCODE 3 (ENCFF225BXA) Regulation encTfChipPkENCFF755HCK K562 EP300 Transcription Factor ChIP-seq Peaks of EP300 in K562 from ENCODE 3 (ENCFF755HCK) Regulation encTfChipPkENCFF119SCQ K562 ELK1 Transcription Factor ChIP-seq Peaks of ELK1 in K562 from ENCODE 3 (ENCFF119SCQ) Regulation encTfChipPkENCFF539SXG K562 ELF4 Transcription Factor ChIP-seq Peaks of ELF4 in K562 from ENCODE 3 (ENCFF539SXG) Regulation encTfChipPkENCFF617ZLL K562 ELF1 Transcription Factor ChIP-seq Peaks of ELF1 in K562 from ENCODE 3 (ENCFF617ZLL) Regulation encTfChipPkENCFF682XPD K562 EHMT2 Transcription Factor ChIP-seq Peaks of EHMT2 in K562 from ENCODE 3 (ENCFF682XPD) Regulation encTfChipPkENCFF561OGS K562 EGR1 3 Transcription Factor ChIP-seq Peaks of EGR1 in K562 from ENCODE 3 (ENCFF561OGS) Regulation encTfChipPkENCFF175VSS K562 EGR1 2 Transcription Factor ChIP-seq Peaks of EGR1 in K562 from ENCODE 3 (ENCFF175VSS) Regulation encTfChipPkENCFF375RDB K562 EGR1 1 Transcription Factor ChIP-seq Peaks of EGR1 in K562 from ENCODE 3 (ENCFF375RDB) Regulation encTfChipPkENCFF752KNU K562 E4F1 Transcription Factor ChIP-seq Peaks of E4F1 in K562 from ENCODE 3 (ENCFF752KNU) Regulation encTfChipPkENCFF171WWF K562 E2F8 Transcription Factor ChIP-seq Peaks of E2F8 in K562 from ENCODE 3 (ENCFF171WWF) Regulation encTfChipPkENCFF013EHI K562 E2F7 Transcription Factor ChIP-seq Peaks of E2F7 in K562 from ENCODE 3 (ENCFF013EHI) Regulation encTfChipPkENCFF533GSH K562 E2F6 Transcription Factor ChIP-seq Peaks of E2F6 in K562 from ENCODE 3 (ENCFF533GSH) Regulation encTfChipPkENCFF445VTT K562 E2F1 2 Transcription Factor ChIP-seq Peaks of E2F1 in K562 
from ENCODE 3 (ENCFF445VTT) Regulation encTfChipPkENCFF134JLR K562 E2F1 1 Transcription Factor ChIP-seq Peaks of E2F1 in K562 from ENCODE 3 (ENCFF134JLR) Regulation encTfChipPkENCFF217ZTP K562 DPF2 2 Transcription Factor ChIP-seq Peaks of DPF2 in K562 from ENCODE 3 (ENCFF217ZTP) Regulation encTfChipPkENCFF537VKZ K562 DPF2 1 Transcription Factor ChIP-seq Peaks of DPF2 in K562 from ENCODE 3 (ENCFF537VKZ) Regulation encTfChipPkENCFF549TVW K562 DNMT1 Transcription Factor ChIP-seq Peaks of DNMT1 in K562 from ENCODE 3 (ENCFF549TVW) Regulation encTfChipPkENCFF532HCE K562 DEAF1 Transcription Factor ChIP-seq Peaks of DEAF1 in K562 from ENCODE 3 (ENCFF532HCE) Regulation encTfChipPkENCFF870LJV K562 DACH1 Transcription Factor ChIP-seq Peaks of DACH1 in K562 from ENCODE 3 (ENCFF870LJV) Regulation encTfChipPkENCFF556HMX K562 CUX1 Transcription Factor ChIP-seq Peaks of CUX1 in K562 from ENCODE 3 (ENCFF556HMX) Regulation encTfChipPkENCFF396BZQ K562 CTCF 4 Transcription Factor ChIP-seq Peaks of CTCF in K562 from ENCODE 3 (ENCFF396BZQ) Regulation encTfChipPkENCFF119XFJ K562 CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in K562 from ENCODE 3 (ENCFF119XFJ) Regulation encTfChipPkENCFF519CXF K562 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in K562 from ENCODE 3 (ENCFF519CXF) Regulation encTfChipPkENCFF843VHC K562 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in K562 from ENCODE 3 (ENCFF843VHC) Regulation encTfChipPkENCFF349UTF K562 CTBP1 Transcription Factor ChIP-seq Peaks of CTBP1 in K562 from ENCODE 3 (ENCFF349UTF) Regulation encTfChipPkENCFF021XJN K562 CREM Transcription Factor ChIP-seq Peaks of CREM in K562 from ENCODE 3 (ENCFF021XJN) Regulation encTfChipPkENCFF678FRK K562 CREBBP Transcription Factor ChIP-seq Peaks of CREBBP in K562 from ENCODE 3 (ENCFF678FRK) Regulation encTfChipPkENCFF566HGU K562 CREB3L1 Transcription Factor ChIP-seq Peaks of CREB3L1 in K562 from ENCODE 3 (ENCFF566HGU) Regulation encTfChipPkENCFF552EBC K562 COPS2 Transcription Factor ChIP-seq 
Peaks of COPS2 in K562 from ENCODE 3 (ENCFF552EBC) Regulation encTfChipPkENCFF919KNQ K562 CHAMP1 2 Transcription Factor ChIP-seq Peaks of CHAMP1 in K562 from ENCODE 3 (ENCFF919KNQ) Regulation encTfChipPkENCFF646MEF K562 CHAMP1 1 Transcription Factor ChIP-seq Peaks of CHAMP1 in K562 from ENCODE 3 (ENCFF646MEF) Regulation encTfChipPkENCFF321KQD K562 CEBPB Transcription Factor ChIP-seq Peaks of CEBPB in K562 from ENCODE 3 (ENCFF321KQD) Regulation encTfChipPkENCFF384ALH K562 CDC5L Transcription Factor ChIP-seq Peaks of CDC5L in K562 from ENCODE 3 (ENCFF384ALH) Regulation encTfChipPkENCFF704PGT K562 CCAR2 Transcription Factor ChIP-seq Peaks of CCAR2 in K562 from ENCODE 3 (ENCFF704PGT) Regulation encTfChipPkENCFF180TUM K562 CC2D1A Transcription Factor ChIP-seq Peaks of CC2D1A in K562 from ENCODE 3 (ENCFF180TUM) Regulation encTfChipPkENCFF403TAE K562 CBX5 Transcription Factor ChIP-seq Peaks of CBX5 in K562 from ENCODE 3 (ENCFF403TAE) Regulation encTfChipPkENCFF951BQB K562 CBX3 2 Transcription Factor ChIP-seq Peaks of CBX3 in K562 from ENCODE 3 (ENCFF951BQB) Regulation encTfChipPkENCFF378YKS K562 CBX3 1 Transcription Factor ChIP-seq Peaks of CBX3 in K562 from ENCODE 3 (ENCFF378YKS) Regulation encTfChipPkENCFF163FLA K562 CBX1 Transcription Factor ChIP-seq Peaks of CBX1 in K562 from ENCODE 3 (ENCFF163FLA) Regulation encTfChipPkENCFF153IFH K562 CBFA2T3 Transcription Factor ChIP-seq Peaks of CBFA2T3 in K562 from ENCODE 3 (ENCFF153IFH) Regulation encTfChipPkENCFF419PEK K562 CBFA2T2 Transcription Factor ChIP-seq Peaks of CBFA2T2 in K562 from ENCODE 3 (ENCFF419PEK) Regulation encTfChipPkENCFF104MXG K562 C11orf30 Transcription Factor ChIP-seq Peaks of C11orf30 in K562 from ENCODE 3 (ENCFF104MXG) Regulation encTfChipPkENCFF411RMT K562 BRD9 Transcription Factor ChIP-seq Peaks of BRD9 in K562 from ENCODE 3 (ENCFF411RMT) Regulation encTfChipPkENCFF806CQB K562 BRD4 Transcription Factor ChIP-seq Peaks of BRD4 in K562 from ENCODE 3 (ENCFF806CQB) Regulation encTfChipPkENCFF652NES K562 
BRCA1 Transcription Factor ChIP-seq Peaks of BRCA1 in K562 from ENCODE 3 (ENCFF652NES) Regulation encTfChipPkENCFF352DRR K562 BMI1 Transcription Factor ChIP-seq Peaks of BMI1 in K562 from ENCODE 3 (ENCFF352DRR) Regulation encTfChipPkENCFF477JTV K562 BHLHE40 Transcription Factor ChIP-seq Peaks of BHLHE40 in K562 from ENCODE 3 (ENCFF477JTV) Regulation encTfChipPkENCFF186JKG K562 BCOR Transcription Factor ChIP-seq Peaks of BCOR in K562 from ENCODE 3 (ENCFF186JKG) Regulation encTfChipPkENCFF543FNN K562 BACH1 Transcription Factor ChIP-seq Peaks of BACH1 in K562 from ENCODE 3 (ENCFF543FNN) Regulation encTfChipPkENCFF371SJR K562 ATF7 Transcription Factor ChIP-seq Peaks of ATF7 in K562 from ENCODE 3 (ENCFF371SJR) Regulation encTfChipPkENCFF182MNO K562 ATF4 Transcription Factor ChIP-seq Peaks of ATF4 in K562 from ENCODE 3 (ENCFF182MNO) Regulation encTfChipPkENCFF937OKC K562 ATF3 2 Transcription Factor ChIP-seq Peaks of ATF3 in K562 from ENCODE 3 (ENCFF937OKC) Regulation encTfChipPkENCFF467WOR K562 ATF3 1 Transcription Factor ChIP-seq Peaks of ATF3 in K562 from ENCODE 3 (ENCFF467WOR) Regulation encTfChipPkENCFF803FHN K562 ATF2 Transcription Factor ChIP-seq Peaks of ATF2 in K562 from ENCODE 3 (ENCFF803FHN) Regulation encTfChipPkENCFF958YSG K562 ASH1L Transcription Factor ChIP-seq Peaks of ASH1L in K562 from ENCODE 3 (ENCFF958YSG) Regulation encTfChipPkENCFF913AQF K562 ARNT 3 Transcription Factor ChIP-seq Peaks of ARNT in K562 from ENCODE 3 (ENCFF913AQF) Regulation encTfChipPkENCFF447FIO K562 ARNT 2 Transcription Factor ChIP-seq Peaks of ARNT in K562 from ENCODE 3 (ENCFF447FIO) Regulation encTfChipPkENCFF655EFA K562 ARNT 1 Transcription Factor ChIP-seq Peaks of ARNT in K562 from ENCODE 3 (ENCFF655EFA) Regulation encTfChipPkENCFF757OML K562 ARID3A Transcription Factor ChIP-seq Peaks of ARID3A in K562 from ENCODE 3 (ENCFF757OML) Regulation encTfChipPkENCFF344MKI K562 ARID2 Transcription Factor ChIP-seq Peaks of ARID2 in K562 from ENCODE 3 (ENCFF344MKI) Regulation 
encTfChipPkENCFF249TYS K562 ARID1B Transcription Factor ChIP-seq Peaks of ARID1B in K562 from ENCODE 3 (ENCFF249TYS) Regulation encTfChipPkENCFF089PKE K562 ARHGAP35 Transcription Factor ChIP-seq Peaks of ARHGAP35 in K562 from ENCODE 3 (ENCFF089PKE) Regulation encTfChipPkENCFF100VYA K562 AGO1 Transcription Factor ChIP-seq Peaks of AGO1 in K562 from ENCODE 3 (ENCFF100VYA) Regulation encTfChipPkENCFF489SKQ K562 AFF1 2 Transcription Factor ChIP-seq Peaks of AFF1 in K562 from ENCODE 3 (ENCFF489SKQ) Regulation encTfChipPkENCFF869BYK K562 AFF1 1 Transcription Factor ChIP-seq Peaks of AFF1 in K562 from ENCODE 3 (ENCFF869BYK) Regulation encTfChipPkENCFF085HJD Ishikawa POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in Ishikawa from ENCODE 3 (ENCFF085HJD) Regulation encTfChipPkENCFF293OHT Ishikawa NR3C1 2 Transcription Factor ChIP-seq Peaks of NR3C1 in Ishikawa from ENCODE 3 (ENCFF293OHT) Regulation encTfChipPkENCFF519BOO Ishikawa NR3C1 1 Transcription Factor ChIP-seq Peaks of NR3C1 in Ishikawa from ENCODE 3 (ENCFF519BOO) Regulation encTfChipPkENCFF778BLL Ishikawa ESR1 3 Transcription Factor ChIP-seq Peaks of ESR1 in Ishikawa from ENCODE 3 (ENCFF778BLL) Regulation encTfChipPkENCFF279JGE Ishikawa ESR1 2 Transcription Factor ChIP-seq Peaks of ESR1 in Ishikawa from ENCODE 3 (ENCFF279JGE) Regulation encTfChipPkENCFF076OFH Ishikawa ESR1 1 Transcription Factor ChIP-seq Peaks of ESR1 in Ishikawa from ENCODE 3 (ENCFF076OFH) Regulation encTfChipPkENCFF675JJV Ishikawa CTCF Transcription Factor ChIP-seq Peaks of CTCF in Ishikawa from ENCODE 3 (ENCFF675JJV) Regulation encTfChipPkENCFF938BOJ IMR-90 USF2 Transcription Factor ChIP-seq Peaks of USF2 in IMR-90 from ENCODE 3 (ENCFF938BOJ) Regulation encTfChipPkENCFF380ZXB IMR-90 SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in IMR-90 from ENCODE 3 (ENCFF380ZXB) Regulation encTfChipPkENCFF139EBY IMR-90 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in IMR-90 from ENCODE 3 (ENCFF139EBY) Regulation encTfChipPkENCFF895JAW IMR-90 RAD21 
Transcription Factor ChIP-seq Peaks of RAD21 in IMR-90 from ENCODE 3 (ENCFF895JAW) Regulation encTfChipPkENCFF474PPT IMR-90 NFE2L2 Transcription Factor ChIP-seq Peaks of NFE2L2 in IMR-90 from ENCODE 3 (ENCFF474PPT) Regulation encTfChipPkENCFF351VGZ IMR-90 MAFK Transcription Factor ChIP-seq Peaks of MAFK in IMR-90 from ENCODE 3 (ENCFF351VGZ) Regulation encTfChipPkENCFF217ZMF IMR-90 FOS Transcription Factor ChIP-seq Peaks of FOS in IMR-90 from ENCODE 3 (ENCFF217ZMF) Regulation encTfChipPkENCFF687IUD IMR-90 ELK1 Transcription Factor ChIP-seq Peaks of ELK1 in IMR-90 from ENCODE 3 (ENCFF687IUD) Regulation encTfChipPkENCFF307XFM IMR-90 CTCF Transcription Factor ChIP-seq Peaks of CTCF in IMR-90 from ENCODE 3 (ENCFF307XFM) Regulation encTfChipPkENCFF510QXG IMR-90 CHD1 Transcription Factor ChIP-seq Peaks of CHD1 in IMR-90 from ENCODE 3 (ENCFF510QXG) Regulation encTfChipPkENCFF757KYL IMR-90 CEBPB Transcription Factor ChIP-seq Peaks of CEBPB in IMR-90 from ENCODE 3 (ENCFF757KYL) Regulation encTfChipPkENCFF567GON IMR-90 BHLHE40 Transcription Factor ChIP-seq Peaks of BHLHE40 in IMR-90 from ENCODE 3 (ENCFF567GON) Regulation encTfChipPkENCFF950VAR HepG2 ZNF384 Transcription Factor ChIP-seq Peaks of ZNF384 in HepG2 from ENCODE 3 (ENCFF950VAR) Regulation encTfChipPkENCFF482XNG HepG2 ZNF282 Transcription Factor ChIP-seq Peaks of ZNF282 in HepG2 from ENCODE 3 (ENCFF482XNG) Regulation encTfChipPkENCFF858WPR HepG2 ZNF24 2 Transcription Factor ChIP-seq Peaks of ZNF24 in HepG2 from ENCODE 3 (ENCFF858WPR) Regulation encTfChipPkENCFF904QAD HepG2 ZNF24 1 Transcription Factor ChIP-seq Peaks of ZNF24 in HepG2 from ENCODE 3 (ENCFF904QAD) Regulation encTfChipPkENCFF657ZXY HepG2 ZNF207 Transcription Factor ChIP-seq Peaks of ZNF207 in HepG2 from ENCODE 3 (ENCFF657ZXY) Regulation encTfChipPkENCFF769SEZ HepG2 ZMYM3 Transcription Factor ChIP-seq Peaks of ZMYM3 in HepG2 from ENCODE 3 (ENCFF769SEZ) Regulation encTfChipPkENCFF721NEC HepG2 ZKSCAN1 Transcription Factor ChIP-seq Peaks of ZKSCAN1 in HepG2 
from ENCODE 3 (ENCFF721NEC) Regulation encTfChipPkENCFF964KDQ HepG2 ZHX2 Transcription Factor ChIP-seq Peaks of ZHX2 in HepG2 from ENCODE 3 (ENCFF964KDQ) Regulation encTfChipPkENCFF953JQD HepG2 ZBTB7A Transcription Factor ChIP-seq Peaks of ZBTB7A in HepG2 from ENCODE 3 (ENCFF953JQD) Regulation encTfChipPkENCFF624WDI HepG2 ZBTB40 Transcription Factor ChIP-seq Peaks of ZBTB40 in HepG2 from ENCODE 3 (ENCFF624WDI) Regulation encTfChipPkENCFF943WRA HepG2 ZBTB33 Transcription Factor ChIP-seq Peaks of ZBTB33 in HepG2 from ENCODE 3 (ENCFF943WRA) Regulation encTfChipPkENCFF177YDT HepG2 YY1 Transcription Factor ChIP-seq Peaks of YY1 in HepG2 from ENCODE 3 (ENCFF177YDT) Regulation encTfChipPkENCFF790ZAQ HepG2 XRCC5 Transcription Factor ChIP-seq Peaks of XRCC5 in HepG2 from ENCODE 3 (ENCFF790ZAQ) Regulation encTfChipPkENCFF914IFQ HepG2 USF1 Transcription Factor ChIP-seq Peaks of USF1 in HepG2 from ENCODE 3 (ENCFF914IFQ) Regulation encTfChipPkENCFF562ADR HepG2 U2AF2 Transcription Factor ChIP-seq Peaks of U2AF2 in HepG2 from ENCODE 3 (ENCFF562ADR) Regulation encTfChipPkENCFF034KUO HepG2 U2AF1 Transcription Factor ChIP-seq Peaks of U2AF1 in HepG2 from ENCODE 3 (ENCFF034KUO) Regulation encTfChipPkENCFF063GDN HepG2 TRIM22 Transcription Factor ChIP-seq Peaks of TRIM22 in HepG2 from ENCODE 3 (ENCFF063GDN) Regulation encTfChipPkENCFF912SQI HepG2 TFAP4 Transcription Factor ChIP-seq Peaks of TFAP4 in HepG2 from ENCODE 3 (ENCFF912SQI) Regulation encTfChipPkENCFF928MIN HepG2 TCF7 Transcription Factor ChIP-seq Peaks of TCF7 in HepG2 from ENCODE 3 (ENCFF928MIN) Regulation encTfChipPkENCFF299JYV HepG2 TCF12 2 Transcription Factor ChIP-seq Peaks of TCF12 in HepG2 from ENCODE 3 (ENCFF299JYV) Regulation encTfChipPkENCFF820PHL HepG2 TCF12 1 Transcription Factor ChIP-seq Peaks of TCF12 in HepG2 from ENCODE 3 (ENCFF820PHL) Regulation encTfChipPkENCFF887DUY HepG2 TBX3 2 Transcription Factor ChIP-seq Peaks of TBX3 in HepG2 from ENCODE 3 (ENCFF887DUY) Regulation encTfChipPkENCFF654KVO HepG2 TBX3 1 
Transcription Factor ChIP-seq Peaks of TBX3 in HepG2 from ENCODE 3 (ENCFF654KVO) Regulation encTfChipPkENCFF534GKQ HepG2 TBP Transcription Factor ChIP-seq Peaks of TBP in HepG2 from ENCODE 3 (ENCFF534GKQ) Regulation encTfChipPkENCFF126KGW HepG2 TBL1XR1 Transcription Factor ChIP-seq Peaks of TBL1XR1 in HepG2 from ENCODE 3 (ENCFF126KGW) Regulation encTfChipPkENCFF718RXL HepG2 TAF15 Transcription Factor ChIP-seq Peaks of TAF15 in HepG2 from ENCODE 3 (ENCFF718RXL) Regulation encTfChipPkENCFF234TBW HepG2 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in HepG2 from ENCODE 3 (ENCFF234TBW) Regulation encTfChipPkENCFF239LRW HepG2 SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in HepG2 from ENCODE 3 (ENCFF239LRW) Regulation encTfChipPkENCFF105XWO HepG2 SRSF9 Transcription Factor ChIP-seq Peaks of SRSF9 in HepG2 from ENCODE 3 (ENCFF105XWO) Regulation encTfChipPkENCFF122FVR HepG2 SRSF4 Transcription Factor ChIP-seq Peaks of SRSF4 in HepG2 from ENCODE 3 (ENCFF122FVR) Regulation encTfChipPkENCFF735WMX HepG2 SP1 2 Transcription Factor ChIP-seq Peaks of SP1 in HepG2 from ENCODE 3 (ENCFF735WMX) Regulation encTfChipPkENCFF175VXL HepG2 SP1 1 Transcription Factor ChIP-seq Peaks of SP1 in HepG2 from ENCODE 3 (ENCFF175VXL) Regulation encTfChipPkENCFF944LNI HepG2 SOX6 Transcription Factor ChIP-seq Peaks of SOX6 in HepG2 from ENCODE 3 (ENCFF944LNI) Regulation encTfChipPkENCFF257QND HepG2 SOX13 Transcription Factor ChIP-seq Peaks of SOX13 in HepG2 from ENCODE 3 (ENCFF257QND) Regulation encTfChipPkENCFF858FBZ HepG2 SNRNP70 Transcription Factor ChIP-seq Peaks of SNRNP70 in HepG2 from ENCODE 3 (ENCFF858FBZ) Regulation encTfChipPkENCFF035YWE HepG2 SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in HepG2 from ENCODE 3 (ENCFF035YWE) Regulation encTfChipPkENCFF210HAA HepG2 SMARCE1 Transcription Factor ChIP-seq Peaks of SMARCE1 in HepG2 from ENCODE 3 (ENCFF210HAA) Regulation encTfChipPkENCFF150NHK HepG2 SMARCC2 Transcription Factor ChIP-seq Peaks of SMARCC2 in HepG2 from ENCODE 3 
(ENCFF150NHK) Regulation encTfChipPkENCFF035ZFO HepG2 SKI Transcription Factor ChIP-seq Peaks of SKI in HepG2 from ENCODE 3 (ENCFF035ZFO) Regulation encTfChipPkENCFF193DQZ HepG2 SIN3B Transcription Factor ChIP-seq Peaks of SIN3B in HepG2 from ENCODE 3 (ENCFF193DQZ) Regulation encTfChipPkENCFF635YMI HepG2 SIN3A Transcription Factor ChIP-seq Peaks of SIN3A in HepG2 from ENCODE 3 (ENCFF635YMI) Regulation encTfChipPkENCFF105TFM HepG2 RXRA Transcription Factor ChIP-seq Peaks of RXRA in HepG2 from ENCODE 3 (ENCFF105TFM) Regulation encTfChipPkENCFF380SYL HepG2 RNF2 Transcription Factor ChIP-seq Peaks of RNF2 in HepG2 from ENCODE 3 (ENCFF380SYL) Regulation encTfChipPkENCFF059GWW HepG2 RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in HepG2 from ENCODE 3 (ENCFF059GWW) Regulation encTfChipPkENCFF788CJF HepG2 RFX1 Transcription Factor ChIP-seq Peaks of RFX1 in HepG2 from ENCODE 3 (ENCFF788CJF) Regulation encTfChipPkENCFF986RRJ HepG2 REST 2 Transcription Factor ChIP-seq Peaks of REST in HepG2 from ENCODE 3 (ENCFF986RRJ) Regulation encTfChipPkENCFF669XCW HepG2 REST 1 Transcription Factor ChIP-seq Peaks of REST in HepG2 from ENCODE 3 (ENCFF669XCW) Regulation encTfChipPkENCFF987VKU HepG2 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in HepG2 from ENCODE 3 (ENCFF987VKU) Regulation encTfChipPkENCFF420ALF HepG2 RBM39 Transcription Factor ChIP-seq Peaks of RBM39 in HepG2 from ENCODE 3 (ENCFF420ALF) Regulation encTfChipPkENCFF305WYD HepG2 RBM22 Transcription Factor ChIP-seq Peaks of RBM22 in HepG2 from ENCODE 3 (ENCFF305WYD) Regulation encTfChipPkENCFF871YRG HepG2 RBFOX2 Transcription Factor ChIP-seq Peaks of RBFOX2 in HepG2 from ENCODE 3 (ENCFF871YRG) Regulation encTfChipPkENCFF859MBC HepG2 RAD51 Transcription Factor ChIP-seq Peaks of RAD51 in HepG2 from ENCODE 3 (ENCFF859MBC) Regulation encTfChipPkENCFF874VFZ HepG2 RAD21 2 Transcription Factor ChIP-seq Peaks of RAD21 in HepG2 from ENCODE 3 (ENCFF874VFZ) Regulation encTfChipPkENCFF093XOJ HepG2 RAD21 1 Transcription Factor 
ChIP-seq Peaks of RAD21 in HepG2 from ENCODE 3 (ENCFF093XOJ) Regulation encTfChipPkENCFF875ZPV HepG2 PTBP1 Transcription Factor ChIP-seq Peaks of PTBP1 in HepG2 from ENCODE 3 (ENCFF875ZPV) Regulation encTfChipPkENCFF908QCS HepG2 PRPF4 Transcription Factor ChIP-seq Peaks of PRPF4 in HepG2 from ENCODE 3 (ENCFF908QCS) Regulation encTfChipPkENCFF551IJP HepG2 POLR2G Transcription Factor ChIP-seq Peaks of POLR2G in HepG2 from ENCODE 3 (ENCFF551IJP) Regulation encTfChipPkENCFF565SUC HepG2 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in HepG2 from ENCODE 3 (ENCFF565SUC) Regulation encTfChipPkENCFF873OHG HepG2 PLRG1 Transcription Factor ChIP-seq Peaks of PLRG1 in HepG2 from ENCODE 3 (ENCFF873OHG) Regulation encTfChipPkENCFF202WIO HepG2 PHF8 Transcription Factor ChIP-seq Peaks of PHF8 in HepG2 from ENCODE 3 (ENCFF202WIO) Regulation encTfChipPkENCFF882RPA HepG2 PHB2 Transcription Factor ChIP-seq Peaks of PHB2 in HepG2 from ENCODE 3 (ENCFF882RPA) Regulation encTfChipPkENCFF642XRH HepG2 PCBP2 Transcription Factor ChIP-seq Peaks of PCBP2 in HepG2 from ENCODE 3 (ENCFF642XRH) Regulation encTfChipPkENCFF487WAN HepG2 PCBP1 Transcription Factor ChIP-seq Peaks of PCBP1 in HepG2 from ENCODE 3 (ENCFF487WAN) Regulation encTfChipPkENCFF313RFR HepG2 NRF1 2 Transcription Factor ChIP-seq Peaks of NRF1 in HepG2 from ENCODE 3 (ENCFF313RFR) Regulation encTfChipPkENCFF418DKQ HepG2 NRF1 1 Transcription Factor ChIP-seq Peaks of NRF1 in HepG2 from ENCODE 3 (ENCFF418DKQ) Regulation encTfChipPkENCFF350CKI HepG2 NR2F6 Transcription Factor ChIP-seq Peaks of NR2F6 in HepG2 from ENCODE 3 (ENCFF350CKI) Regulation encTfChipPkENCFF162TPR HepG2 NFRKB Transcription Factor ChIP-seq Peaks of NFRKB in HepG2 from ENCODE 3 (ENCFF162TPR) Regulation encTfChipPkENCFF882YLO HepG2 NFE2L2 Transcription Factor ChIP-seq Peaks of NFE2L2 in HepG2 from ENCODE 3 (ENCFF882YLO) Regulation encTfChipPkENCFF616RSZ HepG2 NCOR1 Transcription Factor ChIP-seq Peaks of NCOR1 in HepG2 from ENCODE 3 (ENCFF616RSZ) Regulation 
encTfChipPkENCFF516UWH HepG2 NBN Transcription Factor ChIP-seq Peaks of NBN in HepG2 from ENCODE 3 (ENCFF516UWH) Regulation encTfChipPkENCFF482JSR HepG2 MNT 2 Transcription Factor ChIP-seq Peaks of MNT in HepG2 from ENCODE 3 (ENCFF482JSR) Regulation encTfChipPkENCFF562FMQ HepG2 MNT 1 Transcription Factor ChIP-seq Peaks of MNT in HepG2 from ENCODE 3 (ENCFF562FMQ) Regulation encTfChipPkENCFF140PUO HepG2 MAX Transcription Factor ChIP-seq Peaks of MAX in HepG2 from ENCODE 3 (ENCFF140PUO) Regulation encTfChipPkENCFF171OJF HepG2 MAFK 2 Transcription Factor ChIP-seq Peaks of MAFK in HepG2 from ENCODE 3 (ENCFF171OJF) Regulation encTfChipPkENCFF770TZL HepG2 MAFK 1 Transcription Factor ChIP-seq Peaks of MAFK in HepG2 from ENCODE 3 (ENCFF770TZL) Regulation encTfChipPkENCFF493TIR HepG2 MAFF Transcription Factor ChIP-seq Peaks of MAFF in HepG2 from ENCODE 3 (ENCFF493TIR) Regulation encTfChipPkENCFF611PIO HepG2 LCORL Transcription Factor ChIP-seq Peaks of LCORL in HepG2 from ENCODE 3 (ENCFF611PIO) Regulation encTfChipPkENCFF334HKG HepG2 KDM5A Transcription Factor ChIP-seq Peaks of KDM5A in HepG2 from ENCODE 3 (ENCFF334HKG) Regulation encTfChipPkENCFF768FGG HepG2 KDM1A Transcription Factor ChIP-seq Peaks of KDM1A in HepG2 from ENCODE 3 (ENCFF768FGG) Regulation encTfChipPkENCFF091BEK HepG2 KAT2B Transcription Factor ChIP-seq Peaks of KAT2B in HepG2 from ENCODE 3 (ENCFF091BEK) Regulation encTfChipPkENCFF539GRW HepG2 JUND 2 Transcription Factor ChIP-seq Peaks of JUND in HepG2 from ENCODE 3 (ENCFF539GRW) Regulation encTfChipPkENCFF430PEI HepG2 JUND 1 Transcription Factor ChIP-seq Peaks of JUND in HepG2 from ENCODE 3 (ENCFF430PEI) Regulation encTfChipPkENCFF969BZA HepG2 IKZF1 Transcription Factor ChIP-seq Peaks of IKZF1 in HepG2 from ENCODE 3 (ENCFF969BZA) Regulation encTfChipPkENCFF509YFF HepG2 HNRNPUL1 Transcription Factor ChIP-seq Peaks of HNRNPUL1 in HepG2 from ENCODE 3 (ENCFF509YFF) Regulation encTfChipPkENCFF890KTX HepG2 HNRNPLL Transcription Factor ChIP-seq Peaks of HNRNPLL in 
HepG2 from ENCODE 3 (ENCFF890KTX) Regulation encTfChipPkENCFF039CUI HepG2 HNRNPL Transcription Factor ChIP-seq Peaks of HNRNPL in HepG2 from ENCODE 3 (ENCFF039CUI) Regulation encTfChipPkENCFF828KXG HepG2 HNRNPK Transcription Factor ChIP-seq Peaks of HNRNPK in HepG2 from ENCODE 3 (ENCFF828KXG) Regulation encTfChipPkENCFF046NUR HepG2 HNRNPH1 Transcription Factor ChIP-seq Peaks of HNRNPH1 in HepG2 from ENCODE 3 (ENCFF046NUR) Regulation encTfChipPkENCFF086CTA HepG2 HNF4G Transcription Factor ChIP-seq Peaks of HNF4G in HepG2 from ENCODE 3 (ENCFF086CTA) Regulation encTfChipPkENCFF072CXB HepG2 HNF4A Transcription Factor ChIP-seq Peaks of HNF4A in HepG2 from ENCODE 3 (ENCFF072CXB) Regulation encTfChipPkENCFF800QTO HepG2 HNF1A Transcription Factor ChIP-seq Peaks of HNF1A in HepG2 from ENCODE 3 (ENCFF800QTO) Regulation encTfChipPkENCFF109EXK HepG2 HDAC6 Transcription Factor ChIP-seq Peaks of HDAC6 in HepG2 from ENCODE 3 (ENCFF109EXK) Regulation encTfChipPkENCFF182XZZ HepG2 HDAC2 2 Transcription Factor ChIP-seq Peaks of HDAC2 in HepG2 from ENCODE 3 (ENCFF182XZZ) Regulation encTfChipPkENCFF589GSN HepG2 HDAC2 1 Transcription Factor ChIP-seq Peaks of HDAC2 in HepG2 from ENCODE 3 (ENCFF589GSN) Regulation encTfChipPkENCFF069KPS HepG2 HDAC1 Transcription Factor ChIP-seq Peaks of HDAC1 in HepG2 from ENCODE 3 (ENCFF069KPS) Regulation encTfChipPkENCFF485SRU HepG2 HCFC1 Transcription Factor ChIP-seq Peaks of HCFC1 in HepG2 from ENCODE 3 (ENCFF485SRU) Regulation encTfChipPkENCFF097OXR HepG2 GATA4 Transcription Factor ChIP-seq Peaks of GATA4 in HepG2 from ENCODE 3 (ENCFF097OXR) Regulation encTfChipPkENCFF054HJA HepG2 GABPA Transcription Factor ChIP-seq Peaks of GABPA in HepG2 from ENCODE 3 (ENCFF054HJA) Regulation encTfChipPkENCFF216YZI HepG2 FUS Transcription Factor ChIP-seq Peaks of FUS in HepG2 from ENCODE 3 (ENCFF216YZI) Regulation encTfChipPkENCFF029UJC HepG2 FOXP1 Transcription Factor ChIP-seq Peaks of FOXP1 in HepG2 from ENCODE 3 (ENCFF029UJC) Regulation encTfChipPkENCFF315CHX 
HepG2 FOXK2 Transcription Factor ChIP-seq Peaks of FOXK2 in HepG2 from ENCODE 3 (ENCFF315CHX) Regulation encTfChipPkENCFF259BJR HepG2 FOXA2 2 Transcription Factor ChIP-seq Peaks of FOXA2 in HepG2 from ENCODE 3 (ENCFF259BJR) Regulation encTfChipPkENCFF184NAC HepG2 FOXA2 1 Transcription Factor ChIP-seq Peaks of FOXA2 in HepG2 from ENCODE 3 (ENCFF184NAC) Regulation encTfChipPkENCFF367TQC HepG2 FOXA1 3 Transcription Factor ChIP-seq Peaks of FOXA1 in HepG2 from ENCODE 3 (ENCFF367TQC) Regulation encTfChipPkENCFF872MGU HepG2 FOXA1 2 Transcription Factor ChIP-seq Peaks of FOXA1 in HepG2 from ENCODE 3 (ENCFF872MGU) Regulation encTfChipPkENCFF152BOT HepG2 FOXA1 1 Transcription Factor ChIP-seq Peaks of FOXA1 in HepG2 from ENCODE 3 (ENCFF152BOT) Regulation encTfChipPkENCFF054ESU HepG2 FOSL2 Transcription Factor ChIP-seq Peaks of FOSL2 in HepG2 from ENCODE 3 (ENCFF054ESU) Regulation encTfChipPkENCFF031LBW HepG2 FIP1L1 Transcription Factor ChIP-seq Peaks of FIP1L1 in HepG2 from ENCODE 3 (ENCFF031LBW) Regulation encTfChipPkENCFF504QZJ HepG2 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in HepG2 from ENCODE 3 (ENCFF504QZJ) Regulation encTfChipPkENCFF710CRT HepG2 ETV4 Transcription Factor ChIP-seq Peaks of ETV4 in HepG2 from ENCODE 3 (ENCFF710CRT) Regulation encTfChipPkENCFF128TUP HepG2 ETS1 Transcription Factor ChIP-seq Peaks of ETS1 in HepG2 from ENCODE 3 (ENCFF128TUP) Regulation encTfChipPkENCFF674QCU HepG2 EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in HepG2 from ENCODE 3 (ENCFF674QCU) Regulation encTfChipPkENCFF806JJS HepG2 EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in HepG2 from ENCODE 3 (ENCFF806JJS) Regulation encTfChipPkENCFF840RWO HepG2 ELF1 Transcription Factor ChIP-seq Peaks of ELF1 in HepG2 from ENCODE 3 (ENCFF840RWO) Regulation encTfChipPkENCFF413RQL HepG2 EHMT2 Transcription Factor ChIP-seq Peaks of EHMT2 in HepG2 from ENCODE 3 (ENCFF413RQL) Regulation encTfChipPkENCFF543WTP HepG2 CTCF Transcription Factor ChIP-seq Peaks of CTCF in HepG2 from 
ENCODE 3 (ENCFF543WTP) Regulation encTfChipPkENCFF290UGF HepG2 CREM Transcription Factor ChIP-seq Peaks of CREM in HepG2 from ENCODE 3 (ENCFF290UGF) Regulation encTfChipPkENCFF550TXR HepG2 CREB1 Transcription Factor ChIP-seq Peaks of CREB1 in HepG2 from ENCODE 3 (ENCFF550TXR) Regulation encTfChipPkENCFF148ABR HepG2 CHD4 Transcription Factor ChIP-seq Peaks of CHD4 in HepG2 from ENCODE 3 (ENCFF148ABR) Regulation encTfChipPkENCFF915ZYE HepG2 CEBPB 2 Transcription Factor ChIP-seq Peaks of CEBPB in HepG2 from ENCODE 3 (ENCFF915ZYE) Regulation encTfChipPkENCFF862DXR HepG2 CEBPB 1 Transcription Factor ChIP-seq Peaks of CEBPB in HepG2 from ENCODE 3 (ENCFF862DXR) Regulation encTfChipPkENCFF039LHY HepG2 CCAR2 Transcription Factor ChIP-seq Peaks of CCAR2 in HepG2 from ENCODE 3 (ENCFF039LHY) Regulation encTfChipPkENCFF501QII HepG2 CBX2 Transcription Factor ChIP-seq Peaks of CBX2 in HepG2 from ENCODE 3 (ENCFF501QII) Regulation encTfChipPkENCFF736GHL HepG2 BRD4 Transcription Factor ChIP-seq Peaks of BRD4 in HepG2 from ENCODE 3 (ENCFF736GHL) Regulation encTfChipPkENCFF897ETK HepG2 BRCA1 Transcription Factor ChIP-seq Peaks of BRCA1 in HepG2 from ENCODE 3 (ENCFF897ETK) Regulation encTfChipPkENCFF361YXC HepG2 BHLHE40 2 Transcription Factor ChIP-seq Peaks of BHLHE40 in HepG2 from ENCODE 3 (ENCFF361YXC) Regulation encTfChipPkENCFF863ATX HepG2 BHLHE40 1 Transcription Factor ChIP-seq Peaks of BHLHE40 in HepG2 from ENCODE 3 (ENCFF863ATX) Regulation encTfChipPkENCFF906FVB HepG2 ATM Transcription Factor ChIP-seq Peaks of ATM in HepG2 from ENCODE 3 (ENCFF906FVB) Regulation encTfChipPkENCFF498YGH HepG2 ATF7 Transcription Factor ChIP-seq Peaks of ATF7 in HepG2 from ENCODE 3 (ENCFF498YGH) Regulation encTfChipPkENCFF137OEY HepG2 ATF3 Transcription Factor ChIP-seq Peaks of ATF3 in HepG2 from ENCODE 3 (ENCFF137OEY) Regulation encTfChipPkENCFF089BQU HepG2 ATF2 Transcription Factor ChIP-seq Peaks of ATF2 in HepG2 from ENCODE 3 (ENCFF089BQU) Regulation encTfChipPkENCFF638IUM HepG2 ASH2L 
Transcription Factor ChIP-seq Peaks of ASH2L in HepG2 from ENCODE 3 (ENCFF638IUM) Regulation encTfChipPkENCFF616WXJ HepG2 ARNT Transcription Factor ChIP-seq Peaks of ARNT in HepG2 from ENCODE 3 (ENCFF616WXJ) Regulation encTfChipPkENCFF247GXE HepG2 ARID3A Transcription Factor ChIP-seq Peaks of ARID3A in HepG2 from ENCODE 3 (ENCFF247GXE) Regulation encTfChipPkENCFF465FII HepG2 AGO2 Transcription Factor ChIP-seq Peaks of AGO2 in HepG2 from ENCODE 3 (ENCFF465FII) Regulation encTfChipPkENCFF627BHP HepG2 AGO1 Transcription Factor ChIP-seq Peaks of AGO1 in HepG2 from ENCODE 3 (ENCFF627BHP) Regulation encTfChipPkENCFF267DZF HeLa-S3 ZHX1 Transcription Factor ChIP-seq Peaks of ZHX1 in HeLa-S3 from ENCODE 3 (ENCFF267DZF) Regulation encTfChipPkENCFF834LQR HeLa-S3 UBTF Transcription Factor ChIP-seq Peaks of UBTF in HeLa-S3 from ENCODE 3 (ENCFF834LQR) Regulation encTfChipPkENCFF302RQH HeLa-S3 TBP Transcription Factor ChIP-seq Peaks of TBP in HeLa-S3 from ENCODE 3 (ENCFF302RQH) Regulation encTfChipPkENCFF044DFE HeLa-S3 SUPT20H Transcription Factor ChIP-seq Peaks of SUPT20H in HeLa-S3 from ENCODE 3 (ENCFF044DFE) Regulation encTfChipPkENCFF785YII HeLa-S3 SREBF2 Transcription Factor ChIP-seq Peaks of SREBF2 in HeLa-S3 from ENCODE 3 (ENCFF785YII) Regulation encTfChipPkENCFF208NUB HeLa-S3 REST Transcription Factor ChIP-seq Peaks of REST in HeLa-S3 from ENCODE 3 (ENCFF208NUB) Regulation encTfChipPkENCFF246QVY HeLa-S3 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in HeLa-S3 from ENCODE 3 (ENCFF246QVY) Regulation encTfChipPkENCFF305KIK HeLa-S3 NFE2L2 Transcription Factor ChIP-seq Peaks of NFE2L2 in HeLa-S3 from ENCODE 3 (ENCFF305KIK) Regulation encTfChipPkENCFF328IZQ HeLa-S3 MAFK Transcription Factor ChIP-seq Peaks of MAFK in HeLa-S3 from ENCODE 3 (ENCFF328IZQ) Regulation encTfChipPkENCFF672LKL HeLa-S3 MAFF Transcription Factor ChIP-seq Peaks of MAFF in HeLa-S3 from ENCODE 3 (ENCFF672LKL) Regulation encTfChipPkENCFF091UDB HeLa-S3 GABPA Transcription Factor ChIP-seq Peaks of GABPA 
in HeLa-S3 from ENCODE 3 (ENCFF091UDB) Regulation encTfChipPkENCFF260KLJ HeLa-S3 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in HeLa-S3 from ENCODE 3 (ENCFF260KLJ) Regulation encTfChipPkENCFF797QGP HL-60 SPI1 Transcription Factor ChIP-seq Peaks of SPI1 in HL-60 from ENCODE 3 (ENCFF797QGP) Regulation encTfChipPkENCFF839LPE HL-60 REST Transcription Factor ChIP-seq Peaks of REST in HL-60 from ENCODE 3 (ENCFF839LPE) Regulation encTfChipPkENCFF564YAP HL-60 GABPA Transcription Factor ChIP-seq Peaks of GABPA in HL-60 from ENCODE 3 (ENCFF564YAP) Regulation encTfChipPkENCFF152JZK HL-60 CTCF Transcription Factor ChIP-seq Peaks of CTCF in HL-60 from ENCODE 3 (ENCFF152JZK) Regulation encTfChipPkENCFF750KVF HFF-Myc CTCF Transcription Factor ChIP-seq Peaks of CTCF in HFF-Myc from ENCODE 3 (ENCFF750KVF) Regulation encTfChipPkENCFF817UEX HEK293T ZNF384 Transcription Factor ChIP-seq Peaks of ZNF384 in HEK293T from ENCODE 3 (ENCFF817UEX) Regulation encTfChipPkENCFF829NNC HEK293T ZFX Transcription Factor ChIP-seq Peaks of ZFX in HEK293T from ENCODE 3 (ENCFF829NNC) Regulation encTfChipPkENCFF708RSP HEK293T SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in HEK293T from ENCODE 3 (ENCFF708RSP) Regulation encTfChipPkENCFF532KPP HEK293T SP1 Transcription Factor ChIP-seq Peaks of SP1 in HEK293T from ENCODE 3 (ENCFF532KPP) Regulation encTfChipPkENCFF234WZT HEK293T PKNOX1 Transcription Factor ChIP-seq Peaks of PKNOX1 in HEK293T from ENCODE 3 (ENCFF234WZT) Regulation encTfChipPkENCFF421INQ HEK293T NFRKB Transcription Factor ChIP-seq Peaks of NFRKB in HEK293T from ENCODE 3 (ENCFF421INQ) Regulation encTfChipPkENCFF939UTN HEK293T LEF1 Transcription Factor ChIP-seq Peaks of LEF1 in HEK293T from ENCODE 3 (ENCFF939UTN) Regulation encTfChipPkENCFF156RLT HEK293T L3MBTL2 Transcription Factor ChIP-seq Peaks of L3MBTL2 in HEK293T from ENCODE 3 (ENCFF156RLT) Regulation encTfChipPkENCFF685TME HEK293T FOXM1 Transcription Factor ChIP-seq Peaks of FOXM1 in HEK293T from ENCODE 3 (ENCFF685TME) 
Regulation encTfChipPkENCFF959TZW HEK293T FOXK2 Transcription Factor ChIP-seq Peaks of FOXK2 in HEK293T from ENCODE 3 (ENCFF959TZW) Regulation encTfChipPkENCFF514ZNN HEK293T FOXA1 Transcription Factor ChIP-seq Peaks of FOXA1 in HEK293T from ENCODE 3 (ENCFF514ZNN) Regulation encTfChipPkENCFF919JTO HEK293T ELF4 Transcription Factor ChIP-seq Peaks of ELF4 in HEK293T from ENCODE 3 (ENCFF919JTO) Regulation encTfChipPkENCFF867WWZ HEK293T CTBP1 Transcription Factor ChIP-seq Peaks of CTBP1 in HEK293T from ENCODE 3 (ENCFF867WWZ) Regulation encTfChipPkENCFF104NYV HEK293T BHLHE40 Transcription Factor ChIP-seq Peaks of BHLHE40 in HEK293T from ENCODE 3 (ENCFF104NYV) Regulation encTfChipPkENCFF694XWV HEK293T ARNT Transcription Factor ChIP-seq Peaks of ARNT in HEK293T from ENCODE 3 (ENCFF694XWV) Regulation encTfChipPkENCFF827SZZ HEK293 ZNF263 Transcription Factor ChIP-seq Peaks of ZNF263 in HEK293 from ENCODE 3 (ENCFF827SZZ) Regulation encTfChipPkENCFF860DHS HEK293 TRIM28 Transcription Factor ChIP-seq Peaks of TRIM28 in HEK293 from ENCODE 3 (ENCFF860DHS) Regulation encTfChipPkENCFF215SIC HCT116 ZFX Transcription Factor ChIP-seq Peaks of ZFX in HCT116 from ENCODE 3 (ENCFF215SIC) Regulation encTfChipPkENCFF998KDQ HCT116 JUND Transcription Factor ChIP-seq Peaks of JUND in HCT116 from ENCODE 3 (ENCFF998KDQ) Regulation encTfChipPkENCFF926EZW HCT116 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in HCT116 from ENCODE 3 (ENCFF926EZW) Regulation encTfChipPkENCFF171SNH HCT116 CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in HCT116 from ENCODE 3 (ENCFF171SNH) Regulation encTfChipPkENCFF518MQA HCT116 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in HCT116 from ENCODE 3 (ENCFF518MQA) Regulation encTfChipPkENCFF549PGC HCT116 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in HCT116 from ENCODE 3 (ENCFF549PGC) Regulation encTfChipPkENCFF723LVE H54 CTCF Transcription Factor ChIP-seq Peaks of CTCF in H54 from ENCODE 3 (ENCFF723LVE) Regulation encTfChipPkENCFF933WSP H1-hESC ZNF143 
Transcription Factor ChIP-seq Peaks of ZNF143 in H1-hESC from ENCODE 3 (ENCFF933WSP) Regulation encTfChipPkENCFF509GYP H1-hESC YY1 Transcription Factor ChIP-seq Peaks of YY1 in H1-hESC from ENCODE 3 (ENCFF509GYP) Regulation encTfChipPkENCFF710JBU H1-hESC USF2 Transcription Factor ChIP-seq Peaks of USF2 in H1-hESC from ENCODE 3 (ENCFF710JBU) Regulation encTfChipPkENCFF699HXL H1-hESC USF1 Transcription Factor ChIP-seq Peaks of USF1 in H1-hESC from ENCODE 3 (ENCFF699HXL) Regulation encTfChipPkENCFF740HPV H1-hESC TCF12 Transcription Factor ChIP-seq Peaks of TCF12 in H1-hESC from ENCODE 3 (ENCFF740HPV) Regulation encTfChipPkENCFF748YXF H1-hESC TBP Transcription Factor ChIP-seq Peaks of TBP in H1-hESC from ENCODE 3 (ENCFF748YXF) Regulation encTfChipPkENCFF243PSJ H1-hESC TAF7 Transcription Factor ChIP-seq Peaks of TAF7 in H1-hESC from ENCODE 3 (ENCFF243PSJ) Regulation encTfChipPkENCFF870SFJ H1-hESC TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in H1-hESC from ENCODE 3 (ENCFF870SFJ) Regulation encTfChipPkENCFF671SZQ H1-hESC SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in H1-hESC from ENCODE 3 (ENCFF671SZQ) Regulation encTfChipPkENCFF345IDL H1-hESC SRF Transcription Factor ChIP-seq Peaks of SRF in H1-hESC from ENCODE 3 (ENCFF345IDL) Regulation encTfChipPkENCFF500JFI H1-hESC SP1 Transcription Factor ChIP-seq Peaks of SP1 in H1-hESC from ENCODE 3 (ENCFF500JFI) Regulation encTfChipPkENCFF644BNN H1-hESC SIX5 Transcription Factor ChIP-seq Peaks of SIX5 in H1-hESC from ENCODE 3 (ENCFF644BNN) Regulation encTfChipPkENCFF539KSF H1-hESC SIRT6 Transcription Factor ChIP-seq Peaks of SIRT6 in H1-hESC from ENCODE 3 (ENCFF539KSF) Regulation encTfChipPkENCFF514BGQ H1-hESC SIN3A 2 Transcription Factor ChIP-seq Peaks of SIN3A in H1-hESC from ENCODE 3 (ENCFF514BGQ) Regulation encTfChipPkENCFF905VZD H1-hESC SIN3A 1 Transcription Factor ChIP-seq Peaks of SIN3A in H1-hESC from ENCODE 3 (ENCFF905VZD) Regulation encTfChipPkENCFF193TFR H1-hESC SAP30 Transcription Factor ChIP-seq Peaks of 
SAP30 in H1-hESC from ENCODE 3 (ENCFF193TFR) Regulation encTfChipPkENCFF430SIE H1-hESC RXRA Transcription Factor ChIP-seq Peaks of RXRA in H1-hESC from ENCODE 3 (ENCFF430SIE) Regulation encTfChipPkENCFF283MNG H1-hESC RNF2 Transcription Factor ChIP-seq Peaks of RNF2 in H1-hESC from ENCODE 3 (ENCFF283MNG) Regulation encTfChipPkENCFF062WBN H1-hESC RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in H1-hESC from ENCODE 3 (ENCFF062WBN) Regulation encTfChipPkENCFF403CAJ H1-hESC REST 2 Transcription Factor ChIP-seq Peaks of REST in H1-hESC from ENCODE 3 (ENCFF403CAJ) Regulation encTfChipPkENCFF779CWH H1-hESC REST 1 Transcription Factor ChIP-seq Peaks of REST in H1-hESC from ENCODE 3 (ENCFF779CWH) Regulation encTfChipPkENCFF607WCG H1-hESC RBBP5 Transcription Factor ChIP-seq Peaks of RBBP5 in H1-hESC from ENCODE 3 (ENCFF607WCG) Regulation encTfChipPkENCFF060IVS H1-hESC RAD21 2 Transcription Factor ChIP-seq Peaks of RAD21 in H1-hESC from ENCODE 3 (ENCFF060IVS) Regulation encTfChipPkENCFF255FRL H1-hESC RAD21 1 Transcription Factor ChIP-seq Peaks of RAD21 in H1-hESC from ENCODE 3 (ENCFF255FRL) Regulation encTfChipPkENCFF422HDN H1-hESC POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in H1-hESC from ENCODE 3 (ENCFF422HDN) Regulation encTfChipPkENCFF651QOL H1-hESC PHF8 Transcription Factor ChIP-seq Peaks of PHF8 in H1-hESC from ENCODE 3 (ENCFF651QOL) Regulation encTfChipPkENCFF407IVS H1-hESC NRF1 Transcription Factor ChIP-seq Peaks of NRF1 in H1-hESC from ENCODE 3 (ENCFF407IVS) Regulation encTfChipPkENCFF794GVQ H1-hESC NANOG Transcription Factor ChIP-seq Peaks of NANOG in H1-hESC from ENCODE 3 (ENCFF794GVQ) Regulation encTfChipPkENCFF392JJN H1-hESC MYC Transcription Factor ChIP-seq Peaks of MYC in H1-hESC from ENCODE 3 (ENCFF392JJN) Regulation encTfChipPkENCFF712RIS H1-hESC MAFK Transcription Factor ChIP-seq Peaks of MAFK in H1-hESC from ENCODE 3 (ENCFF712RIS) Regulation encTfChipPkENCFF342EEV H1-hESC KDM5A Transcription Factor ChIP-seq Peaks of KDM5A in H1-hESC from ENCODE 
3 (ENCFF342EEV) Regulation encTfChipPkENCFF205WRX H1-hESC KDM4A Transcription Factor ChIP-seq Peaks of KDM4A in H1-hESC from ENCODE 3 (ENCFF205WRX) Regulation encTfChipPkENCFF562OAN H1-hESC KDM1A Transcription Factor ChIP-seq Peaks of KDM1A in H1-hESC from ENCODE 3 (ENCFF562OAN) Regulation encTfChipPkENCFF646IUA H1-hESC JUND 2 Transcription Factor ChIP-seq Peaks of JUND in H1-hESC from ENCODE 3 (ENCFF646IUA) Regulation encTfChipPkENCFF443HNU H1-hESC JUND 1 Transcription Factor ChIP-seq Peaks of JUND in H1-hESC from ENCODE 3 (ENCFF443HNU) Regulation encTfChipPkENCFF312GEN H1-hESC JUN Transcription Factor ChIP-seq Peaks of JUN in H1-hESC from ENCODE 3 (ENCFF312GEN) Regulation encTfChipPkENCFF129WNO H1-hESC HDAC6 Transcription Factor ChIP-seq Peaks of HDAC6 in H1-hESC from ENCODE 3 (ENCFF129WNO) Regulation encTfChipPkENCFF497YNJ H1-hESC HDAC2 2 Transcription Factor ChIP-seq Peaks of HDAC2 in H1-hESC from ENCODE 3 (ENCFF497YNJ) Regulation encTfChipPkENCFF009IVJ H1-hESC HDAC2 1 Transcription Factor ChIP-seq Peaks of HDAC2 in H1-hESC from ENCODE 3 (ENCFF009IVJ) Regulation encTfChipPkENCFF225GFQ H1-hESC GABPA Transcription Factor ChIP-seq Peaks of GABPA in H1-hESC from ENCODE 3 (ENCFF225GFQ) Regulation encTfChipPkENCFF063OKB H1-hESC FOSL1 Transcription Factor ChIP-seq Peaks of FOSL1 in H1-hESC from ENCODE 3 (ENCFF063OKB) Regulation encTfChipPkENCFF483HNU H1-hESC EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in H1-hESC from ENCODE 3 (ENCFF483HNU) Regulation encTfChipPkENCFF834UVX H1-hESC EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in H1-hESC from ENCODE 3 (ENCFF834UVX) Regulation encTfChipPkENCFF477ANT H1-hESC EGR1 Transcription Factor ChIP-seq Peaks of EGR1 in H1-hESC from ENCODE 3 (ENCFF477ANT) Regulation encTfChipPkENCFF821AQO H1-hESC CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in H1-hESC from ENCODE 3 (ENCFF821AQO) Regulation encTfChipPkENCFF368LWM H1-hESC CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in H1-hESC from ENCODE 3 (ENCFF368LWM) 
Regulation encTfChipPkENCFF658SXI H1-hESC CHD7 Transcription Factor ChIP-seq Peaks of CHD7 in H1-hESC from ENCODE 3 (ENCFF658SXI) Regulation encTfChipPkENCFF806HXY H1-hESC CHD1 2 Transcription Factor ChIP-seq Peaks of CHD1 in H1-hESC from ENCODE 3 (ENCFF806HXY) Regulation encTfChipPkENCFF549ODQ H1-hESC CHD1 1 Transcription Factor ChIP-seq Peaks of CHD1 in H1-hESC from ENCODE 3 (ENCFF549ODQ) Regulation encTfChipPkENCFF962YTC H1-hESC BRCA1 Transcription Factor ChIP-seq Peaks of BRCA1 in H1-hESC from ENCODE 3 (ENCFF962YTC) Regulation encTfChipPkENCFF087VWX H1-hESC BCL11A 2 Transcription Factor ChIP-seq Peaks of BCL11A in H1-hESC from ENCODE 3 (ENCFF087VWX) Regulation encTfChipPkENCFF533KIC H1-hESC BCL11A 1 Transcription Factor ChIP-seq Peaks of BCL11A in H1-hESC from ENCODE 3 (ENCFF533KIC) Regulation encTfChipPkENCFF851YHG H1-hESC BACH1 Transcription Factor ChIP-seq Peaks of BACH1 in H1-hESC from ENCODE 3 (ENCFF851YHG) Regulation encTfChipPkENCFF487GLV H1-hESC ATF3 Transcription Factor ChIP-seq Peaks of ATF3 in H1-hESC from ENCODE 3 (ENCFF487GLV) Regulation encTfChipPkENCFF777DCR H1-hESC ASH2L Transcription Factor ChIP-seq Peaks of ASH2L in H1-hESC from ENCODE 3 (ENCFF777DCR) Regulation encTfChipPkENCFF904USP GM23338 REST Transcription Factor ChIP-seq Peaks of REST in GM23338 from ENCODE 3 (ENCFF904USP) Regulation encTfChipPkENCFF621PFM GM23338 NANOG Transcription Factor ChIP-seq Peaks of NANOG in GM23338 from ENCODE 3 (ENCFF621PFM) Regulation encTfChipPkENCFF097WNJ GM23338 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in GM23338 from ENCODE 3 (ENCFF097WNJ) Regulation encTfChipPkENCFF511AZU GM23338 ETS1 Transcription Factor ChIP-seq Peaks of ETS1 in GM23338 from ENCODE 3 (ENCFF511AZU) Regulation encTfChipPkENCFF960XTR GM23338 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in GM23338 from ENCODE 3 (ENCFF960XTR) Regulation encTfChipPkENCFF322WKG GM23338 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in GM23338 from ENCODE 3 (ENCFF322WKG) Regulation 
encTfChipPkENCFF976SAN GM23248 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in GM23248 from ENCODE 3 (ENCFF976SAN) Regulation encTfChipPkENCFF419PTP GM20000 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM20000 from ENCODE 3 (ENCFF419PTP) Regulation encTfChipPkENCFF084FRB GM13977 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM13977 from ENCODE 3 (ENCFF084FRB) Regulation encTfChipPkENCFF072IHJ GM12892 YY1 Transcription Factor ChIP-seq Peaks of YY1 in GM12892 from ENCODE 3 (ENCFF072IHJ) Regulation encTfChipPkENCFF033PLJ GM12892 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in GM12892 from ENCODE 3 (ENCFF033PLJ) Regulation encTfChipPkENCFF403ZEO GM12892 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in GM12892 from ENCODE 3 (ENCFF403ZEO) Regulation encTfChipPkENCFF538VYU GM12891 YY1 Transcription Factor ChIP-seq Peaks of YY1 in GM12891 from ENCODE 3 (ENCFF538VYU) Regulation encTfChipPkENCFF471NIK GM12891 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in GM12891 from ENCODE 3 (ENCFF471NIK) Regulation encTfChipPkENCFF744AGB GM12891 SPI1 Transcription Factor ChIP-seq Peaks of SPI1 in GM12891 from ENCODE 3 (ENCFF744AGB) Regulation encTfChipPkENCFF113EFE GM12891 POU2F2 Transcription Factor ChIP-seq Peaks of POU2F2 in GM12891 from ENCODE 3 (ENCFF113EFE) Regulation encTfChipPkENCFF021HUZ GM12891 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in GM12891 from ENCODE 3 (ENCFF021HUZ) Regulation encTfChipPkENCFF987CQF GM12891 PAX5 Transcription Factor ChIP-seq Peaks of PAX5 in GM12891 from ENCODE 3 (ENCFF987CQF) Regulation encTfChipPkENCFF260NAX GM12878 ZZZ3 Transcription Factor ChIP-seq Peaks of ZZZ3 in GM12878 from ENCODE 3 (ENCFF260NAX) Regulation encTfChipPkENCFF214NJL GM12878 ZSCAN29 Transcription Factor ChIP-seq Peaks of ZSCAN29 in GM12878 from ENCODE 3 (ENCFF214NJL) Regulation encTfChipPkENCFF137BRA GM12878 ZNF687 Transcription Factor ChIP-seq Peaks of ZNF687 in GM12878 from ENCODE 3 (ENCFF137BRA) Regulation encTfChipPkENCFF615DTQ GM12878 
ZNF592 Transcription Factor ChIP-seq Peaks of ZNF592 in GM12878 from ENCODE 3 (ENCFF615DTQ) Regulation encTfChipPkENCFF942MDT GM12878 ZNF384 Transcription Factor ChIP-seq Peaks of ZNF384 in GM12878 from ENCODE 3 (ENCFF942MDT) Regulation encTfChipPkENCFF200SLC GM12878 ZNF217 Transcription Factor ChIP-seq Peaks of ZNF217 in GM12878 from ENCODE 3 (ENCFF200SLC) Regulation encTfChipPkENCFF676BIG GM12878 ZNF207 Transcription Factor ChIP-seq Peaks of ZNF207 in GM12878 from ENCODE 3 (ENCFF676BIG) Regulation encTfChipPkENCFF193POQ GM12878 ZNF143 2 Transcription Factor ChIP-seq Peaks of ZNF143 in GM12878 from ENCODE 3 (ENCFF193POQ) Regulation encTfChipPkENCFF153TQR GM12878 ZNF143 1 Transcription Factor ChIP-seq Peaks of ZNF143 in GM12878 from ENCODE 3 (ENCFF153TQR) Regulation encTfChipPkENCFF084IUW GM12878 ZBTB40 Transcription Factor ChIP-seq Peaks of ZBTB40 in GM12878 from ENCODE 3 (ENCFF084IUW) Regulation encTfChipPkENCFF475DID GM12878 ZBTB33 2 Transcription Factor ChIP-seq Peaks of ZBTB33 in GM12878 from ENCODE 3 (ENCFF475DID) Regulation encTfChipPkENCFF773OQL GM12878 ZBTB33 1 Transcription Factor ChIP-seq Peaks of ZBTB33 in GM12878 from ENCODE 3 (ENCFF773OQL) Regulation encTfChipPkENCFF630FLK GM12878 ZBED1 Transcription Factor ChIP-seq Peaks of ZBED1 in GM12878 from ENCODE 3 (ENCFF630FLK) Regulation encTfChipPkENCFF752IXD GM12878 YY1 2 Transcription Factor ChIP-seq Peaks of YY1 in GM12878 from ENCODE 3 (ENCFF752IXD) Regulation encTfChipPkENCFF223MUF GM12878 YY1 1 Transcription Factor ChIP-seq Peaks of YY1 in GM12878 from ENCODE 3 (ENCFF223MUF) Regulation encTfChipPkENCFF514DDI GM12878 WRNIP1 Transcription Factor ChIP-seq Peaks of WRNIP1 in GM12878 from ENCODE 3 (ENCFF514DDI) Regulation encTfChipPkENCFF514SWA GM12878 USF2 Transcription Factor ChIP-seq Peaks of USF2 in GM12878 from ENCODE 3 (ENCFF514SWA) Regulation encTfChipPkENCFF295ZLM GM12878 UBTF Transcription Factor ChIP-seq Peaks of UBTF in GM12878 from ENCODE 3 (ENCFF295ZLM) Regulation encTfChipPkENCFF552WAH GM12878 
TRIM22 2 Transcription Factor ChIP-seq Peaks of TRIM22 in GM12878 from ENCODE 3 (ENCFF552WAH) Regulation encTfChipPkENCFF830TFU GM12878 TRIM22 1 Transcription Factor ChIP-seq Peaks of TRIM22 in GM12878 from ENCODE 3 (ENCFF830TFU) Regulation encTfChipPkENCFF152RNE GM12878 TCF7 Transcription Factor ChIP-seq Peaks of TCF7 in GM12878 from ENCODE 3 (ENCFF152RNE) Regulation encTfChipPkENCFF897RYA GM12878 TCF12 2 Transcription Factor ChIP-seq Peaks of TCF12 in GM12878 from ENCODE 3 (ENCFF897RYA) Regulation encTfChipPkENCFF768VSH GM12878 TCF12 1 Transcription Factor ChIP-seq Peaks of TCF12 in GM12878 from ENCODE 3 (ENCFF768VSH) Regulation encTfChipPkENCFF971VHK GM12878 TBX21 Transcription Factor ChIP-seq Peaks of TBX21 in GM12878 from ENCODE 3 (ENCFF971VHK) Regulation encTfChipPkENCFF896UZB GM12878 TBP Transcription Factor ChIP-seq Peaks of TBP in GM12878 from ENCODE 3 (ENCFF896UZB) Regulation encTfChipPkENCFF392JWA GM12878 TBL1XR1 Transcription Factor ChIP-seq Peaks of TBL1XR1 in GM12878 from ENCODE 3 (ENCFF392JWA) Regulation encTfChipPkENCFF540AAP GM12878 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in GM12878 from ENCODE 3 (ENCFF540AAP) Regulation encTfChipPkENCFF547FUI GM12878 SUZ12 Transcription Factor ChIP-seq Peaks of SUZ12 in GM12878 from ENCODE 3 (ENCFF547FUI) Regulation encTfChipPkENCFF069YVD GM12878 SUPT20H Transcription Factor ChIP-seq Peaks of SUPT20H in GM12878 from ENCODE 3 (ENCFF069YVD) Regulation encTfChipPkENCFF383YEA GM12878 STAT5A Transcription Factor ChIP-seq Peaks of STAT5A in GM12878 from ENCODE 3 (ENCFF383YEA) Regulation encTfChipPkENCFF923CHO GM12878 STAT3 Transcription Factor ChIP-seq Peaks of STAT3 in GM12878 from ENCODE 3 (ENCFF923CHO) Regulation encTfChipPkENCFF323QQU GM12878 STAT1 Transcription Factor ChIP-seq Peaks of STAT1 in GM12878 from ENCODE 3 (ENCFF323QQU) Regulation encTfChipPkENCFF182IFE GM12878 SRF 3 Transcription Factor ChIP-seq Peaks of SRF in GM12878 from ENCODE 3 (ENCFF182IFE) Regulation encTfChipPkENCFF829SEJ GM12878 SRF 2 
Transcription Factor ChIP-seq Peaks of SRF in GM12878 from ENCODE 3 (ENCFF829SEJ) Regulation encTfChipPkENCFF766WWB GM12878 SRF 1 Transcription Factor ChIP-seq Peaks of SRF in GM12878 from ENCODE 3 (ENCFF766WWB) Regulation encTfChipPkENCFF572RPI GM12878 SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in GM12878 from ENCODE 3 (ENCFF572RPI) Regulation encTfChipPkENCFF052STI GM12878 SMARCA5 Transcription Factor ChIP-seq Peaks of SMARCA5 in GM12878 from ENCODE 3 (ENCFF052STI) Regulation encTfChipPkENCFF855SJG GM12878 SMAD5 Transcription Factor ChIP-seq Peaks of SMAD5 in GM12878 from ENCODE 3 (ENCFF855SJG) Regulation encTfChipPkENCFF987PGY GM12878 SMAD1 Transcription Factor ChIP-seq Peaks of SMAD1 in GM12878 from ENCODE 3 (ENCFF987PGY) Regulation encTfChipPkENCFF903KEI GM12878 SKIL Transcription Factor ChIP-seq Peaks of SKIL in GM12878 from ENCODE 3 (ENCFF903KEI) Regulation encTfChipPkENCFF864TFH GM12878 SIX5 Transcription Factor ChIP-seq Peaks of SIX5 in GM12878 from ENCODE 3 (ENCFF864TFH) Regulation encTfChipPkENCFF050CYK GM12878 SIN3A Transcription Factor ChIP-seq Peaks of SIN3A in GM12878 from ENCODE 3 (ENCFF050CYK) Regulation encTfChipPkENCFF313BDA GM12878 RXRA Transcription Factor ChIP-seq Peaks of RXRA in GM12878 from ENCODE 3 (ENCFF313BDA) Regulation encTfChipPkENCFF677QUK GM12878 RUNX3 Transcription Factor ChIP-seq Peaks of RUNX3 in GM12878 from ENCODE 3 (ENCFF677QUK) Regulation encTfChipPkENCFF259LNG GM12878 RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in GM12878 from ENCODE 3 (ENCFF259LNG) Regulation encTfChipPkENCFF313CII GM12878 REST Transcription Factor ChIP-seq Peaks of REST in GM12878 from ENCODE 3 (ENCFF313CII) Regulation encTfChipPkENCFF105YDI GM12878 RELB Transcription Factor ChIP-seq Peaks of RELB in GM12878 from ENCODE 3 (ENCFF105YDI) Regulation encTfChipPkENCFF470ZMK GM12878 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in GM12878 from ENCODE 3 (ENCFF470ZMK) Regulation encTfChipPkENCFF687SSY GM12878 RBBP5 Transcription Factor ChIP-seq 
Peaks of RBBP5 in GM12878 from ENCODE 3 (ENCFF687SSY) Regulation encTfChipPkENCFF034OSV GM12878 RB1 Transcription Factor ChIP-seq Peaks of RB1 in GM12878 from ENCODE 3 (ENCFF034OSV) Regulation encTfChipPkENCFF996NBR GM12878 RAD51 Transcription Factor ChIP-seq Peaks of RAD51 in GM12878 from ENCODE 3 (ENCFF996NBR) Regulation encTfChipPkENCFF654EGO GM12878 RAD21 Transcription Factor ChIP-seq Peaks of RAD21 in GM12878 from ENCODE 3 (ENCFF654EGO) Regulation encTfChipPkENCFF455ZLJ GM12878 POLR2A Transcription Factor ChIP-seq Peaks of POLR2A in GM12878 from ENCODE 3 (ENCFF455ZLJ) Regulation encTfChipPkENCFF335ADU GM12878 PKNOX1 Transcription Factor ChIP-seq Peaks of PKNOX1 in GM12878 from ENCODE 3 (ENCFF335ADU) Regulation encTfChipPkENCFF926LHG GM12878 PBX3 Transcription Factor ChIP-seq Peaks of PBX3 in GM12878 from ENCODE 3 (ENCFF926LHG) Regulation encTfChipPkENCFF992JWY GM12878 PAX8 Transcription Factor ChIP-seq Peaks of PAX8 in GM12878 from ENCODE 3 (ENCFF992JWY) Regulation encTfChipPkENCFF946SAG GM12878 PAX5 Transcription Factor ChIP-seq Peaks of PAX5 in GM12878 from ENCODE 3 (ENCFF946SAG) Regulation encTfChipPkENCFF652BRY GM12878 NRF1 Transcription Factor ChIP-seq Peaks of NRF1 in GM12878 from ENCODE 3 (ENCFF652BRY) Regulation encTfChipPkENCFF434HVY GM12878 NR2C2 Transcription Factor ChIP-seq Peaks of NR2C2 in GM12878 from ENCODE 3 (ENCFF434HVY) Regulation encTfChipPkENCFF510NDO GM12878 NFYB Transcription Factor ChIP-seq Peaks of NFYB in GM12878 from ENCODE 3 (ENCFF510NDO) Regulation encTfChipPkENCFF278GJK GM12878 NFYA Transcription Factor ChIP-seq Peaks of NFYA in GM12878 from ENCODE 3 (ENCFF278GJK) Regulation encTfChipPkENCFF860IXB GM12878 NFXL1 Transcription Factor ChIP-seq Peaks of NFXL1 in GM12878 from ENCODE 3 (ENCFF860IXB) Regulation encTfChipPkENCFF480WDX GM12878 NFIC Transcription Factor ChIP-seq Peaks of NFIC in GM12878 from ENCODE 3 (ENCFF480WDX) Regulation encTfChipPkENCFF743UMZ GM12878 NFE2 Transcription Factor ChIP-seq Peaks of NFE2 in GM12878 from 
ENCODE 3 (ENCFF743UMZ) Regulation encTfChipPkENCFF704PDA GM12878 NFATC3 Transcription Factor ChIP-seq Peaks of NFATC3 in GM12878 from ENCODE 3 (ENCFF704PDA) Regulation encTfChipPkENCFF138ZBJ GM12878 NFATC1 Transcription Factor ChIP-seq Peaks of NFATC1 in GM12878 from ENCODE 3 (ENCFF138ZBJ) Regulation encTfChipPkENCFF811VEN GM12878 NBN Transcription Factor ChIP-seq Peaks of NBN in GM12878 from ENCODE 3 (ENCFF811VEN) Regulation encTfChipPkENCFF402TSJ GM12878 MYB Transcription Factor ChIP-seq Peaks of MYB in GM12878 from ENCODE 3 (ENCFF402TSJ) Regulation encTfChipPkENCFF199HGX GM12878 MXI1 Transcription Factor ChIP-seq Peaks of MXI1 in GM12878 from ENCODE 3 (ENCFF199HGX) Regulation encTfChipPkENCFF661FMB GM12878 MTA3 Transcription Factor ChIP-seq Peaks of MTA3 in GM12878 from ENCODE 3 (ENCFF661FMB) Regulation encTfChipPkENCFF587POH GM12878 MTA2 Transcription Factor ChIP-seq Peaks of MTA2 in GM12878 from ENCODE 3 (ENCFF587POH) Regulation encTfChipPkENCFF125MEN GM12878 MLLT1 Transcription Factor ChIP-seq Peaks of MLLT1 in GM12878 from ENCODE 3 (ENCFF125MEN) Regulation encTfChipPkENCFF830BRO GM12878 MEF2C Transcription Factor ChIP-seq Peaks of MEF2C in GM12878 from ENCODE 3 (ENCFF830BRO) Regulation encTfChipPkENCFF623FAW GM12878 MEF2B Transcription Factor ChIP-seq Peaks of MEF2B in GM12878 from ENCODE 3 (ENCFF623FAW) Regulation encTfChipPkENCFF958GXF GM12878 MEF2A Transcription Factor ChIP-seq Peaks of MEF2A in GM12878 from ENCODE 3 (ENCFF958GXF) Regulation encTfChipPkENCFF270NAL GM12878 MAX Transcription Factor ChIP-seq Peaks of MAX in GM12878 from ENCODE 3 (ENCFF270NAL) Regulation encTfChipPkENCFF186AWV GM12878 MAFK Transcription Factor ChIP-seq Peaks of MAFK in GM12878 from ENCODE 3 (ENCFF186AWV) Regulation encTfChipPkENCFF417WPC GM12878 KLF5 Transcription Factor ChIP-seq Peaks of KLF5 in GM12878 from ENCODE 3 (ENCFF417WPC) Regulation encTfChipPkENCFF799KZP GM12878 KDM1A Transcription Factor ChIP-seq Peaks of KDM1A in GM12878 from ENCODE 3 (ENCFF799KZP) Regulation 
encTfChipPkENCFF710ROZ GM12878 KAT2A Transcription Factor ChIP-seq Peaks of KAT2A in GM12878 from ENCODE 3 (ENCFF710ROZ) Regulation encTfChipPkENCFF873DJD GM12878 JUND Transcription Factor ChIP-seq Peaks of JUND in GM12878 from ENCODE 3 (ENCFF873DJD) Regulation encTfChipPkENCFF478XNA GM12878 JUNB Transcription Factor ChIP-seq Peaks of JUNB in GM12878 from ENCODE 3 (ENCFF478XNA) Regulation encTfChipPkENCFF843HDK GM12878 IRF5 Transcription Factor ChIP-seq Peaks of IRF5 in GM12878 from ENCODE 3 (ENCFF843HDK) Regulation encTfChipPkENCFF720YMW GM12878 IRF4 Transcription Factor ChIP-seq Peaks of IRF4 in GM12878 from ENCODE 3 (ENCFF720YMW) Regulation encTfChipPkENCFF719MXF GM12878 IRF3 2 Transcription Factor ChIP-seq Peaks of IRF3 in GM12878 from ENCODE 3 (ENCFF719MXF) Regulation encTfChipPkENCFF604AZX GM12878 IRF3 1 Transcription Factor ChIP-seq Peaks of IRF3 in GM12878 from ENCODE 3 (ENCFF604AZX) Regulation encTfChipPkENCFF088OLI GM12878 IKZF2 2 Transcription Factor ChIP-seq Peaks of IKZF2 in GM12878 from ENCODE 3 (ENCFF088OLI) Regulation encTfChipPkENCFF526WVH GM12878 IKZF2 1 Transcription Factor ChIP-seq Peaks of IKZF2 in GM12878 from ENCODE 3 (ENCFF526WVH) Regulation encTfChipPkENCFF018NNF GM12878 IKZF1 3 Transcription Factor ChIP-seq Peaks of IKZF1 in GM12878 from ENCODE 3 (ENCFF018NNF) Regulation encTfChipPkENCFF968NOG GM12878 IKZF1 2 Transcription Factor ChIP-seq Peaks of IKZF1 in GM12878 from ENCODE 3 (ENCFF968NOG) Regulation encTfChipPkENCFF197ABX GM12878 IKZF1 1 Transcription Factor ChIP-seq Peaks of IKZF1 in GM12878 from ENCODE 3 (ENCFF197ABX) Regulation encTfChipPkENCFF603BID GM12878 HSF1 Transcription Factor ChIP-seq Peaks of HSF1 in GM12878 from ENCODE 3 (ENCFF603BID) Regulation encTfChipPkENCFF248JAL GM12878 HDAC6 Transcription Factor ChIP-seq Peaks of HDAC6 in GM12878 from ENCODE 3 (ENCFF248JAL) Regulation encTfChipPkENCFF299UPZ GM12878 HDAC2 Transcription Factor ChIP-seq Peaks of HDAC2 in GM12878 from ENCODE 3 (ENCFF299UPZ) Regulation 
encTfChipPkENCFF722QBB GM12878 HCFC1 Transcription Factor ChIP-seq Peaks of HCFC1 in GM12878 from ENCODE 3 (ENCFF722QBB) Regulation encTfChipPkENCFF298AIX GM12878 GATAD2B Transcription Factor ChIP-seq Peaks of GATAD2B in GM12878 from ENCODE 3 (ENCFF298AIX) Regulation encTfChipPkENCFF946ACA GM12878 GABPA Transcription Factor ChIP-seq Peaks of GABPA in GM12878 from ENCODE 3 (ENCFF946ACA) Regulation encTfChipPkENCFF990MTR GM12878 FOXK2 Transcription Factor ChIP-seq Peaks of FOXK2 in GM12878 from ENCODE 3 (ENCFF990MTR) Regulation encTfChipPkENCFF615NYO GM12878 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in GM12878 from ENCODE 3 (ENCFF615NYO) Regulation encTfChipPkENCFF745ANU GM12878 ETV6 2 Transcription Factor ChIP-seq Peaks of ETV6 in GM12878 from ENCODE 3 (ENCFF745ANU) Regulation encTfChipPkENCFF116AMK GM12878 ETV6 1 Transcription Factor ChIP-seq Peaks of ETV6 in GM12878 from ENCODE 3 (ENCFF116AMK) Regulation encTfChipPkENCFF980VOD GM12878 ETS1 Transcription Factor ChIP-seq Peaks of ETS1 in GM12878 from ENCODE 3 (ENCFF980VOD) Regulation encTfChipPkENCFF722LJP GM12878 ESRRA Transcription Factor ChIP-seq Peaks of ESRRA in GM12878 from ENCODE 3 (ENCFF722LJP) Regulation encTfChipPkENCFF510FUM GM12878 EP300 3 Transcription Factor ChIP-seq Peaks of EP300 in GM12878 from ENCODE 3 (ENCFF510FUM) Regulation encTfChipPkENCFF080HJX GM12878 EP300 2 Transcription Factor ChIP-seq Peaks of EP300 in GM12878 from ENCODE 3 (ENCFF080HJX) Regulation encTfChipPkENCFF865UDD GM12878 EP300 1 Transcription Factor ChIP-seq Peaks of EP300 in GM12878 from ENCODE 3 (ENCFF865UDD) Regulation encTfChipPkENCFF432AQP GM12878 ELK1 Transcription Factor ChIP-seq Peaks of ELK1 in GM12878 from ENCODE 3 (ENCFF432AQP) Regulation encTfChipPkENCFF948CPI GM12878 ELF1 Transcription Factor ChIP-seq Peaks of ELF1 in GM12878 from ENCODE 3 (ENCFF948CPI) Regulation encTfChipPkENCFF341EJT GM12878 EGR1 Transcription Factor ChIP-seq Peaks of EGR1 in GM12878 from ENCODE 3 (ENCFF341EJT) Regulation 
encTfChipPkENCFF023ALY GM12878 EED Transcription Factor ChIP-seq Peaks of EED in GM12878 from ENCODE 3 (ENCFF023ALY) Regulation encTfChipPkENCFF249SVT GM12878 EBF1 Transcription Factor ChIP-seq Peaks of EBF1 in GM12878 from ENCODE 3 (ENCFF249SVT) Regulation encTfChipPkENCFF035GFS GM12878 E4F1 Transcription Factor ChIP-seq Peaks of E4F1 in GM12878 from ENCODE 3 (ENCFF035GFS) Regulation encTfChipPkENCFF412GFI GM12878 E2F8 Transcription Factor ChIP-seq Peaks of E2F8 in GM12878 from ENCODE 3 (ENCFF412GFI) Regulation encTfChipPkENCFF687SFB GM12878 E2F4 Transcription Factor ChIP-seq Peaks of E2F4 in GM12878 from ENCODE 3 (ENCFF687SFB) Regulation encTfChipPkENCFF771IAW GM12878 DPF2 Transcription Factor ChIP-seq Peaks of DPF2 in GM12878 from ENCODE 3 (ENCFF771IAW) Regulation encTfChipPkENCFF567NFS GM12878 CUX1 Transcription Factor ChIP-seq Peaks of CUX1 in GM12878 from ENCODE 3 (ENCFF567NFS) Regulation encTfChipPkENCFF960ZGP GM12878 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in GM12878 from ENCODE 3 (ENCFF960ZGP) Regulation encTfChipPkENCFF356LIU GM12878 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in GM12878 from ENCODE 3 (ENCFF356LIU) Regulation encTfChipPkENCFF091YID GM12878 CREM Transcription Factor ChIP-seq Peaks of CREM in GM12878 from ENCODE 3 (ENCFF091YID) Regulation encTfChipPkENCFF249SIN GM12878 CHD4 Transcription Factor ChIP-seq Peaks of CHD4 in GM12878 from ENCODE 3 (ENCFF249SIN) Regulation encTfChipPkENCFF863CTN GM12878 CHD1 Transcription Factor ChIP-seq Peaks of CHD1 in GM12878 from ENCODE 3 (ENCFF863CTN) Regulation encTfChipPkENCFF786YYI GM12878 CEBPB Transcription Factor ChIP-seq Peaks of CEBPB in GM12878 from ENCODE 3 (ENCFF786YYI) Regulation encTfChipPkENCFF417SVR GM12878 CBX5 Transcription Factor ChIP-seq Peaks of CBX5 in GM12878 from ENCODE 3 (ENCFF417SVR) Regulation encTfChipPkENCFF552QOA GM12878 CBX3 Transcription Factor ChIP-seq Peaks of CBX3 in GM12878 from ENCODE 3 (ENCFF552QOA) Regulation encTfChipPkENCFF070SOX GM12878 CBFB 
Transcription Factor ChIP-seq Peaks of CBFB in GM12878 from ENCODE 3 (ENCFF070SOX) Regulation encTfChipPkENCFF005JKU GM12878 BRCA1 Transcription Factor ChIP-seq Peaks of BRCA1 in GM12878 from ENCODE 3 (ENCFF005JKU) Regulation encTfChipPkENCFF592LPO GM12878 BMI1 Transcription Factor ChIP-seq Peaks of BMI1 in GM12878 from ENCODE 3 (ENCFF592LPO) Regulation encTfChipPkENCFF370ZNL GM12878 BHLHE40 2 Transcription Factor ChIP-seq Peaks of BHLHE40 in GM12878 from ENCODE 3 (ENCFF370ZNL) Regulation encTfChipPkENCFF622HGF GM12878 BHLHE40 1 Transcription Factor ChIP-seq Peaks of BHLHE40 in GM12878 from ENCODE 3 (ENCFF622HGF) Regulation encTfChipPkENCFF247MHT GM12878 BCL3 Transcription Factor ChIP-seq Peaks of BCL3 in GM12878 from ENCODE 3 (ENCFF247MHT) Regulation encTfChipPkENCFF383HAY GM12878 BCL11A Transcription Factor ChIP-seq Peaks of BCL11A in GM12878 from ENCODE 3 (ENCFF383HAY) Regulation encTfChipPkENCFF832YIE GM12878 BATF Transcription Factor ChIP-seq Peaks of BATF in GM12878 from ENCODE 3 (ENCFF832YIE) Regulation encTfChipPkENCFF725YZH GM12878 BACH1 Transcription Factor ChIP-seq Peaks of BACH1 in GM12878 from ENCODE 3 (ENCFF725YZH) Regulation encTfChipPkENCFF495PWL GM12878 ATF7 Transcription Factor ChIP-seq Peaks of ATF7 in GM12878 from ENCODE 3 (ENCFF495PWL) Regulation encTfChipPkENCFF806KKM GM12878 ATF2 2 Transcription Factor ChIP-seq Peaks of ATF2 in GM12878 from ENCODE 3 (ENCFF806KKM) Regulation encTfChipPkENCFF210HTZ GM12878 ATF2 1 Transcription Factor ChIP-seq Peaks of ATF2 in GM12878 from ENCODE 3 (ENCFF210HTZ) Regulation encTfChipPkENCFF096XRG GM12878 ASH2L Transcription Factor ChIP-seq Peaks of ASH2L in GM12878 from ENCODE 3 (ENCFF096XRG) Regulation encTfChipPkENCFF758RQJ GM12878 ARNT Transcription Factor ChIP-seq Peaks of ARNT in GM12878 from ENCODE 3 (ENCFF758RQJ) Regulation encTfChipPkENCFF003VDB GM12878 ARID3A Transcription Factor ChIP-seq Peaks of ARID3A in GM12878 from ENCODE 3 (ENCFF003VDB) Regulation encTfChipPkENCFF834WWA GM12874 CTCF Transcription 
Factor ChIP-seq Peaks of CTCF in GM12874 from ENCODE 3 (ENCFF834WWA) Regulation encTfChipPkENCFF913EEI GM12873 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM12873 from ENCODE 3 (ENCFF913EEI) Regulation encTfChipPkENCFF965YZI GM12865 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM12865 from ENCODE 3 (ENCFF965YZI) Regulation encTfChipPkENCFF751IKT GM12864 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM12864 from ENCODE 3 (ENCFF751IKT) Regulation encTfChipPkENCFF178PUI GM10266 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM10266 from ENCODE 3 (ENCFF178PUI) Regulation encTfChipPkENCFF329TZO GM08714 ZNF274 Transcription Factor ChIP-seq Peaks of ZNF274 in GM08714 from ENCODE 3 (ENCFF329TZO) Regulation encTfChipPkENCFF897RQN GM06990 CTCF Transcription Factor ChIP-seq Peaks of CTCF in GM06990 from ENCODE 3 (ENCFF897RQN) Regulation encTfChipPkENCFF837RIT DOHH2 CTCF Transcription Factor ChIP-seq Peaks of CTCF in DOHH2 from ENCODE 3 (ENCFF837RIT) Regulation encTfChipPkENCFF990ZZT Caco-2 CTCF Transcription Factor ChIP-seq Peaks of CTCF in Caco-2 from ENCODE 3 (ENCFF990ZZT) Regulation encTfChipPkENCFF300XXC CD14+monocyte CTCF Transcription Factor ChIP-seq Peaks of CTCF in CD14-positive_monocyte from ENCODE 3 (ENCFF300XXC) Regulation encTfChipPkENCFF856AUX C4-2B ZFX Transcription Factor ChIP-seq Peaks of ZFX in C4-2B from ENCODE 3 (ENCFF856AUX) Regulation encTfChipPkENCFF675JFN C4-2B CTCF Transcription Factor ChIP-seq Peaks of CTCF in C4-2B from ENCODE 3 (ENCFF675JFN) Regulation encTfChipPkENCFF910TER B_cell CTCF Transcription Factor ChIP-seq Peaks of CTCF in B_cell from ENCODE 3 (ENCFF910TER) Regulation encTfChipPkENCFF704JHR BJ CTCF Transcription Factor ChIP-seq Peaks of CTCF in BJ from ENCODE 3 (ENCFF704JHR) Regulation encTfChipPkENCFF594OZI BE2C CTCF Transcription Factor ChIP-seq Peaks of CTCF in BE2C from ENCODE 3 (ENCFF594OZI) Regulation encTfChipPkENCFF100IYW AG10803 CTCF Transcription Factor ChIP-seq Peaks of CTCF in AG10803 from ENCODE 3 
(ENCFF100IYW) Regulation encTfChipPkENCFF119XBW AG09319 CTCF Transcription Factor ChIP-seq Peaks of CTCF in AG09319 from ENCODE 3 (ENCFF119XBW) Regulation encTfChipPkENCFF826NCK AG09309 CTCF Transcription Factor ChIP-seq Peaks of CTCF in AG09309 from ENCODE 3 (ENCFF826NCK) Regulation encTfChipPkENCFF788LNG AG04450 CTCF Transcription Factor ChIP-seq Peaks of CTCF in AG04450 from ENCODE 3 (ENCFF788LNG) Regulation encTfChipPkENCFF652LEH AG04449 CTCF Transcription Factor ChIP-seq Peaks of CTCF in AG04449 from ENCODE 3 (ENCFF652LEH) Regulation encTfChipPkENCFF807XMX A673 EZH2 Transcription Factor ChIP-seq Peaks of EZH2 in A673 from ENCODE 3 (ENCFF807XMX) Regulation encTfChipPkENCFF695QMG A673 CTCF Transcription Factor ChIP-seq Peaks of CTCF in A673 from ENCODE 3 (ENCFF695QMG) Regulation encTfChipPkENCFF593ZJA A549 ZBTB33 Transcription Factor ChIP-seq Peaks of ZBTB33 in A549 from ENCODE 3 (ENCFF593ZJA) Regulation encTfChipPkENCFF613DTQ A549 YY1 Transcription Factor ChIP-seq Peaks of YY1 in A549 from ENCODE 3 (ENCFF613DTQ) Regulation encTfChipPkENCFF593EOW A549 USF2 Transcription Factor ChIP-seq Peaks of USF2 in A549 from ENCODE 3 (ENCFF593EOW) Regulation encTfChipPkENCFF228CDD A549 TCF12 Transcription Factor ChIP-seq Peaks of TCF12 in A549 from ENCODE 3 (ENCFF228CDD) Regulation encTfChipPkENCFF886KDK A549 TAF1 Transcription Factor ChIP-seq Peaks of TAF1 in A549 from ENCODE 3 (ENCFF886KDK) Regulation encTfChipPkENCFF483YCC A549 SREBF2 Transcription Factor ChIP-seq Peaks of SREBF2 in A549 from ENCODE 3 (ENCFF483YCC) Regulation encTfChipPkENCFF624DDK A549 SREBF1 Transcription Factor ChIP-seq Peaks of SREBF1 in A549 from ENCODE 3 (ENCFF624DDK) Regulation encTfChipPkENCFF404OSB A549 SP1 Transcription Factor ChIP-seq Peaks of SP1 in A549 from ENCODE 3 (ENCFF404OSB) Regulation encTfChipPkENCFF256LDD A549 SMC3 Transcription Factor ChIP-seq Peaks of SMC3 in A549 from ENCODE 3 (ENCFF256LDD) Regulation encTfChipPkENCFF189NMX A549 SIX5 Transcription Factor ChIP-seq Peaks of SIX5 in 
A549 from ENCODE 3 (ENCFF189NMX) Regulation encTfChipPkENCFF708HTR A549 SIN3A 2 Transcription Factor ChIP-seq Peaks of SIN3A in A549 from ENCODE 3 (ENCFF708HTR) Regulation encTfChipPkENCFF567BJI A549 SIN3A 1 Transcription Factor ChIP-seq Peaks of SIN3A in A549 from ENCODE 3 (ENCFF567BJI) Regulation encTfChipPkENCFF110EOX A549 RNF2 Transcription Factor ChIP-seq Peaks of RNF2 in A549 from ENCODE 3 (ENCFF110EOX) Regulation encTfChipPkENCFF179WDI A549 RFX5 Transcription Factor ChIP-seq Peaks of RFX5 in A549 from ENCODE 3 (ENCFF179WDI) Regulation encTfChipPkENCFF706DRE A549 REST 2 Transcription Factor ChIP-seq Peaks of REST in A549 from ENCODE 3 (ENCFF706DRE) Regulation encTfChipPkENCFF107EWI A549 REST 1 Transcription Factor ChIP-seq Peaks of REST in A549 from ENCODE 3 (ENCFF107EWI) Regulation encTfChipPkENCFF993WZP A549 RCOR1 Transcription Factor ChIP-seq Peaks of RCOR1 in A549 from ENCODE 3 (ENCFF993WZP) Regulation encTfChipPkENCFF897QCA A549 RAD21 Transcription Factor ChIP-seq Peaks of RAD21 in A549 from ENCODE 3 (ENCFF897QCA) Regulation encTfChipPkENCFF664KTN A549 POLR2A 2 Transcription Factor ChIP-seq Peaks of POLR2A in A549 from ENCODE 3 (ENCFF664KTN) Regulation encTfChipPkENCFF915LKZ A549 POLR2A 1 Transcription Factor ChIP-seq Peaks of POLR2A in A549 from ENCODE 3 (ENCFF915LKZ) Regulation encTfChipPkENCFF907WHF A549 PHF8 Transcription Factor ChIP-seq Peaks of PHF8 in A549 from ENCODE 3 (ENCFF907WHF) Regulation encTfChipPkENCFF463DJO A549 NR3C1 5 Transcription Factor ChIP-seq Peaks of NR3C1 in A549 from ENCODE 3 (ENCFF463DJO) Regulation encTfChipPkENCFF114SRD A549 NR3C1 4 Transcription Factor ChIP-seq Peaks of NR3C1 in A549 from ENCODE 3 (ENCFF114SRD) Regulation encTfChipPkENCFF963CGV A549 NR3C1 3 Transcription Factor ChIP-seq Peaks of NR3C1 in A549 from ENCODE 3 (ENCFF963CGV) Regulation encTfChipPkENCFF514IGC A549 NR3C1 2 Transcription Factor ChIP-seq Peaks of NR3C1 in A549 from ENCODE 3 (ENCFF514IGC) Regulation encTfChipPkENCFF714KXI A549 NR3C1 1 Transcription 
Factor ChIP-seq Peaks of NR3C1 in A549 from ENCODE 3 (ENCFF714KXI) Regulation encTfChipPkENCFF418TUX A549 NFE2L2 Transcription Factor ChIP-seq Peaks of NFE2L2 in A549 from ENCODE 3 (ENCFF418TUX) Regulation encTfChipPkENCFF542GMN A549 MYC Transcription Factor ChIP-seq Peaks of MYC in A549 from ENCODE 3 (ENCFF542GMN) Regulation encTfChipPkENCFF813WJW A549 MAFK Transcription Factor ChIP-seq Peaks of MAFK in A549 from ENCODE 3 (ENCFF813WJW) Regulation encTfChipPkENCFF149INM A549 KDM5A Transcription Factor ChIP-seq Peaks of KDM5A in A549 from ENCODE 3 (ENCFF149INM) Regulation encTfChipPkENCFF316CBQ A549 KDM1A Transcription Factor ChIP-seq Peaks of KDM1A in A549 from ENCODE 3 (ENCFF316CBQ) Regulation encTfChipPkENCFF587VEY A549 JUND Transcription Factor ChIP-seq Peaks of JUND in A549 from ENCODE 3 (ENCFF587VEY) Regulation encTfChipPkENCFF127HJG A549 JUN Transcription Factor ChIP-seq Peaks of JUN in A549 from ENCODE 3 (ENCFF127HJG) Regulation encTfChipPkENCFF814DAF A549 HDAC2 Transcription Factor ChIP-seq Peaks of HDAC2 in A549 from ENCODE 3 (ENCFF814DAF) Regulation encTfChipPkENCFF520GJC A549 GABPA Transcription Factor ChIP-seq Peaks of GABPA in A549 from ENCODE 3 (ENCFF520GJC) Regulation encTfChipPkENCFF167BKY A549 FOXA1 2 Transcription Factor ChIP-seq Peaks of FOXA1 in A549 from ENCODE 3 (ENCFF167BKY) Regulation encTfChipPkENCFF297HAX A549 FOXA1 1 Transcription Factor ChIP-seq Peaks of FOXA1 in A549 from ENCODE 3 (ENCFF297HAX) Regulation encTfChipPkENCFF808RWZ A549 FOSL2 Transcription Factor ChIP-seq Peaks of FOSL2 in A549 from ENCODE 3 (ENCFF808RWZ) Regulation encTfChipPkENCFF896WFR A549 ETS1 Transcription Factor ChIP-seq Peaks of ETS1 in A549 from ENCODE 3 (ENCFF896WFR) Regulation encTfChipPkENCFF558UWY A549 ESRRA Transcription Factor ChIP-seq Peaks of ESRRA in A549 from ENCODE 3 (ENCFF558UWY) Regulation encTfChipPkENCFF605JXG A549 ELK1 Transcription Factor ChIP-seq Peaks of ELK1 in A549 from ENCODE 3 (ENCFF605JXG) Regulation encTfChipPkENCFF935ZUW A549 ELF1 
Transcription Factor ChIP-seq Peaks of ELF1 in A549 from ENCODE 3 (ENCFF935ZUW) Regulation encTfChipPkENCFF199OOU A549 EHMT2 Transcription Factor ChIP-seq Peaks of EHMT2 in A549 from ENCODE 3 (ENCFF199OOU) Regulation encTfChipPkENCFF646TUX A549 CTCF 3 Transcription Factor ChIP-seq Peaks of CTCF in A549 from ENCODE 3 (ENCFF646TUX) Regulation encTfChipPkENCFF615GTV A549 CTCF 2 Transcription Factor ChIP-seq Peaks of CTCF in A549 from ENCODE 3 (ENCFF615GTV) Regulation encTfChipPkENCFF535MZG A549 CTCF 1 Transcription Factor ChIP-seq Peaks of CTCF in A549 from ENCODE 3 (ENCFF535MZG) Regulation encTfChipPkENCFF186ZET A549 CREB1 2 Transcription Factor ChIP-seq Peaks of CREB1 in A549 from ENCODE 3 (ENCFF186ZET) Regulation encTfChipPkENCFF576PUH A549 CREB1 1 Transcription Factor ChIP-seq Peaks of CREB1 in A549 from ENCODE 3 (ENCFF576PUH) Regulation encTfChipPkENCFF766YPH A549 CHD4 Transcription Factor ChIP-seq Peaks of CHD4 in A549 from ENCODE 3 (ENCFF766YPH) Regulation encTfChipPkENCFF047UIF A549 CEBPB Transcription Factor ChIP-seq Peaks of CEBPB in A549 from ENCODE 3 (ENCFF047UIF) Regulation encTfChipPkENCFF330OCU A549 CBX8 Transcription Factor ChIP-seq Peaks of CBX8 in A549 from ENCODE 3 (ENCFF330OCU) Regulation encTfChipPkENCFF208AXT A549 CBX2 Transcription Factor ChIP-seq Peaks of CBX2 in A549 from ENCODE 3 (ENCFF208AXT) Regulation encTfChipPkENCFF093ZAB A549 BCL3 Transcription Factor ChIP-seq Peaks of BCL3 in A549 from ENCODE 3 (ENCFF093ZAB) Regulation encTfChipPkENCFF851UTY A549 ATF3 Transcription Factor ChIP-seq Peaks of ATF3 in A549 from ENCODE 3 (ENCFF851UTY) Regulation cons241way Cactus 241-way Cactus Alignment & Conservation of Zoonomia Placental Mammals (241 Species) Comparative Genomics Downloads for data in this track are available: Cactus alignments (MAF format), and phylogenetic trees, and PhyloP conservation (WIG and bigWig format) Description Warning: Unlike other alignment tracks on the genome browser, this one does not show insertions in the query 
genomes. Also, all other alignment tracks show one query genome sequence for each target genome sequence, but in this track, each target genome sequence can be aligned to multiple query genome sequences. Only the first sequence is shown on the genome browser itself; the others are shown on the details page when one clicks on the alignment. If you are interested in this track and want these shortcomings to be fixed, please contact us. This track shows multiple alignments of 241 vertebrate species and measurements of evolutionary conservation from the Zoonomia Project. The multiple alignments were generated using the Cactus comparative genomics alignment system. Cactus produces reference-free, whole-genome multiple alignments. The base-wise conservation scores are computed using phyloP from the PHAST package, for all species. This version was prepared by Michael Dong (Uppsala U) with an improved neutral model incorporating better versions of ancestral repeats. For genome assemblies not available in the genome browser, there are alternative assembly hub genome browsers. Missing sequence in any assembly is highlighted in the track display by regions of yellow when zoomed out and by Ns when displayed at base level (see Gap Annotation, below). count commonname CLADE group scientificname sequencingsource NCBIassembly speciesstatus 1 Cape golden mole AFROSORICIDA Chrysochloridae Chrysochloris asiatica 1. Zoonomia GCA_004027935.1 LC 2 Small Madagascar hedgehog AFROSORICIDA Tenrecidae Echinops telfairi 2. Existing assembly GCF_000313985.1 LC 3 Talazac's shrew tenrec AFROSORICIDA Tenrecidae Microgale talazaci 1. Zoonomia GCA_004026705.1 LC 4 Cheetah CARNIVORA Felidae Acinonyx jubatus 2. Existing assembly GCF_001443585.1 CR 5 Giant panda CARNIVORA Ursidae Ailuropoda melanoleuca 2. Existing assembly GCA_002007445.1 VU 6 Lesser panda CARNIVORA Ailuridae Ailurus fulgens 2. Existing assembly GCA_002007465.1 EN 7 Domestic dog CARNIVORA Canidae Canis lupus familiaris 2. 
Existing assembly GCF_000002285.3 LC 8 Domestic dog (village dog) CARNIVORA Canidae Canis lupus familiaris 1. Zoonomia GCA_004027395.1 LC 9 Fossa CARNIVORA Eupleridae Cryptoprocta ferox 1. Zoonomia GCA_004023885.1 VU 10 Sea otter CARNIVORA Mustelidae Enhydra lutris 2. Existing assembly GCF_002288905.1 EN 11 Domestic cat CARNIVORA Felidae Felis catus 2. Existing assembly GCF_000181335.2 LC 12 Black-footed cat CARNIVORA Felidae Felis nigripes 1. Zoonomia GCA_004023925.1 VU 13 Dwarf mongoose CARNIVORA Herpestidae Helogale parvula 1. Zoonomia GCA_004023845.1 LC 14 Striped hyena CARNIVORA Hyaenidae Hyaena hyaena 1. Zoonomia GCA_004023945.1 NT 15 Weddell seal CARNIVORA Phocidae Leptonychotes weddellii 2. Existing assembly GCF_000349705.1 LC 16 African hunting dog CARNIVORA Canidae Lycaon pictus 2. Existing assembly GCA_001887905.1 EN 17 Honey badger CARNIVORA Mustelidae Mellivora capensis 1. Zoonomia GCA_004024625.1 LC 18 Northern elephant seal CARNIVORA Phocidae Mirounga angustirostris 1. Zoonomia GCA_004023865.1 LC 19 South African banded mongoose CARNIVORA Herpestidae Mungos mungo 1. Zoonomia GCA_004023785.1 LC 20 Domestic ferret CARNIVORA Mustelidae Mustela putorius 2. Existing assembly GCF_000239315.1 LC 21 Hawaiian monk seal CARNIVORA Phocidae Neomonachus schauinslandi 2. Existing assembly GCA_002201575.1 EN 22 Pacific walrus CARNIVORA Odobenidae Odobenus rosmarus 2. Existing assembly GCF_000321225.1 DD 23 Jaguar CARNIVORA Felidae Panthera onca 1. Zoonomia GCA_004023805.1 NT 24 Leopard CARNIVORA Felidae Panthera pardus 2. Existing assembly GCA_001857705.1 VU 25 Amur tiger CARNIVORA Felidae Panthera tigris 2. Existing assembly GCF_000464555.1 EN 26 Asian palm civet CARNIVORA Viverridae Paradoxurus hermaphroditus 1. Zoonomia GCA_004024585.1 LC 27 Giant otter CARNIVORA Mustelidae Pteronura brasiliensis 1. Zoonomia GCA_004024605.1 EN 28 Puma CARNIVORA Felidae Puma concolor 2. 
Existing assembly GCF_003327715.1 LC 29 Western spotted skunk CARNIVORA Mephitidae Spilogale gracilis 1. Zoonomia GCA_004023965.1 LC 30 Meerkat CARNIVORA Herpestidae Suricata suricatta 1. Zoonomia GCA_004023905.1 LC 31 Polar bear CARNIVORA Ursidae Ursus maritimus 2. Existing assembly GCF_000687225.1 VU 32 Arctic fox CARNIVORA Canidae Vulpes lagopus 1. Zoonomia GCA_004023825.1 LC 33 California sea lion CARNIVORA Otariidae Zalophus californianus 1. Zoonomia GCA_004024565.1 LC 34 Aoudad CETARTIODACTYLA Bovidae Ammotragus lervia 2. Existing assembly GCA_002201775.1 VU 35 Pronghorn CETARTIODACTYLA Antilocapridae Antilocapra americana 1. Zoonomia GCA_004027515.1 LC 36 Minke whale CETARTIODACTYLA Balaenopteridae Balaenoptera acutorostrata 2. Existing assembly GCF_000493695.1 LC 37 Antarctic minke whale CETARTIODACTYLA Balaenopteridae Balaenoptera bonaerensis 2. Existing assembly GCA_000978805.1 DD 38 Hirola CETARTIODACTYLA Bovidae Beatragus hunteri 1. Zoonomia GCA_004027495.1 CR 39 American bison CETARTIODACTYLA Bovidae Bison bison 2. Existing assembly GCF_000754665.1 NT 40 Zebu cattle CETARTIODACTYLA Bovidae Bos indicus 2. Existing assembly GCA_000247795.2 LC 41 Wild yak CETARTIODACTYLA Bovidae Bos mutus 2. Existing assembly GCF_000298355.1 VU 42 Cattle CETARTIODACTYLA Bovidae Bos taurus 2. Existing assembly GCF_000003205.7 LC 43 Water buffalo CETARTIODACTYLA Bovidae Bubalus bubalis 2. Existing assembly GCF_000471725.1 LC 44 Bactrian camel CETARTIODACTYLA Camelidae Camelus bactrianus 2. Existing assembly GCF_000767855.1 LC 45 Arabian camel CETARTIODACTYLA Camelidae Camelus dromedarius 2. Existing assembly GCF_000767585.1 LC 46 Wild bactrian camel CETARTIODACTYLA Camelidae Camelus ferus 2. Existing assembly GCF_000311805.1 CR 47 Wild goat CETARTIODACTYLA Bovidae Capra aegagrus 2. Existing assembly GCA_000978405.1 VU 48 Goat CETARTIODACTYLA Bovidae Capra hircus 2. Existing assembly GCF_001704415.1 LC 49 Chacoan peccary CETARTIODACTYLA Tayassuidae Catagonus wagneri 1. 
Zoonomia GCA_004024745.1 EN 50 Beluga whale CETARTIODACTYLA Monodontidae Delphinapterus leucas 2. Existing assembly GCF_002288925.1 LC 51 Pere David's deer CETARTIODACTYLA Cervidae Elaphurus davidianus 2. Existing assembly GCA_002443075.1 CR 52 Grey whale CETARTIODACTYLA Eschrichtiidae Eschrichtius robustus 1. Zoonomia GCA_004363415.1 LC 53 North Pacific right whale CETARTIODACTYLA Balaenidae Eubalaena japonica 1. Zoonomia GCA_004363455.1 EN 54 Giraffe CETARTIODACTYLA Giraffidae Giraffa tippelskirchi 2. Existing assembly GCA_001651235.1 VU 55 Nilgiri tahr CETARTIODACTYLA Bovidae Hemitragus hylocrius 1. Zoonomia GCA_004026825.1 EN 56 Hippopotamus CETARTIODACTYLA Hippopotamidae Hippopotamus amphibius 1. Zoonomia GCA_004027065.1 VU 57 Amazon river dolphin CETARTIODACTYLA Iniidae Inia geoffrensis 1. Zoonomia GCA_004363515.1 DD 58 Pygmy sperm whale CETARTIODACTYLA Physeteridae Kogia breviceps 1. Zoonomia GCA_004363705.1 DD 59 Yangtze river dolphin CETARTIODACTYLA Iniidae Lipotes vexillifer 2. Existing assembly GCF_000442215.1 CR 60 Sowerby's beaked whale CETARTIODACTYLA Ziphiidae Mesoplodon bidens 1. Zoonomia GCA_004027085.1 DD 61 Narwhal CETARTIODACTYLA Monodontidae Monodon monoceros 1. Zoonomia GCA_004026685.1 LC 62 Siberian musk deer CETARTIODACTYLA Moschidae Moschus moschiferus 1. Zoonomia GCA_004024705.1 VU 63 Yangtze finless porpoise CETARTIODACTYLA Phocoenidae Neophocaena asiaeorientalis 2. Existing assembly GCA_003031525.1 EN 64 White-tailed deer CETARTIODACTYLA Cervidae Odocoileus virginianus 2. Existing assembly GCA_002102435.1 LC 65 Okapi CETARTIODACTYLA Giraffidae Okapia johnstoni 2. Existing assembly GCA_001660835.1 EN 66 Killer whale CETARTIODACTYLA Delphinidae Orcinus orca 2. Existing assembly GCF_000331955.2 DD 67 Sheep CETARTIODACTYLA Bovidae Ovis aries 2. Existing assembly GCF_000298735.2 LC 68 Peninsular bighorn sheep CETARTIODACTYLA Bovidae Ovis canadensis cremnobates 1. 
Zoonomia GCA_004026945.1 EN 69 Chiru CETARTIODACTYLA Bovidae Pantholops hodgsonii 2. Existing assembly GCF_000400835.1 NT 70 Harbor porpoise CETARTIODACTYLA Phocoenidae Phocoena phocoena 1. Zoonomia GCA_004363495.1 LC 71 Indus river dolphin CETARTIODACTYLA Platanistidae Platanista gangetica minor 1. Zoonomia GCA_004363435.1 EN 72 Siberian reindeer CETARTIODACTYLA Cervidae Rangifer tarandus 1. Zoonomia GCA_004026565.1 VU 73 Russian saiga CETARTIODACTYLA Bovidae Saiga tatarica tatarica 1. Zoonomia GCA_004024985.1 CR 74 Pig CETARTIODACTYLA Suidae Sus scrofa 2. Existing assembly GCF_000003025.5 LC 75 Java lesser chevrotain CETARTIODACTYLA Tragulidae Tragulus javanicus 1. Zoonomia GCA_004024965.1 DD 76 Bottlenose dolphin CETARTIODACTYLA Delphinidae Tursiops truncatus 2. Existing assembly GCA_001922835.1 LC 77 Alpaca CETARTIODACTYLA Camelidae Vicugna pacos 2. Existing assembly GCA_000767525.1 LC 78 Cuvier's beaked whale CETARTIODACTYLA Ziphiidae Ziphius cavirostris 1. Zoonomia GCA_004364475.1 LC 79 Tailed tailless bat CHIROPTERA Phyllostomidae Anoura caudifer 1. Zoonomia GCA_004027475.1 LC 80 Jamaican fruit-eating bat CHIROPTERA Phyllostomidae Artibeus jamaicensis 1. Zoonomia GCA_004027435.1 LC 81 Seba's short-tailed bat CHIROPTERA Phyllostomidae Carollia perspicillata 1. Zoonomia GCA_004027735.1 LC 82 Bumblebee bat CHIROPTERA Craseonycteridae Craseonycteris thonglongyai 1. Zoonomia GCA_004027555.1 VU 83 Common vampire bat CHIROPTERA Phyllostomidae Desmodus rotundus 2. Existing assembly GCA_002940915.2 LC 84 Straw-colored fruit bat CHIROPTERA Pteropodidae Eidolon helvum 2. Existing assembly GCA_000465285.1 NT 85 Big brown bat CHIROPTERA Vespertilionidae Eptesicus fuscus 2. Existing assembly GCF_000308155.1 LC 86 Great roundleaf bat CHIROPTERA Hipposideridae Hipposideros armiger 2. Existing assembly GCA_001890085.1 LC 87 Cantor's leaf-nosed bat CHIROPTERA Hipposideridae Hipposideros galeritus 1.
Zoonomia GCA_004027415.1 LC 88 Eastern red bat CHIROPTERA Vespertilionidae Lasiurus borealis 1. Zoonomia GCA_004026805.1 LC 89 Long-tongued fruit bat CHIROPTERA Pteropodidae Macroglossus sobrinus 1. Zoonomia GCA_004027375.1 LC 90 Greater false vampire bat CHIROPTERA Megadermatidae Megaderma lyra 1. Zoonomia GCA_004026885.1 LC 91 Hairy big-eared bat CHIROPTERA Phyllostomidae Micronycteris hirsuta 1. Zoonomia GCA_004026765.1 LC 92 Natal long-fingered bat CHIROPTERA Vespertilionidae Miniopterus natalensis 2. Existing assembly GCF_001595765.1 LC 93 Common bent-wing bat CHIROPTERA Vespertilionidae Miniopterus schreibersii 1. Zoonomia GCA_004026525.1 NT 94 Ghost-faced bat CHIROPTERA Mormoopidae Mormoops blainvillei 1. Zoonomia GCA_004026545.1 LC 95 Ashy-gray tube-nosed bat CHIROPTERA Vespertilionidae Murina feae 1. Zoonomia GCA_004026665.1 LC 96 Brandt's bat CHIROPTERA Vespertilionidae Myotis brandtii 2. Existing assembly GCF_000412655.1 LC 97 David's myotis bat CHIROPTERA Vespertilionidae Myotis davidii 2. Existing assembly GCF_000327345.1 LC 98 Little brown bat CHIROPTERA Vespertilionidae Myotis lucifugus 2. Existing assembly GCF_000147115.1 LC 99 Greater mouse-eared bat CHIROPTERA Vespertilionidae Myotis myotis 1. Zoonomia GCA_004026985.1 LC 100 Greater bulldog bat CHIROPTERA Noctilionidae Noctilio leporinus 1. Zoonomia GCA_004026585.1 LC 101 Common pipistrelle CHIROPTERA Vespertilionidae Pipistrellus pipistrellus 1. Zoonomia GCA_004026625.1 LC 102 Parnell's mustached bat CHIROPTERA Mormoopidae Pteronotus parnellii 2. Existing assembly GCA_000465405.1 LC 103 Black flying fox CHIROPTERA Pteropodidae Pteropus alecto 2. Existing assembly GCF_000325575.1 LC 104 Large flying fox CHIROPTERA Pteropodidae Pteropus vampyrus 2. Existing assembly GCF_000151845.1 NT 105 Chinese rufous horseshoe bat CHIROPTERA Rhinolophidae Rhinolophus sinicus 2. Existing assembly GCA_001888835.1 LC 106 Egyptian fruit bat CHIROPTERA Pteropodidae Rousettus aegyptiacus 1. 
Zoonomia GCA_004024865.1 LC 107 Mexican free-tailed bat CHIROPTERA Molossidae Tadarida brasiliensis 1. Zoonomia GCA_004025005.1 LC 108 Stripe-headed round-eared bat CHIROPTERA Phyllostomidae Tonatia saurophila 1. Zoonomia GCA_004024845.1 LC 109 Screaming hairy armadillo CINGULATA Dasypodidae Chaetophractus vellerosus 1. Zoonomia GCA_004027955.1 LC 110 Nine-banded armadillo CINGULATA Dasypodidae Dasypus novemcinctus 2. Existing assembly GCF_000208655.1 LC 111 Southern three-banded armadillo CINGULATA Dasypodidae Tolypeutes matacus 1. Zoonomia GCA_004025125.1 NT 112 Sunda flying lemur DERMOPTERA Cynocephalidae Galeopterus variegatus 1. Zoonomia GCA_004027255.1 LC 113 Star-nosed mole EULIPOTYPHLA Talpidae Condylura cristata 2. Existing assembly GCF_000260355.1 LC 114 Indochinese shrew EULIPOTYPHLA Soricidae Crocidura indochinensis 1. Zoonomia GCA_004027635.1 LC 115 Western european hedgehog EULIPOTYPHLA Erinaceidae Erinaceus europaeus 2. Existing assembly GCF_000296755.1 LC 116 Eastern mole EULIPOTYPHLA Talpidae Scalopus aquaticus 1. Zoonomia GCA_004024925.1 LC 117 Hispaniolan solenodon EULIPOTYPHLA Solenodontidae Solenodon paradoxus 1. Zoonomia GCA_004363575.1 EN 118 European shrew EULIPOTYPHLA Soricidae Sorex araneus 2. Existing assembly GCF_000181275.1 LC 119 Gracile shrew-like mole EULIPOTYPHLA Talpidae Uropsilus gracilis 1. Zoonomia GCA_004024945.1 LC 120 African yellow-spotted rock hyrax HYRACOIDEA Procaviidae Heterohyrax brucei 1. Zoonomia GCA_004026845.1 LC 121 South African rock hyrax HYRACOIDEA Procaviidae Procavia capensis 1. Zoonomia GCA_004026925.1 LC 122 Snowshoe hare LAGOMORPHA Leporidae Lepus americanus 1. Zoonomia GCA_004026855.1 LC 123 American pika LAGOMORPHA Ochotonidae Ochotona princeps 2. Existing assembly GCF_000292845.1 LC 124 Rabbit LAGOMORPHA Leporidae Oryctolagus cuniculus 2. Existing assembly GCF_000003625.3 NT 125 Cape elephant shrew MACROSCELIDEA Macroscelididae Elephantulus edwardii 1. 
Zoonomia GCA_004027355.1 LC 126 Southern white rhinoceros PERISSODACTYLA Rhinocerotidae Ceratotherium simum 2. Existing assembly GCF_000283155.1 NT 127 Northern white rhino PERISSODACTYLA Rhinocerotidae Ceratotherium simum cottoni 1. Zoonomia GCA_004027795.1 CR 128 Sumatran rhinoceros PERISSODACTYLA Rhinocerotidae Dicerorhinus sumatrensis 2. Existing assembly GCA_002844835.1 CR 129 Black rhinoceros PERISSODACTYLA Rhinocerotidae Diceros bicornis 1. Zoonomia GCA_004027315.1 CR 130 Ass PERISSODACTYLA Equidae Equus asinus 2. Existing assembly GCF_001305755.1 LC 131 Horse PERISSODACTYLA Equidae Equus caballus 2. Existing assembly GCF_000002305.2 LC 132 Przewalski's horse PERISSODACTYLA Equidae Equus przewalskii 2. Existing assembly GCF_000696695.1 EN 133 Malayan tapir PERISSODACTYLA Tapiridae Tapirus indicus 1. Zoonomia GCA_004024905.1 EN 134 South American tapir PERISSODACTYLA Tapiridae Tapirus terrestris 1. Zoonomia GCA_004025025.1 VU 135 Malayan pangolin PHOLIDOTA Manidae Manis javanica 2. Existing assembly GCF_001685135.1 CR 136 Chinese pangolin PHOLIDOTA Manidae Manis pentadactyla 2. Existing assembly GCA_000738955.1 CR 137 Linnaeus's two-toed sloth PILOSA Megalonychidae Choloepus didactylus 1. Zoonomia GCA_004027855.1 LC 138 Hoffmann's two-fingered sloth PILOSA Megalonychidae Choloepus hoffmanni 2. Existing assembly GCA_000164785.2 LC 139 Giant anteater PILOSA Myrmecophagidae Myrmecophaga tridactyla 1. Zoonomia GCA_004026745.1 VU 140 Southern tamandua PILOSA Myrmecophagidae Tamandua tetradactyla 1. Zoonomia GCA_004025105.1 LC 141 Mexican howler monkey PRIMATES Atelidae Alouatta palliata mexicana 1. Zoonomia GCA_004027835.1 CR 142 Ma's night monkey PRIMATES Aotidae Aotus nancymaae 2. Existing assembly GCA_000952055.2 VU 143 Geoffroy's spider monkey PRIMATES Atelidae Ateles geoffroyi 1. Zoonomia GCA_004024785.1 EN 144 White-eared titi PRIMATES Pitheciidae Callicebus donacophilus 1.
Zoonomia GCA_004027715.1 LC 145 White-tufted-ear marmoset PRIMATES Cebidae Callithrix jacchus 2. Existing assembly GCA_002754865.1 LC 146 White-fronted capuchin PRIMATES Cebidae Cebus albifrons 1. Zoonomia GCA_004027755.1 LC 147 White-faced sapajou PRIMATES Cebidae Cebus capucinus 2. Existing assembly GCF_001604975.1 LC 148 Sooty mangabey PRIMATES Cercopithecidae Cercocebus atys 2. Existing assembly GCF_000955945.1 NT 149 De brazza's monkey PRIMATES Cercopithecidae Cercopithecus neglectus 1. Zoonomia GCA_004027615.1 LC 150 Fat-tailed dwarf lemur PRIMATES Cheirogaleidae Cheirogaleus medius 1. Zoonomia GCA_004024725.1 LC 151 Green monkey PRIMATES Cercopithecidae Chlorocebus sabaeus 2. Existing assembly GCF_000409795.2 LC 152 Angolan colobus PRIMATES Cercopithecidae Colobus angolensis 2. Existing assembly GCF_000951035.1 VU 153 Aye-aye PRIMATES Daubentoniidae Daubentonia madagascariensis 1. Zoonomia GCA_004027145.1 EN 154 Patas monkey PRIMATES Cercopithecidae Erythrocebus patas 1. Zoonomia GCA_004027335.1 LC 155 Sclater's lemur PRIMATES Lemuridae Eulemur flavifrons 2. Existing assembly GCA_001262665.1 CR 156 Common brown lemur PRIMATES Lemuridae Eulemur fulvus 1. Zoonomia GCA_004027275.1 NT 157 Western lowland gorilla PRIMATES Hominidae Gorilla gorilla 2. Existing assembly GCA_900006655.3 CR 158 Human PRIMATES Hominidae Homo sapiens 2. Existing assembly GCA_000001405.27 LC 159 Indri PRIMATES Indridae Indri indri 1. Zoonomia GCA_004363605.1 CR 160 Ring tailed lemur PRIMATES Lemuridae Lemur catta 1. Zoonomia GCA_004024665.1 EN 161 Crab-eating macaque PRIMATES Cercopithecidae Macaca fascicularis 2. Existing assembly GCF_000364345.1 DD 162 Rhesus monkey PRIMATES Cercopithecidae Macaca mulatta 2. Existing assembly GCF_000772875.2 LC 163 Pig-tailed macaque PRIMATES Cercopithecidae Macaca nemestrina 2. Existing assembly GCF_000956065.1 VU 164 Drill PRIMATES Cercopithecidae Mandrillus leucophaeus 2. 
Existing assembly GCF_000951045.1 EN 165 Gray mouse lemur PRIMATES Cheirogaleidae Microcebus murinus 2. Existing assembly GCA_000165445.3 LC 166 Coquerel's giant mouse lemur PRIMATES Cheirogaleidae Mirza coquereli 1. Zoonomia GCA_004024645.1 EN 167 Proboscis monkey PRIMATES Cercopithecidae Nasalis larvatus 1. Zoonomia GCA_004027105.1 EN 168 Northern white-cheeked gibbon PRIMATES Hylobatidae Nomascus leucogenys 2. Existing assembly GCF_000146795.2 CR 169 Sunda slow loris PRIMATES Lorisidae Nycticebus coucang 1. Zoonomia GCA_004027815.1 VU 170 Small-eared galago PRIMATES Galagidae Otolemur garnettii 2. Existing assembly GCF_000181295.1 LC 171 Pygmy chimpanzee PRIMATES Hominidae Pan paniscus 2. Existing assembly GCF_000258655.2 EN 172 Chimpanzee PRIMATES Hominidae Pan troglodytes 2. Existing assembly GCA_002880755.3 EN 173 Olive baboon PRIMATES Cercopithecidae Papio anubis 2. Existing assembly GCA_000264685.2 LC 174 Ugandan red colobus PRIMATES Cercopithecidae Piliocolobus tephrosceles 2. Existing assembly GCA_002776525.1 EN 175 White-faced saki PRIMATES Pitheciidae Pithecia pithecia 1. Zoonomia GCA_004026645.1 LC 176 Sumatran orangutan PRIMATES Hominidae Pongo abelii 2. Existing assembly GCA_002880775.3 CR 177 Coquerel's sifaka PRIMATES Indridae Propithecus coquereli 2. Existing assembly GCF_000956105.1 EN 178 Red-shanked douc PRIMATES Cercopithecidae Pygathrix nemaeus 1. Zoonomia GCA_004024825.1 EN 179 Black snub-nosed monkey PRIMATES Cercopithecidae Rhinopithecus bieti 2. Existing assembly GCF_001698545.1 EN 180 Golden snub-nosed monkey PRIMATES Cercopithecidae Rhinopithecus roxellana 2. Existing assembly GCF_000769185.1 EN 181 Emperor tamarin PRIMATES Cebidae Saguinus imperator 1. Zoonomia GCA_004024885.1 LC 182 Bolivian squirrel monkey PRIMATES Cebidae Saimiri boliviensis 2. Existing assembly GCF_000235385.1 LC 183 Northern Plains gray langur PRIMATES Cercopithecidae Semnopithecus entellus 1. 
Zoonomia GCA_004025065.1 LC 184 African savanna elephant PROBOSCIDEA Elephantidae Loxodonta africana 2. Existing assembly GCF_000001905.1 VU 185 Cairo spiny mouse RODENTIA Muridae Acomys cahirinus 1. Zoonomia GCA_004027535.1 LC 186 Gobi jerboa RODENTIA Dipodidae Allactaga bullata 1. Zoonomia GCA_004027895.1 LC 187 Mountain beaver RODENTIA Aplodontiidae Aplodontia rufa 1. Zoonomia GCA_004027875.1 LC 188 Desmarest's hutia RODENTIA Capromyidae Capromys pilorides 1. Zoonomia GCA_004027915.1 LC 189 North American beaver RODENTIA Castoridae Castor canadensis 1. Zoonomia GCA_004027675.1 LC 190 Brazilian guinea pig RODENTIA Caviidae Cavia aperea 2. Existing assembly GCA_000688575.1 LC 191 Domestic guinea pig RODENTIA Caviidae Cavia porcellus 2. Existing assembly GCF_000151735.1 LC 192 Montane guinea pig RODENTIA Caviidae Cavia tschudii 1. Zoonomia GCA_004027695.1 LC 193 Long-tailed chinchilla RODENTIA Chinchillidae Chinchilla lanigera 2. Existing assembly GCF_000276665.1 EN 194 Gambian pouched rat RODENTIA Nesomyidae Cricetomys gambianus 1. Zoonomia GCA_004027575.1 LC 195 Chinese hamster RODENTIA Nesomyidae Cricetulus griseus 2. Existing assembly GCA_900186095.1 LC 196 Common gundi RODENTIA Ctenodactylidae Ctenodactylus gundi 1. Zoonomia GCA_004027205.1 LC 197 Social tuco-tuco RODENTIA Ctenomyidae Ctenomys sociabilis 1. Zoonomia GCA_004027165.1 CR 198 Lowland paca RODENTIA Cuniculidae Cuniculus paca 1. Zoonomia GCA_004365215.1 LC 199 Central American agouti RODENTIA Dasyproctidae Dasyprocta punctata 1. Zoonomia GCA_004363535.1 LC 200 Pacarana RODENTIA Dinomyidae Dinomys branickii 1. Zoonomia GCA_004027595.1 LC 201 Ord's kangaroo rat RODENTIA Heteromyidae Dipodomys ordii 2. Existing assembly GCF_000151885.1 LC 202 Stephen's kangaroo rat RODENTIA Heteromyidae Dipodomys stephensi 1. Zoonomia GCA_004024685.1 VU 203 Patagonian mara RODENTIA Caviidae Dolichotis patagonum 1. Zoonomia GCA_004027295.1 NT 204 Transcaucasian mole vole RODENTIA Cricetidae Ellobius lutescens 2.
Existing assembly GCA_001685075.1 LC 205 Northern mole vole RODENTIA Cricetidae Ellobius talpinus 2. Existing assembly GCA_001685095.1 LC 206 Damara mole-rat RODENTIA Bathyergidae Fukomys damarensis 2. Existing assembly GCF_000743615.1 LC 207 Edible dormouse RODENTIA Gliridae Glis glis 1. Zoonomia GCA_004027185.1 LC 208 Woodland dormouse RODENTIA Gliridae Graphiurus murinus 1. Zoonomia GCA_004027655.1 LC 209 Naked mole-rat RODENTIA Bathyergidae Heterocephalus glaber 2. Existing assembly GCF_000247695.1 LC 210 Capybara RODENTIA Caviidae Hydrochoerus hydrochaeris 1. Zoonomia GCA_004027455.1 LC 211 Northern crested porcupine RODENTIA Hystricidae Hystrix cristata 1. Zoonomia GCA_004026905.1 LC 212 Thirteen-lined ground squirrel RODENTIA Sciuridae Ictidomys tridecemlineatus 2. Existing assembly GCF_000236235.1 LC 213 Lesser egyptian jerboa RODENTIA Dipodidae Jaculus jaculus 2. Existing assembly GCF_000280705.1 LC 214 Alpine marmot RODENTIA Sciuridae Marmota marmota 2. Existing assembly GCF_001458135.1 LC 215 Mongolian jird RODENTIA Muridae Meriones unguiculatus 1. Zoonomia GCA_004026785.1 LC 216 Golden hamster RODENTIA Cricetidae Mesocricetus auratus 2. Existing assembly GCF_000349665.1 VU 217 Prairie vole RODENTIA Cricetidae Microtus ochrogaster 2. Existing assembly GCF_000317375.1 LC 218 Ryukyu mouse RODENTIA Muridae Mus caroli 2. Existing assembly GCA_900094665.2 LC 219 House mouse RODENTIA Muridae Mus musculus 2. Existing assembly GCF_000001635.26 LC 220 Shrew mouse RODENTIA Muridae Mus pahari 2. Existing assembly GCA_900095145.2 LC 221 Western wild mouse RODENTIA Muridae Mus spretus 2. Existing assembly GCA_001624865.1 LC 222 Hazel dormouse RODENTIA Gliridae Muscardinus avellanarius 1. Zoonomia GCA_004027005.1 LC 223 Coypu RODENTIA Myocastoridae Myocastor coypus 1. Zoonomia GCA_004027025.1 LC 224 Upper galilee mountains blind mole rat RODENTIA Spalacidae Nannospalax galili 2. Existing assembly GCF_000622305.1 DD 225 Degu RODENTIA Octodontidae Octodon degus 2.
Existing assembly GCF_000260255.1 LC 226 Muskrat RODENTIA Cricetidae Ondatra zibethicus 1. Zoonomia GCA_004026605.1 LC 227 Scorpion mouse RODENTIA Cricetidae Onychomys torridus 1. Zoonomia GCA_004026725.1 LC 228 Pacific pocket mouse RODENTIA Heteromyidae Perognathus longimembris pacificus 1. Zoonomia GCA_004363475.1 LC 229 Prairie deer mouse RODENTIA Cricetidae Peromyscus maniculatus 2. Existing assembly GCF_000500345.1 LC 230 Dassie rat RODENTIA Petromuridae Petromus typicus 1. Zoonomia GCA_004026965.1 LC 231 Fat sand rat RODENTIA Muridae Psammomys obesus 2. Existing assembly GCA_002215935.1 LC 232 Norway rat RODENTIA Muridae Rattus norvegicus 2. Existing assembly GCF_000001895.5 LC 233 Hispid cotton rat RODENTIA Cricetidae Sigmodon hispidus 1. Zoonomia GCA_004025045.1 LC 234 Daurian ground squirrel RODENTIA Sciuridae Spermophilus dauricus 2. Existing assembly GCA_002406435.1 LC 235 Greater cane rat RODENTIA Thryonomyidae Thryonomys swinderianus 1. Zoonomia GCA_004025085.1 LC 236 Cape ground squirrel RODENTIA Sciuridae Xerus inauris 1. Zoonomia GCA_004024805.1 LC 237 Meadow jumping mouse RODENTIA Dipodidae Zapus hudsonius 1. Zoonomia GCA_004024765.1 LC 238 Northern tree shrew SCANDENTIA Tupaiidae Tupaia belangeri chinensis 2. Existing assembly GCF_000334495.1 LC 239 Large treeshrew SCANDENTIA Tupaiidae Tupaia tana 1. Zoonomia GCA_004365275.1 LC 240 Florida manatee SIRENIA Trichechidae Trichechus manatus 2. Existing assembly GCF_000243295.1 EN 241 Aardvark TUBULIDENTATA Orycteropodidae Orycteropus afer 1. Zoonomia GCA_004365145.1 LC Table 1. Genome assemblies included in the 241-way Conservation track. Species status:LC = Least Concern; NT = Near threatened; VU = Vulnerable; EN = Endangered; CR = Critically endangered Display Conventions and Configuration In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. 
The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Note that excluding species from the pairwise display does not alter the conservation score display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: No bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: Aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: Aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species.
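As a rough illustration of the gap conventions listed above, the following Python sketch maps a gap between two alignment blocks to its drawing style. The function name, its arguments, and the rule that Ns take precedence over the double-line case are assumptions made for illustration; this is not the Genome Browser's own code.

```python
def gap_style(aligning_bases: int, has_ns: bool) -> str:
    """Return the drawing convention for a gap between two alignment blocks.

    aligning_bases: number of bases the aligning species has in the gap region.
    has_ns: True if those bases include Ns (assumed here to take precedence
            over the plain "double line" case).
    """
    if aligning_bases < 0:
        raise ValueError("aligning_bases must be non-negative")
    if aligning_bases == 0:
        # No bases in the aligning species: lineage-specific insertion or deletion.
        return "single line"
    if has_ns:
        # Missing sequence in the aligning species' assembly.
        return "pale yellow"
    # One or more unalignable bases in the gap region.
    return "double line"


if __name__ == "__main__":
    for bases, ns in [(0, False), (12, False), (12, True)]:
        print(bases, ns, gap_style(bases, ns))
```

The three branches correspond one-to-one to the three conventions; any real renderer would also need the block coordinates, which are omitted here.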
Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: Represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: Enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: The gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: The annotations from the genome displayed in the Default species to establish reading frame pull-down menu are used to translate all the aligned species present in the alignment. 
Use reading frames for species if available, otherwise no translation: Codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: Codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation:
Gene Track: Species
UCSC Genes: Human
Ensembl Genes v104: Brazilian guinea pig, gibbon
RefSeq Genes: Angolan colobus, Balaenoptera acutorostrata, Bison bison, Black flying-fox, Brandt's myotis (bat), Bushbaby, Camelus bactrianus, Camelus ferus, Canis lupus familiaris, Cape elephant shrew, Capra hircus, Cavia porcellus, Ceratotherium simum, Cercocebus atys, Chinchilla, Chinese tree shrew, Chlorocebus sabaeus, Condylura cristata, Damara mole rat, Dasypus novemcinctus, David's myotis (bat), Delphinapterus leucas, Echinops telfairi, Enhydra lutris, Eptesicus fuscus, Equus asinus, Equus przewalskii, Erinaceus europaeus, Felis catus, Heterocephalus glaber, Jaculus jaculus, Kangaroo rat, Killer whale, Leptonychotes weddellii, Lipotes vexillifer, Little brown bat, Loxodonta africana, Macaca fascicularis, Macaca nemestrina, Mandrillus leucophaeus, Manis javanica, Marmota marmota, Mesocricetus auratus, Miniopterus natalensis, Mus musculus, Nannospalax galili, Ochotona princeps, Octodon degus, Oryctolagus cuniculus, Pacific walrus, Pan paniscus, Panthera tigris, Peromyscus maniculatus, Prairie vole, Propithecus coquereli, Pteropus vampyrus, Puma concolor, Rattus norvegicus, Rhinopithecus bieti, Shrew, Squirrel monkey, Squirrel, Sus scrofa, Trichechus manatus, Ursus maritimus, White-faced sapajou, Wild yak
no annotation: Acinonyx jubatus, Acomys cahirinus, Ailuropoda melanoleuca, Ailurus fulgens, Allactaga bullata, Alouatta palliata, Ammotragus
lervia, Anoura caudifer, Antilocapra americana, Aotus nancymaae, Aplodontia rufa, Artibeus jamaicensis, Ateles geoffroyi, Balaenoptera bonaerensis, Beatragus hunteri, Bos indicus, Bos taurus, Bubalus bubalis, Callicebus donacophilus, Callithrix jacchus, Camelus dromedarius, Canis lupus, Capra aegagrus, Capromys pilorides, Carollia perspicillata, Castor canadensis, Catagonus wagneri, Cavia tschudii, Cebus albifrons, Ceratotherium simum cottoni, Cercopithecus neglectus, Chaetophractus vellerosus, Cheirogaleus medius, Choloepus didactylus, Choloepus hoffmanni, Chrysochloris asiatica, Craseonycteris thonglongyai, Cricetomys gambianus, Cricetulus griseus, Crocidura indochinensis, Cryptoprocta ferox, Ctenodactylus gundi, Ctenomys sociabilis, Cuniculus paca, Dasyprocta punctata, Daubentonia madagascariensis, Desmodus rotundus, Dicerorhinus sumatrensis, Diceros bicornis, Dinomys branickii, Dipodomys stephensi, Dolichotis patagonum, Elaphurus davidianus, Ellobius lutescens, Ellobius talpinus, Equus caballus, Erythrocebus patas, Eschrichtius robustus, Eubalaena japonica, Eulemur flavifrons, Eulemur fulvus, Felis nigripes, Galeopterus variegatus, Giraffa tippelskirchi, Glis glis, Gorilla gorilla, Graphiurus murinus, Helogale parvula, Hemitragus hylocrius, Heterohyrax brucei, Hippopotamus amphibius, Hipposideros armiger, Hipposideros galeritus, Hyaena hyaena, Hydrochoerus hydrochaeris, Hystrix cristata, Indri indri, Inia geoffrensis, Kogia breviceps, Lasiurus borealis, Lemur catta, Lepus americanus, Lycaon pictus, Macaca mulatta, Macroglossus sobrinus, Manis pentadactyla, Megaderma lyra, Mellivora capensis, Meriones unguiculatus, Mesoplodon bidens, Microcebus murinus, Microgale talazaci, Micronycteris hirsuta, Miniopterus schreibersii, Mirounga angustirostris, Mirza coquereli, Monodon monoceros, Mormoops blainvillei, Moschus moschiferus, Mungos mungo, Murina feae, Mus caroli, Mus pahari, Mus spretus, Muscardinus avellanarius, Mustela putorius, Myocastor coypus, Myotis myotis, 
Myrmecophaga tridactyla, Nasalis larvatus, Neomonachus schauinslandi, Neophocaena asiaeorientalis, Noctilio leporinus, Nycticebus coucang, Odocoileus virginianus, Okapia johnstoni, Ondatra zibethicus, Onychomys torridus, Orycteropus afer, Ovis aries, Ovis canadensis, Pan troglodytes, Panthera onca, Panthera pardus, Pantholops hodgsonii, Papio anubis, Paradoxurus hermaphroditus Table 2. Gene tracks used for codon translation. Methods The Zoonomia alignment was composed of two sets of mammalian genomes: newly assembled DISCOVAR assemblies and GenBank assemblies. The DISCOVAR genomes were masked with RepeatMasker (commit 2d947604), using Repbase version 20170127 as the repeat library and CrossMatch as the alignment engine. The pipeline used is available at repeatMaskerPipeline (commit a6ad966). The guide-tree topology was taken from the TimeTree database (using the release current in October 2018), and the branch lengths were estimated using the least-squares-fit mode of PHYLIP, version 3.695. The distance matrix used was largely based on distances from the 4d site trees from the UCSC browser. To add those species not present in the UCSC tree, approximate distances estimated by Mash (commit 541971b) to the closest UCSC species were added to the distance between the two closest UCSC species. We used the HAL package (commit 68db41d) to produce the HAL file. Phylogenetic Tree Model phyloP is a phylogenetic method that relies on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The all-species tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 241-way alignment (msa_view).
The 4d sites were derived from the RefSeq (Reviewed+Coding) gene set, filtered to select single-coverage long transcripts. This same tree model was used in the phyloP calculations; however, the background frequencies were modified to maintain reversibility. The resulting tree model: all species. PhyloP Conservation The phyloP program supports several different methods for computing p-values of conservation or acceleration, for individual nucleotides or larger elements (http://compgen.cshl.edu/phast/). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., the --subtree option was not used). The scores were computed by performing a likelihood ratio test at each alignment column (--method LRT), and scores for both conservation and acceleration were produced (--mode CONACC). References Zoonomia: Zoonomia Consortium. A comparative genomics multitool for scientific discovery and conservation. Nature. 2020 Nov;587(7833):240-245. PMID: 33177664; PMC: PMC7759459; DOI: 10.1038/s41586-020-2876-6 Cactus: Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111 Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis. Pennsylvania State University, USA. 2007. PhyloP: Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005 Jul;15(7):901-13.
PMID: 15965027; PMC: PMC1172034; DOI: 10.1101/gr.3577405 Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010 Jan;20(1):110-21. PMID: 19858363; PMC: PMC2798823 Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. DOI: 10.1007/0-387-27733-1_12 Siepel A, Pollard KS, and Haussler D. New methods for detecting lineage-specific selection. In Proceedings of the 10th International Conference on Research in Computational Molecular Biology (RECOMB 2006), pp. 190-205. DOI: 10.1007/11732990_17 cons241wayViewalign Cactus Alignments Cactus Alignment & Conservation of Zoonomia Placental Mammals (241 Species) Comparative Genomics cactus241wayBM Cactus Align Cactus Alignments of Zoonomia 241 Placental Mammals Comparative Genomics cons241wayViewphyloP Basewise Conservation (phyloP) Cactus Alignment & Conservation of Zoonomia Placental Mammals (241 Species) Comparative Genomics phyloP241wayBW Basewise Cons PhyloP Basewise Conservation of Zoonomia 241 Placental Mammals Comparative Genomics covidHgiGwasR4Pval COVID GWAS v4 COVID risk variants from GWAS meta-analyses by the COVID-19 Host Genetics Initiative (Rel 4, Oct 2020) Phenotype and Literature Description This track set shows the results of the GWAS Data Release 4 (October 2020) from the COVID-19 Host Genetics Initiative (HGI): a collaborative effort to facilitate the generation of meta-analysis across multiple studies contributed by partners world-wide to identify the genetic determinants of SARS-CoV-2 infection susceptibility, disease severity and outcomes. The COVID-19 HGI also aims to provide a platform for study partners to share analytical results in the form of summary statistics and/or individual level data of COVID-19 host genetics research. At the time of this release, a total of 137 studies were registered with this effort. 
The specific phenotypes studied by the COVID-19 HGI are those that benefit from maximal sample size: primary analysis on disease severity. For Data Release 4, the number of cases has increased nearly ten-fold (more than 30,000 COVID-19 cases and 1.47 million controls) by combining data from 34 studies across 16 countries. The four tracks here are based on data from HGI meta-analyses A2, B2, C1, and C2, described here:
Severe COVID vars (A2): Cases with very severe respiratory failure confirmed for COVID-19 vs. population (i.e. everybody that is not a case). The increased sample size resulted in strong evidence of seven genomic regions associated with severe COVID-19 and one additional signal associated with COVID-19 partial-susceptibility. Many of these regions were identified by the Genetics of Mortality in Critical Care (GenOMICC) study and are shown below (table adapted from Pairo-Castineira et al.).
SNP           Human GRCh37/hg19 Assembly    Human GRCh38/hg38 Assembly    Risk Allele   Alternative Allele   Gene nearest to SNP
rs73064425    chr3:45901089-45901089        chr3:45859597-45859597        T             C                    LZTFL1
rs9380142     chr6:29798794-29798794        chr6:29831017-29831017        A             G                    HLA-G
rs143334143   chr6:31121426-31121426        chr6:31153649-31153649        A             G                    CCHCR1
rs10735079    chr12:113380008-113380008     chr12:112942203-112942203     A             G                    OAS3
rs74956615    chr19:10427721-10427721       chr19:10317045-10317045       A             T                    ICAM5/TYK2
rs2109069     chr19:4719443-4719443         chr19:4719431-4719431         A             G                    DPP9
rs2236757     chr21:34624917-34624917       chr21:33252612-33252612       A             G                    IFNAR2
Hosp COVID vars (B2): Cases hospitalized and confirmed for COVID-19 vs. population (i.e. everybody that is not a case)
Tested COVID vars (C1): Cases with laboratory confirmed SARS-CoV-2 infection, or health record/physician-confirmed COVID-19, or self-reported COVID-19 via questionnaire vs. laboratory/self-reported negative cases
All COVID vars (C2): Cases with laboratory confirmed SARS-CoV-2 infection, or health record/physician-confirmed COVID-19, or self-reported COVID-19 vs.
population (i.e. everybody that is not a case) Due to privacy concerns, these browser tracks exclude data provided by 23andMe-contributed studies in the full analysis results. The actual study, case, and control counts for the individual browser tracks are listed in the track labels. Details on all studies can be found here. Display Conventions Displayed items are colored by GWAS effect: red for positive (harmful) effect, blue for negative (protective) effect. The height ('lollipop stem') of the item is based on statistical significance (p-value). For better visualization of the data, only SNPs with p-values smaller than 1e-3 are displayed by default. The color saturation indicates effect size (beta coefficient): values over the median of effect size are brightly colored (bright red, bright blue), those below the median are paler (light red, light blue). Each track has separate display controls and data can be filtered according to the number of studies, minimum -log10 p-value, and the effect size (beta coefficient), using the track Configure options. Mouseover on items shows the rs ID (or chrom:pos if none assigned), both the non-effect and effect alleles, the effect size (beta coefficient), the p-value, and the number of studies. Additional information on each variant can be found on the details page by clicking on the item. Methods COVID-19 Host Genetics Initiative (HGI) GWAS meta-analysis round 4 (October 2020) results were used in this study. Each participating study partner submitted GWAS summary statistics for up to four of the COVID-19 phenotype definitions. Data were generated from genome-wide SNP array and whole exome and genome sequencing, leveraging the impact of both common and rare variants. The statistical analysis performed takes into account differences between sex, ancestry, and date of sample collection. Alleles were harmonized across studies and reported allele frequencies are based on gnomAD version 3.0 reference data.
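As a rough illustration of the display conventions described earlier in this section, the following Python sketch maps a variant's effect size and p-value to a color and a lollipop height. The function name lollipop_style, the use of the absolute beta value when comparing against the median, and the use of -log10(p) as the height are assumptions made for this example; the track description does not specify these details.

```python
import math

# Default display cutoff stated in the track description.
P_VALUE_CUTOFF = 1e-3


def lollipop_style(beta, p_value, median_abs_beta):
    """Return (color, height) for one variant, or None if hidden by default.

    Hypothetical helper: the real browser code may differ.
    """
    if p_value >= P_VALUE_CUTOFF:
        return None  # only p-values smaller than the cutoff are shown
    # Red for positive (harmful) effect, blue for negative (protective).
    # A beta of exactly zero is treated as blue here, an arbitrary choice.
    hue = "red" if beta > 0 else "blue"
    # Assumption: saturation compares |beta| against the median |beta|.
    shade = "bright" if abs(beta) > median_abs_beta else "light"
    # Assumption: stem height grows with -log10(p-value).
    height = -math.log10(p_value)
    return f"{shade} {hue}", height


if __name__ == "__main__":
    print(lollipop_style(0.30, 1e-8, median_abs_beta=0.10))
    print(lollipop_style(-0.05, 5e-4, median_abs_beta=0.10))
    print(lollipop_style(0.20, 0.05, median_abs_beta=0.10))
```

The third call returns None because its p-value does not pass the default cutoff; in the browser, such items would only appear after relaxing the filter in the track Configure options.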
Most study partners used the SAIGE GWAS pipeline in order to generate summary statistics used for the COVID-19 HGI meta-analysis. The summary statistics of individual studies were manually examined for inflation, deflation, and excessive number of false positives. Qualifying summary statistics were filtered for INFO > 0.6 and MAF > 0.0001 prior to meta-analyzing the entirety of the data. The meta-analysis was performed using fixed effects inverse variance weighting. The meta-analysis software and workflow are available here. More information about the prospective studies, processing pipeline, results and data sharing can be found here. Data Access The data underlying these tracks and summary statistics results are publicly available in COVID19-hg Release 4 (October 2020). The raw data can be explored interactively with the Table Browser, or the Data Integrator. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. Credits Thanks to the COVID-19 Host Genetics Initiative contributors and project leads for making these data available, and in particular to Rachel Liao, Juha Karjalainen, and Kumar Veerapen at the Broad Institute for their review and input during browser track development. References COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet. 2020 Jun;28(6):715-718. PMID: 32404885; PMC: PMC7220587 Pairo-Castineira E, Clohisey S, Klaric L, Bretherick AD, Rawlik K, Pasko D, Walker S, Parkinson N, Fourman MH, Russell CD et al. Genetic mechanisms of critical illness in Covid-19. Nature. 2020 Dec 11;. PMID: 33307546 covid COVID Data Container of SARS-CoV-2 data Phenotype and Literature Description This is a container track for all data related to SARS-CoV-2 for hg38 in the UCSC Genome Browser. 
Click into any of the sub-tracks to see detailed information on the specific annotations. covidHgiGwasR4PvalC2 All COVID vars COVID risk variants from the COVID-19 HGI GWAS Analysis C2 (17965 cases, 33 studies, Rel 4: Oct 2020) Phenotype and Literature covidHgiGwasR4PvalC1 Tested COVID vars Tested COVID risk variants from the COVID-19 HGI GWAS Analysis C1 (11085 cases, 20 studies, Rel 4: Oct 2020) Phenotype and Literature covidHgiGwasR4PvalB2 Hosp COVID vars Hospitalized COVID risk variants from the COVID-19 HGI GWAS Analysis B2 (7885 cases, 21 studies, Rel 4: Oct 2020) Phenotype and Literature covidHgiGwasR4PvalA2 Severe COVID vars Severe respiratory COVID risk variants from the COVID-19 HGI GWAS Analysis A2 (4336 cases, 12 studies, Rel 4: Oct 2020) Phenotype and Literature crossTissueMapsFullDetails Cross Tissue Details Cross tissue nuclei full details Single Cell RNA-seq Description This track collection shows data from Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. The dataset covers ~200,000 single nuclei from a total of 16 human donors across 25 samples, using 4 different sample preparation protocols followed by droplet-based single-cell RNA-seq. The samples were obtained from frozen tissue as part of the Genotype-Tissue Expression (GTEx) project. Samples were taken from the esophagus, skeletal muscle, heart, lung, prostate, breast, and skin. The dataset includes 43 broad cell classes, some specific to certain tissues and some shared across all tissue types. This track collection contains three bar chart tracks of RNA expression. The first track, Cross Tissue Nuclei, allows cells to be grouped together and faceted on up to 4 categories: tissue, cell class, cell subclass, and cell type. The second track, Cross Tissue Details, allows cells to be grouped together and faceted on up to 7 categories: tissue, cell class, cell subclass, cell type, granular cell type, sex, and donor.
The third track, GTEx Immune Atlas, allows cells to be grouped together and faceted on up to 5 categories: tissue, cell type, cell class, sex, and donor. Please see the GTEx portal for further interactive displays and additional data. Display Conventions and Configuration Tissue-cell type combinations in the Full and Combined tracks are colored according to the cell type they belong to, as listed in the table below:
Color Cell Type: Endothelial, Epithelial, Glia, Immune, Neuron, Stromal, Other
Tissue-cell type combinations in the Immune Atlas track are shaded according to the table below:
Color Cell Type: Inflammatory Macrophage, Lung Macrophage, Monocyte/Macrophage FCGR3A High, Monocyte/Macrophage FCGR3A Low, Macrophage HLAII High, Macrophage LYVE1 High, Proliferating Macrophage, Dendritic Cell 1, Dendritic Cell 2, Mature Dendritic Cell, Langerhans, CD14+ Monocyte, CD16+ Monocyte, LAM-like, Other
Methods Using the previously collected tissue samples from the Genotype-Tissue Expression project, nuclei were isolated using four different protocols and sequenced using droplet-based single-cell RNA-seq. CellBender v2.1 and other standard quality control techniques were applied, resulting in 209,126 nuclei profiles across eight tissues, with a mean of 918 genes and 1519 transcripts per profile. Data from all samples were integrated with a conditional variational autoencoder in order to correct for multiple sources of variation, such as sex and protocol, while preserving tissue and cell type specific effects. For detailed methods, please refer to Eraslan et al. or the GTEx portal website. UCSC Methods The gene expression files were downloaded from the GTEx portal. The UCSC command line utilities matrixClusterColumns, matrixToBarChartBed, and bedToBigBed were used to transform these into a bar chart format bigBed file that can be visualized. The UCSC utilities can be found on our download server. Data Access The raw bar chart data can be explored interactively with the Table Browser or the Data Integrator.
For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions or our Data Access FAQ for more information. Credits Thanks to the GTEx Consortium for creating and analyzing these data. References Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Slyper M, Wang J, Van Wittenberghe N, Rouhana JM, Waldman J et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science. 2022 May 13;376(6594):eabl4290. PMID: 35549429; PMC: PMC9383269 gnomadGenomesVariantsV3_1 gnomAD v3.1 Genome Aggregation Database (gnomAD) Genome Variants v3.1 Variation Description The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. 4,454 genomes were added to the number of genomes in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post. The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after the UCSC version of the track was released). For more information about gnomAD v3.1.1, please see the related changelog. GnomAD Genome Mutational Constraint is based on v3.1.2 and is available only on hg38. It shows the reduced variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions too. Positive values are red and reflect stronger mutation constraint (and less variation), indicating higher natural selection pressure in a region. Negative values are green and reflect lower mutation constraint (and more variation), indicating less selection pressure and less functional effect. 
Briefly, for any 1 kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations and compares this number to the observed number of mutations using a Z-score (see the preprint in the References section for details). The chrX scores were added as received from the authors. Because no de novo mutation data are available on chrX (for estimating the effects of regional genomic features on mutation rates), these scores are more speculative than the ones on the autosomes. The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This includes data on both the gene and transcript level. The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README. gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site. For questions on the gnomAD data, also see the gnomAD FAQ. More details on the Variant type(s) can be found on the Sequence Ontology page. Display Conventions and Configuration gnomAD v3.1.1 The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track, except as noted below.
There is a Non-cancer filter used to exclude/include variants from samples of individuals who were not ascertained for having cancer in a cancer study. There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only). Where possible, variants overlapping multiple transcripts/genes have been collapsed into one variant, with additional information available on the details page, which has roughly halved the number of items in the bigBed. The bigBed has been split into two files, one with the information necessary for the track display, and one with the information necessary for the details page. For more information on this data format, please see the Data Access section below. The VEP annotation is shown as a table instead of spread across multiple fields. Intergenic variants have not been pre-filtered. gnomAD v3.1 By default, a maximum of 50,000 variants can be displayed at a time (before applying the filters described below), before the track switches to dense display mode. Mouse hover on an item will display many details about each variant, including the affected gene(s), the variant type, and annotation (missense, synonymous, etc.). Clicking on an item will display additional details on the variant, including a population frequency table showing allele count in each sub-population. Following the conventions on the gnomAD browser, items are shaded according to their Annotation type: pLoF, Missense, Synonymous, Other. Label Options To maintain consistency with the gnomAD website, variants are by default labeled according to their chromosomal start position followed by the reference and alternate alleles, for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional label if the variant is present in dbSNP. Filtering Options Three filters are available for these tracks: FILTER: Used to exclude/include variants that failed Random Forest (RF), Inbreeding Coefficient (Inbreeding Coeff), or Allele Count (AC0) filters.
The PASS option is used to include/exclude variants that pass all of the RF, InbreedingCoeff, and AC0 filters, as denoted in the original VCF. Annotation type: Used to exclude/include variants that are annotated as Predicted Loss of Function (pLoF), Missense, Synonymous, or Other, as annotated by VEP version 85 (GENCODE v19). Variant Type: Used to exclude/include variants according to the type of variation, as annotated by VEP v85. There is one additional configurable filter on the minimum minor allele frequency. gnomAD v2.1.1 The gnomAD v2.1.1 track follows the standard display and configuration options available for VCF tracks, briefly explained below. In dense mode, a vertical line is drawn at the position of each variant. In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts. Filtering Options Four filters are available for these tracks, the same as the underlying VCF: AC0: Allele Count 0 after filtering out low confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls). InbreedingCoeff: Inbreeding Coefficient < -0.3. RF: Used to exclude/include variants that failed Random Forest filtering thresholds of 0.055272738028512555 for SNPs and 0.20641025579497013 for indels (probabilities of being a true positive variant). Pass: Variant passes all 3 filters. There are two additional filters available, one for the minimum minor allele frequency, and a configurable filter on the QUAL score. UCSC Methods The gnomAD v3.1.1 data is unfiltered.
For the v3.1 update only, in order to cut down on the amount of displayed data, the following variant types have been filtered out, but are still viewable in the gnomAD browser: Regulatory Region Variants, Downstream/Upstream Gene Variants, and Transcription Factor Binding Site Variants. For the full steps used to create the track at UCSC, please see the section denoted "gnomAD v3.1 update" in the hg38 makedoc. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Variant VCFs can be found in the vcf/ subdirectory. The v3.1 and v3.1.1 variants can be found in a special directory as they have been transformed from the underlying VCF. For the v3.1.1 variants in particular, the underlying bigBed only contains enough information to use the track in the browser. The extra data like VEP annotations and CADD scores are available in the same directory as the bigBed but in the files gnomad.v3.1.1.details.tab.gz and gnomad.v3.1.1.details.tab.gz.gzi. The gnomad.v3.1.1.details.tab.gz contains the gzip compressed extra data in JSON format, and the .gzi file is available to speed searching of this data. Each variant has an associated md5sum in the name field of the bigBed which can be used along with the _dataOffset and _dataLen fields to get the associated external data, as shown below:

    # find item of interest:
    bigBedToBed genomes.bb stdout | head -4 | tail -1
    chr1 12416 12417 854246d79dc5d02dcdbd5f5438542b6e [..omitted for brevity..] chr1-12417-G-A 67293 902

    # use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
    bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
    854246d79dc5d02dcdbd5f5438542b6e {"DDX11L1": {"cons": ["non_coding_transcript_variant", [..omitted for brevity..]

The data can also be found directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. The mutational constraints score was updated in October 2022 from a previous, now deprecated, pre-publication version. The old version can be found in our archive directory on the download server. It can be loaded by copying the URL into our "Custom tracks" input box. Credits Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL) as described here. References Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. doi: https://doi.org/10.1101/531210. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207 Chen S, Francioli L, Goodrich J, Collins R, Wang Q, Alfoldi J, Watts N, Vittal C, Gauthier L, Poterba T, Wilson M. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv. 2022. gnomadGenomesVariantsV3_1_1 gnomAD v3.1.1 Genome Aggregation Database (gnomAD) Genome Variants v3.1.1 Variation Description The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence.
4,454 genomes were added to the number of genomes in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post. The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. On the UCSC side, we have now included the mitochondrial chromosome data that was released as part of gnomAD v3.1 (but after the UCSC version of the track was released). For more information about gnomAD v3.1.1, please see the related changelog. GnomAD Genome Mutational Constraint is based on v3.1.2 and is available only on hg38. It shows the reduced variation caused by purifying natural selection. This is similar to negative selection on loss-of-function (LoF) for genes, but can be calculated for non-coding regions too. Positive values are red and reflect stronger mutation constraint (and less variation), indicating higher natural selection pressure in a region. Negative values are green and reflect lower mutation constraint (and more variation), indicating less selection pressure and less functional effect. Briefly, for any 1 kbp window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations and compares this number to the observed number of mutations using a Z-score (see the preprint in the References section for details). The chrX scores were added as received from the authors. Because no de novo mutation data are available on chrX (for estimating the effects of regional genomic features on mutation rates), these scores are more speculative than the ones on the autosomes. The gnomAD Predicted Constraint Metrics track contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This includes data on both the gene and transcript level.
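As a rough illustration of the Z-score comparison described above for the Genome Mutational Constraint track, the following Python sketch compares an expected and an observed mutation count for a single window. The formula used here (expected minus observed, divided by the square root of expected) and the names constraint_z and track_color are assumptions made for this example; the track description only states that a Z-score is used, so the preprint should be consulted for the exact definition.

```python
import math


def constraint_z(expected, observed):
    """Hypothetical constraint Z-score for one genomic window.

    Assumed form: (expected - observed) / sqrt(expected).
    Positive values mean fewer mutations were observed than expected.
    """
    if expected <= 0:
        raise ValueError("expected mutation count must be positive")
    return (expected - observed) / math.sqrt(expected)


def track_color(z):
    """Map a score to the colors named in the track description."""
    if z > 0:
        return "red"    # stronger constraint, less variation
    if z < 0:
        return "green"  # weaker constraint, more variation
    return "neutral"    # zero is not covered by the description


if __name__ == "__main__":
    for exp_count, obs_count in [(100.0, 80), (100.0, 120)]:
        z = constraint_z(exp_count, obs_count)
        print(exp_count, obs_count, z, track_color(z))
```

Under this assumed formula, a window with 100 expected and 80 observed mutations scores 2.0 (red), while 120 observed mutations scores -2.0 (green), matching the sign convention given in the description.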
The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README. gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site. For questions on the gnomAD data, also see the gnomAD FAQ. More details on the Variant type(s) can be found on the Sequence Ontology page. Display Conventions and Configuration gnomAD v3.1.1 The gnomAD v3