cartVersion cartVersion cartVersion cartVersion 0 0 0 0 0 0 0 0 0 0 0 cartVersion cartVersion cartVersion 0 cartVersion 0 cpgIslandExt CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 3 1 0 100 0 128 228 128 0 0 0
CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.
\ \\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\
\ \\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\
\ \CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \
\ The entire genome sequence, masked areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\
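The dinucleotide scoring described above (+17 for CG, -1 for every other dinucleotide) can be illustrated with a short Python sketch. This is not the cpg_lh implementation: it reports only the single best-scoring segment, found with a running-sum scan, and the function names are hypothetical.

```python
def score_dinucleotides(seq, cg_score=17, other_score=-1):
    """Return one score per dinucleotide: +17 for CG, -1 for anything else."""
    seq = seq.upper()
    return [cg_score if seq[i:i + 2] == "CG" else other_score
            for i in range(len(seq) - 1)]


def best_segment(seq):
    """Return (start, end, score) of the highest-scoring run of dinucleotides.

    Coordinates are 0-based, half-open, in bases. A running sum is reset
    whenever it drops to zero or below, so only positive-scoring runs are kept.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, score in enumerate(score_dinucleotides(seq)):
        if run_sum <= 0:
            run_sum, run_start = 0, i
        run_sum += score
        if run_sum > best_sum:
            # The dinucleotide at index i covers bases i and i + 1.
            best_sum, best_start, best_end = run_sum, run_start, i + 2
    return best_start, best_end, best_sum


print(best_segment("ATATCGCGCGATAT"))
```

In this toy sequence the best segment is the CGCGCG run in the middle. A real predictor would report every qualifying segment and then apply the track's filtering criteria.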
\ \The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the island length, expressed as a percentage. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden and Frommer (1987)):\ \
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\ \ where N = length of sequence.\
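As a worked illustration of the CpG count, Percentage CpG, and Obs/Exp CpG definitions above, the following Python sketch computes all three for a single sequence. The function name and the returned dictionary keys are hypothetical, chosen only for this example.

```python
def cpg_stats(seq):
    """Compute CpG count, percentage CpG, and observed/expected CpG for seq."""
    seq = seq.upper()
    n = len(seq)                      # N = length of sequence
    cpg_count = seq.count("CG")       # CG cannot overlap itself, so count() is exact
    c_count = seq.count("C")
    g_count = seq.count("G")
    # Percentage CpG: CpG bases (twice the CpG count) relative to the length.
    per_cpg = 100.0 * 2 * cpg_count / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    if c_count and g_count:
        obs_exp = cpg_count * n / (c_count * g_count)
    else:
        obs_exp = 0.0
    return {"cpg_count": cpg_count, "per_cpg": per_cpg, "obs_exp": obs_exp}


print(cpg_stats("CGCGCG"))
```

For the six-base sequence CGCGCG there are 3 CpGs, 3 Cs and 3 Gs, so Obs/Exp = 3 * 6 / (3 * 3) = 2.0 and the Percentage CpG is 100.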
\ The calculation of the track data is performed by the following command sequence:\
\ twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\\ | cpg_lh /dev/stdin 2> cpg_lh.err \\\ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\\ | sort -k1,1 -k2,2n > cpgIsland.bed\\ For the unmasked track, the same pipeline is run on\ twoBitToFa -noMask output in place of the plain twoBitToFa output.\ \ \
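The awk step in the pipeline above shifts the start coordinate to 0-based and derives the island width and percentage columns. A rough Python equivalent is sketched below. The sample input line is made up, the function name is hypothetical, and the meaning of individual cpg_lh output fields is inferred only from how the awk expression uses them.

```python
def reformat_cpg_line(line):
    """Mirror the awk expression: convert one whitespace-separated record
    into the tab-separated layout written to cpgIsland.bed."""
    f = line.split()                  # f[0] is awk's $1, f[1] is $2, and so on
    start = int(f[1]) - 1             # $2 = $2 - 1 (1-based start to 0-based)
    end = int(f[2])
    width = end - start               # width = $3 - $2
    cpg_count = int(f[5])             # $6
    percent = float(f[6])             # $7, used by awk as a percentage of width
    return "\t".join([
        f[0],
        str(start),
        f[2],
        f"{f[4]} {f[5]}",                         # "%s %s" of $5 and $6
        str(width),
        f[5],
        "%0.0f" % (width * percent * 0.01),
        "%0.1f" % (100.0 * 2 * cpg_count / width),
        f[6],
        f[8],                                     # $9
    ])


# Hypothetical input record, not real cpg_lh output.
print(reformat_cpg_line("chr1 101 300 x CpG: 20 60.0 y 0.85"))
```

The sort step of the original pipeline (by chromosome, then numerically by start) is omitted here.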
\ The CpG Islands track and its associated tables can be explored interactively using the\ REST API, the\ Table Browser or the\ Data Integrator.\ All the tables can also be queried directly from our public MySQL\ servers, with more information available on our\ help page as well as on\ our blog.
\\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\
\ \This track was generated using a modification of a program developed by G. Micklem and L. Hillier \ (unpublished).
\ \\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\
\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper pack\ priority 1\ shortLabel CpG Islands\ track cpgIslandExt\ ncbiRefSeq RefSeq All genePred NCBI RefSeq genes, curated and predicted (NM_*, XM_*, NR_*, XR_*, NP_*, YP_*) 1 1 12 12 120 133 133 187 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ idXref ncbiRefSeqLink mrnaAcc name\ longLabel NCBI RefSeq genes, curated and predicted (NM_*, XM_*, NR_*, XR_*, NP_*, YP_*)\ parent refSeqComposite on\ priority 1\ shortLabel RefSeq All\ track ncbiRefSeq\ rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 1 0 0 0 127 127 127 1 0 0\ This track was created by using Arian Smit's\ RepeatMasker\ program, which screens DNA sequences\ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the\ query sequence (represented by this track), as well as a modified version\ of the query sequence in which all the annotated repeats have been masked\ (generally available on the\ Downloads page). RepeatMasker uses the\ Repbase Update library of repeats from the\ Genetic \ Information Research Institute (GIRI).\ Repbase Update is described in Jurka (2000) in the References section below.\ Some newer assemblies have been made with Dfam, not Repbase. You can\ find the details for how we make our database data here in our "makeDb/doc/"\ directory.
\ \\ In full display mode, this track displays up to ten different classes of repeats:\
\ The level of color shading in the graphical display reflects the amount of\ base mismatch, base deletion, and base insertion associated with a repeat\ element. The higher the combined number of these, the lighter the shading.\
\ \\ A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that\ the curator was unsure of the classification. At some point in the future,\ either the "?" will be removed or the classification will be changed.
\ \\ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. Repeats are soft-masked. Alignments may\ extend through repeats, but are not permitted to initiate in them.\ See the FAQ for more information.\
\ \\ Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and\ repeat libraries used to generate this track.\
\ \\ Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0.\ \ http://www.repeatmasker.org. 1996-2010.\
\ \\ Repbase Update is described in:\
\ \\ Jurka J.\ \ Repbase Update: a database and an electronic journal of repetitive elements.\ Trends Genet. 2000 Sep;16(9):418-420.\ PMID: 10973072\
\ \\ For a discussion of repeats in mammalian genomes, see:\
\ \\ Smit AF.\ \ Interspersed repeats and other mementos of transposable elements in mammalian genomes.\ Curr Opin Genet Dev. 1999 Dec;9(6):657-63.\ PMID: 10607616\
\ \\ Smit AF.\ \ The origin of interspersed repeats in the human genome.\ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\ PMID: 8994846\
\ varRep 0 canPack off\ group varRep\ longLabel Repeating Elements by RepeatMasker\ maxWindowToDraw 10000000\ priority 1\ shortLabel RepeatMasker\ spectrum on\ track rmsk\ type rmsk\ visibility dense\ refSeqComposite NCBI RefSeq genePred RefSeq genes from NCBI 1 2 0 0 0 127 127 127 0 0 0\ The NCBI RefSeq Genes composite track shows ferret protein-coding and non-protein-coding\ genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use\ coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by\ realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences\ between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise\ using NCBI aligned tables like RefSeq All or RefSeq Curated. See the \ Methods section for more details about how the different tracks were \ created.
\\ Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, \ submit additions and corrections, or ask for help concerning RefSeq records.
\ \\ For more information on the different gene tracks, see our Genes FAQ.
\ \\ This track is a composite track that contains differing data sets.\ To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to \ hide. Note: Not all subtracks are available on all assemblies.
\ \ The possible subtracks include:\\ The RefSeq All, RefSeq Curated, RefSeq Predicted, and\ UCSC RefSeq tracks follow the display conventions for\ gene prediction tracks.\ The color shading indicates the level of review the RefSeq record has undergone:\ predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq.
\ \\
Color | Level of review
---|---
Dark | Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
Medium | Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff.
Light | Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.
\ The item labels and codon display properties for features within this track can be configured \ through the check-box controls at the top of the track description page. To adjust the settings \ for an individual subtrack, click the wrench icon next to the track name in the subtrack list.
\The RefSeq Diffs track contains five different types of inconsistency between the\ reference genome sequence and the RefSeq transcript sequences. The five types of differences are\ as follows:\
\ When reporting HGVS with RefSeq sequences, to make sure that results from\ research articles can be mapped to the genome unambiguously, \ please specify the RefSeq annotation release displayed on the transcript's\ Genome Browser details page and also the RefSeq transcript ID with version\ (e.g. NM_012309.4 not NM_012309). \
\ \ \ \\ Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using \ data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and \ converted to the genePred and PSL table formats for display in the Genome Browser. Information about\ the NCBI annotation pipeline can be found \ here.
\ \The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments.
\\ The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks.\ RefSeq RNAs were aligned against the ferret genome using BLAT. Those with an alignment of\ less than 15% were discarded. When a single RNA aligned in multiple places, the alignment\ having the highest base identity was identified. Only alignments having a base identity\ level within 0.1% of the best and at least 96% base identity with the genomic sequence were\ kept.
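The near-best filtering rule described above can be sketched in Python as follows. This is an illustration only, not the UCSC pipeline: the record layout (dictionaries with "qName" and "identity" keys) is hypothetical, and "within 0.1% of the best" is interpreted here as a difference of at most 0.1 percentage points of base identity.

```python
def filter_near_best(alignments, near_best=0.1, min_identity=96.0):
    """Keep, per query, alignments within near_best of that query's best
    identity and at or above min_identity (both in percent)."""
    by_query = {}
    for aln in alignments:
        by_query.setdefault(aln["qName"], []).append(aln)

    kept = []
    for group in by_query.values():
        best = max(aln["identity"] for aln in group)
        for aln in group:
            if aln["identity"] >= min_identity and best - aln["identity"] <= near_best:
                kept.append(aln)
    return kept


# Hypothetical alignments: two RNAs, one of which aligns in three places.
alignments = [
    {"qName": "rnaA", "tStart": 100, "identity": 99.9},
    {"qName": "rnaA", "tStart": 5000, "identity": 99.85},
    {"qName": "rnaA", "tStart": 9000, "identity": 98.0},
    {"qName": "rnaB", "tStart": 200, "identity": 95.0},
]
print([a["tStart"] for a in filter_near_best(alignments)])
```

The third rnaA alignment is dropped because it is more than 0.1 points below the best, and rnaB is dropped because it falls under the 96% floor. The 15% minimum-alignment cutoff mentioned in the text is not modelled here.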
\ \\ The raw data for these tracks can be accessed in multiple ways. It can be explored interactively \ using the REST API,\ Table Browser or\ Data Integrator. The tables can also be accessed programmatically through our\ public MySQL server or downloaded from our\ downloads server for local processing. The previous track versions are available\ in the archives of our downloads server. You can also access any RefSeq table\ entries in JSON format through our \ JSON API.
\\ The data in the RefSeq Other and RefSeq Diffs tracks are organized in \ bigBed file format; more\ information about accessing the information in this bigBed file can be found\ below. The other subtracks are associated with database tables as follows:
\\ The first column of each of these tables is "bin". This column is designed\ to speed up access for display in the Genome Browser, but can be safely ignored in downstream\ analysis. You can read more about the bin indexing system\ here.
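For readers curious how a "bin" value relates to a feature's coordinates, the sketch below follows the standard UCSC binning scheme as commonly described: five levels of bins, with the smallest covering 128 kb and each higher level eight times larger. The constants and function name are written from that general description and should be treated as assumptions to check against the documentation linked above.

```python
# Offsets of the first bin at each level, from the finest (128 kb) upward.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17   # 2**17 = 128 kb, size of the smallest bins
BIN_NEXT_SHIFT = 3     # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin fully containing the 0-based, half-open
    range [start, end), for coordinates below 512 Mb."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


print(bin_from_range(0, 1))            # fits in the first 128 kb bin
print(bin_from_range(0, 131073))       # spans two 128 kb bins, moves up a level
```

A feature that fits inside one 128 kb window lands in a fine-grained bin, while one that straddles a window boundary is promoted to the next level up, which is what lets a range query touch only a handful of bins.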
\\ The annotations in the RefSeq Other and RefSeq Diffs tracks are stored in bigBed \ files, which can be obtained from our downloads server as\ ncbiRefSeqOther.bb and \ ncbiRefSeqDiffs.bb.\ Individual regions or the whole set of genome-wide annotations can be obtained using our tool\ bigBedToBed, which can be compiled from the source code or downloaded as a precompiled\ binary for your system from the utilities directory linked below. For example, to extract only\ annotations in a given region, you could use the following command:
\\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/musFur1/ncbiRefSeq/ncbiRefSeqOther.bb\ -chrom=chr16 -start=34990190 -end=36727467 stdout
\\ You can download a GTF format version of the RefSeq All table from the \ GTF downloads directory.\ The genePred format tracks can also be converted to GTF format using the\ genePredToGtf utility, available from the\ utilities directory on the UCSC downloads \ server. The utility can be run from the command line like so:
\ genePredToGtf musFur1 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf\\ Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore \ must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access\ section.
\\ A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, \ and RefSeq Predicted tracks can be found on our downloads server\ here.
\\ Please refer to our mailing list archives for questions.
\ \\ Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server.\
\ \\ This track was produced at UCSC from data generated by scientists worldwide and curated by the\ NCBI RefSeq project.
\ \\ Kent WJ.\ BLAT - the BLAST-like \ alignment tool. Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518
\\ Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J,\ Landrum MJ, McGarvey KM et al.\ RefSeq: an update on mammalian reference sequences.\ Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.\ PMID: 24259432; PMC: \ PMC3965018
\\ Pruitt KD, Tatusova T, Maglott DR.\ \ NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts \ and proteins.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4.\ PMID: 15608248; PMC: PMC539979
\ genes 1 allButtonPair on\ compositeTrack on\ dataVersion /gbdb/$D/ncbiRefSeq/ncbiRefSeqVersion.txt\ dbPrefixLabels hg="HGNC" dm="FlyBase" ce="WormBase" rn="RGD" sacCer="SGD" danRer="ZFIN" mm="MGI" xenTro="XenBase"\ dbPrefixUrls hg="http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$" dm="http://flybase.org/reports/$$" ce="http://www.wormbase.org/db/gene/gene?name=$$" rn="https://rgd.mcw.edu/rgdweb/search/search.html?term=$$" sacCer="https://www.yeastgenome.org/locus/$$" danRer="https://zfin.org/$$" mm="http://www.informatics.jax.org/marker/$$" xenTro="https://www.xenbase.org/gene/showgene.do?method=display&geneId=$$"\ dragAndDrop subTracks\ group genes\ longLabel RefSeq genes from NCBI\ noInherit on\ priority 2\ shortLabel NCBI RefSeq\ track refSeqComposite\ type genePred\ visibility dense\ ncbiRefSeqCurated RefSeq Curated genePred NCBI RefSeq genes, curated subset (NM_*, NR_*, NP_* or YP_*) 1 2 12 12 120 133 133 187 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ idXref ncbiRefSeqLink mrnaAcc name\ longLabel NCBI RefSeq genes, curated subset (NM_*, NR_*, NP_* or YP_*)\ parent refSeqComposite on\ priority 2\ shortLabel RefSeq Curated\ track ncbiRefSeqCurated\ cpgIslandExtUnmasked Unmasked CpG bed 4 + CpG Islands on All Sequence (Islands < 300 Bases are Light Green) 0 2 0 100 0 128 228 128 0 0 0CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. 
However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.
\ \\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\
\ \\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\
\ \CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \
\ The entire genome sequence, masked areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\
\ \The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the island length, expressed as a percentage. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden and Frommer (1987)):\ \
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\ \ where N = length of sequence.\
\ The calculation of the track data is performed by the following command sequence:\
\ twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\\ | cpg_lh /dev/stdin 2> cpg_lh.err \\\ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\\ | sort -k1,1 -k2,2n > cpgIsland.bed\\ For the unmasked track, the same pipeline is run on\ twoBitToFa -noMask output in place of the plain twoBitToFa output.\ \ \
\ The CpG Islands track and its associated tables can be explored interactively using the\ REST API, the\ Table Browser or the\ Data Integrator.\ All the tables can also be queried directly from our public MySQL\ servers, with more information available on our\ help page as well as on\ our blog.
\\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\
\ \This track was generated using a modification of a program developed by G. Micklem and L. Hillier \ (unpublished).
\ \\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\
\ regulation 1 html cpgIslandSuper\ longLabel CpG Islands on All Sequence (Islands < 300 Bases are Light Green)\ parent cpgIslandSuper hide\ priority 2\ shortLabel Unmasked CpG\ track cpgIslandExtUnmasked\ ncbiRefSeqPredicted RefSeq Predicted genePred NCBI RefSeq genes, predicted subset (XM_* or XR_*) 1 3 12 12 120 133 133 187 0 0 0 genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ idXref ncbiRefSeqLink mrnaAcc name\ longLabel NCBI RefSeq genes, predicted subset (XM_* or XR_*)\ parent refSeqComposite off\ priority 3\ shortLabel RefSeq Predicted\ track ncbiRefSeqPredicted\ ncbiRefSeqPsl RefSeq Alignments psl RefSeq Alignments of RNAs 1 5 0 0 0 127 127 127 0 0 0 genes 1 baseColorDefault diffCodons\ baseColorUseCds table ncbiRefSeqCds\ baseColorUseSequence extFile seqNcbiRefSeq extNcbiRefSeq\ color 0,0,0\ idXref ncbiRefSeqLink mrnaAcc name\ indelDoubleInsert on\ indelQueryInsert on\ longLabel RefSeq Alignments of RNAs\ parent refSeqComposite off\ pepTable ncbiRefSeqPepTable\ priority 5\ pslSequence no\ shortLabel RefSeq Alignments\ showCdsAllScales .\ showCdsMaxZoom 10000.0\ showDiffBasesAllScales .\ showDiffBasesMaxZoom 10000.0\ track ncbiRefSeqPsl\ type psl\ refGene UCSC RefSeq genePred refPep refMrna UCSC annotations of RefSeq RNAs (NM_* and NR_*) 1 7 12 12 120 133 133 187 0 0 0\ The RefSeq Genes track shows known ferret protein-coding and\ non-protein-coding genes taken from the NCBI RNA reference sequences\ collection (RefSeq). The data underlying this track are updated weekly.
\ \\ Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to\ make suggestions, submit additions and corrections, or ask for help concerning\ RefSeq records.\
\ \\ For more information on the different gene tracks, see our Genes FAQ.
\ \\ This track follows the display conventions for\ \ gene prediction tracks.\ The color shading indicates the level of review the RefSeq record has\ undergone: predicted (light), provisional (medium), reviewed (dark).\
\ \\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page.\
\ RefSeq RNAs were aligned against the ferret genome using BLAT. Those\ with an alignment of less than 15% were discarded. When a single RNA\ aligned in multiple places, the alignment having the highest base identity\ was identified. Only alignments having a base identity level within 0.1% of\ the best and at least 96% base identity with the genomic sequence were kept.\
\ \\ This track was produced at UCSC from RNA sequence data generated by scientists\ worldwide and curated by the NCBI\ RefSeq project.\
\ \\ Kent WJ.\ \ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ \\ Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J,\ Landrum MJ, McGarvey KM et al.\ \ RefSeq: an update on mammalian reference sequences.\ Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.\ PMID: 24259432; PMC: PMC3965018\
\ \\ Pruitt KD, Tatusova T, Maglott DR.\ \ NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4.\ PMID: 15608248; PMC: PMC539979\
\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,12,120\ dataVersion \ group genes\ idXref hgFixed.refLink mrnaAcc name\ longLabel UCSC annotations of RefSeq RNAs (NM_* and NR_*)\ parent refSeqComposite off\ priority 7\ shortLabel UCSC RefSeq\ track refGene\ type genePred refPep refMrna\ visibility dense\ est Ferret ESTs psl est Ferret ESTs Including Unspliced 0 100 0 0 0 127 127 127 1 0 0\ This track shows alignments between ferret expressed sequence tags\ (ESTs) in \ GenBank and the genome. ESTs are single-read sequences,\ typically about 500 bases in length, that usually represent fragments of\ transcribed genes.\
\ \\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\
\ \\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\
\ \\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\
\ \\ To use the filter:\
\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those\ that differ from the genomic sequence. For more information about this option,\ go to the\ \ Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\
\ \\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.\
\ \\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the\ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.\
\ \\ To generate this track, ferret ESTs from GenBank were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very\ long introns that might otherwise align. When a single\ EST aligned in multiple places, the alignment having the\ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity\ with the genomic sequence were kept.\
\ \\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\
\ \\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\
\ \\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\
\ \\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel Ferret ESTs Including Unspliced\ maxItems 300\ shortLabel Ferret ESTs\ spectrum on\ table all_est\ track est\ type psl est\ visibility hide\ mrna Ferret mRNAs psl . Ferret mRNAs from GenBank 3 100 0 0 0 127 127 127 0 0 0\ The mRNA track shows alignments between ferret mRNAs\ in \ GenBank and the genome.
\ \\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\
\ \\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\
\ \\ To use the filter:\
\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more\ information about this option, go to the\ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\
\ \\ GenBank ferret mRNAs were aligned against the genome using the\ blat program. When a single mRNA aligned in multiple places,\ the alignment having the highest base identity was found.\ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\
\ \\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\
\ \\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\
\ \\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\
\ \\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ longLabel Ferret mRNAs from GenBank\ shortLabel Ferret mRNAs\ showDiffBasesAllScales .\ table all_mrna\ track mrna\ type psl .\ visibility pack\ gold Assembly bed 3 + Assembly from Fragments 0 100 150 100 30 230 170 40 0 0 0\ This track shows the sequences used in the Apr. 2011 ferret genome assembly.\
\\ Genome assembly procedures are covered in the NCBI assembly documentation.\ NCBI also provides specific information about this assembly.\
\ The definition of this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\
\\ In dense mode, this track depicts the contigs that make up the \ currently viewed scaffold. \ Contig boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist between contigs, spaces are shown between the gold and brown\ blocks. The relative order and orientation of the contigs\ within a scaffold is always known; therefore, a line is drawn in the graphical\ display to bridge the blocks.
\\ Component types found in this track (with counts of that type in parentheses):\
\ This track shows ab initio predictions from the program\ AUGUSTUS (version 3.1).\ The predictions are based on the genome sequence alone.\
\ \\ For more information on the different gene tracks, see our Genes FAQ.
\ \\ Statistical signal models were built for splice sites, branch-point\ patterns, translation start sites, and the poly-A signal.\ Furthermore, models were built for the sequence content of\ protein-coding and non-coding regions as well as for the length distributions\ of different exon and intron types. Detailed descriptions of most of these different models\ can be found in Mario Stanke's\ dissertation.\ This track shows the most likely gene structure according to a\ Semi-Markov Conditional Random Field model.\ Alternative splicing transcripts were obtained with\ a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2\ --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).\
\ \\ The different models used by Augustus were trained on a number of different species-specific\ gene sets, which included 1000-2000 training gene structures. The --species option allows\ one to choose the species used for training the models. Different training species were used\ for the --species option when generating these predictions for different groups of\ assemblies.\
Assembly Group | Training Species
---|---
Fish | zebrafish
Birds | chicken
Human and all other vertebrates | human
Nematodes | caenorhabditis
Drosophila | fly
A. mellifera | honeybee1
A. gambiae | culex
S. cerevisiae | saccharomyces
\ This table describes which training species was used for a particular group of assemblies.\ When available, the closest related training species was used.\
\ \\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\
\ \\ Stanke M, Waack S.\ \ Gene prediction with a hidden Markov model and a new intron submodel.\ Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.\ PMID: 14534192\
\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,105,0\ group genes\ longLabel AUGUSTUS ab initio gene predictions v3.1\ shortLabel AUGUSTUS\ track augustusGene\ type genePred\ visibility hide\ cytoBandIdeo Chromosome Band (Ideogram) bed 4 + Ideogram for Orientation 1 100 0 0 0 127 127 127 0 0 0 map 1 group map\ longLabel Ideogram for Orientation\ shortLabel Chromosome Band (Ideogram)\ track cytoBandIdeo\ type bed 4 +\ visibility dense\ cpgIslandSuper CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 100 0 100 0 128 228 128 0 0 0CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a \ G (guanine) base (a CpG) is rare in\ vertebrate DNA because the Cs in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time,\ methylated Cs tend to turn into Ts because of spontaneous\ deamination. The result is that CpGs are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some other reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpGs are present at\ significantly higher levels than is typical for the genome as a whole.
\ \\ The unmasked version of the track displays potential CpG islands\ that exist in repeat regions and would otherwise not be visible\ in the repeat masked version.\
\ \\ By default, only the masked version of the track is displayed. To view the\ unmasked version, change the visibility settings in the track controls at\ the top of this page.\
\ \CpG islands were predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment was then\ evaluated for the following criteria:\ \
\ The entire genome sequence, masking areas included, was\ used for the construction of the track Unmasked CpG.\ The track CpG Islands is constructed on the sequence after\ all masked sequence is removed.\
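The dinucleotide scoring pass described above can be sketched roughly as follows. This is only an illustration, assuming a simple running-sum scan that reports the single best segment in a sequence; the function name best_cpg_segment is hypothetical, and the sketch is not the cpg_lh implementation, which also applies the per-segment criteria.

```python
def best_cpg_segment(seq, cg_score=17, other_score=-1):
    """Return (start, end, score) of the highest-scoring run of dinucleotides.

    Coordinates are 0-based, half-open, in bases. Each dinucleotide
    seq[i:i+2] contributes +17 if it is CG and -1 otherwise.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_start, run_score = 0, 0
    for i in range(len(seq) - 1):
        if run_score <= 0:
            # A non-positive running sum cannot help a later segment: restart here.
            run_start, run_score = i, 0
        run_score += cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score > best[2]:
            best = (run_start, i + 2, run_score)
    return best


if __name__ == "__main__":
    print(best_cpg_segment("AAAACGCGCGAAAA"))
```

Restarting whenever the running sum drops to zero or below keeps the scan to a single pass over the sequence.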
\ \The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length. The ratio of observed to expected \ CpG is calculated according to the formula (cited in \ Gardiner-Garden et al. (1987)):\ \
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)\ \ where N = length of sequence.\
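As a worked illustration of the three quantities above, the following sketch computes the CpG count, the Percentage CpG, and the observed/expected ratio for one sequence. The function name cpg_island_stats and the returned dictionary keys are assumptions made for this example; only the formulas come from the text.

```python
def cpg_island_stats(seq):
    """Compute CpG count, percentage CpG and Obs/Exp CpG for one sequence.

    Hypothetical helper; the formulas follow the track description:
      perCpG  = 100 * (2 * CpG count) / N
      Obs/Exp = CpG count * N / (number of C * number of G)
    """
    seq = seq.upper()
    n = len(seq)
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    c = seq.count("C")
    g = seq.count("G")
    per_cpg = 100.0 * 2 * cpg / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpgNum": cpg, "perCpg": per_cpg, "obsExp": obs_exp}


if __name__ == "__main__":
    print(cpg_island_stats("CGCGAT"))
```

For the sequence CGCGAT (N = 6, two CpGs, two Cs, two Gs) the ratio is 2 * 6 / (2 * 2) = 3.0.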
\ The calculation of the track data is performed by the following command sequence:\
\ twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \\\ | cpg_lh /dev/stdin 2> cpg_lh.err \\\ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\\t%d\\t%s\\t%s %s\\t%s\\t%s\\t%0.0f\\t%0.1f\\t%s\\t%s\\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \\\ | sort -k1,1 -k2,2n > cpgIsland.bed\\ The unmasked track data are constructed using\ twoBitToFa -noMask output in place of the twoBitToFa command output above.\ \ \
\ CpG islands and its associated tables can be explored interactively using the\ REST API, the\ Table Browser or the\ Data Integrator.\ All the tables can also be queried directly from our public MySQL\ servers, with more information available on our\ help page as well as on\ our blog.
\\ The source for the cpg_lh program can be obtained from\ src/utils/cpgIslandExt/.\ The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")\
\ \This track was generated using a modification of a program developed by G. Micklem and L. Hillier \ (unpublished).
\ \\ Gardiner-Garden M, Frommer M.\ \ CpG islands in vertebrate genomes.\ J Mol Biol. 1987 Jul 20;196(2):261-82.\ PMID: 3656447\
\ regulation 1 altColor 128,228,128\ color 0,100,0\ group regulation\ html cpgIslandSuper\ longLabel CpG Islands (Islands < 300 Bases are Light Green)\ shortLabel CpG Islands\ superTrack on\ track cpgIslandSuper\ type bed 4 +\ ensGene Ensembl Genes genePred ensPep Ensembl Genes 0 100 150 0 0 202 127 127 0 0 0\ These gene predictions were generated by Ensembl.\
\ \\ For more information on the different gene tracks, see our Genes FAQ.
\ \\ For a description of the methods used in Ensembl gene predictions, please refer to\ Hubbard et al. (2002), also listed in the References section below. \
\ \\
Ensembl Gene data can be explored interactively using the\
Table Browser or the\
Data Integrator. \
For local downloads, the genePred format files for musFur1 are available in our\
\
downloads directory as ensGene.txt.gz or in our\
\
genes download directory in GTF format.
\
For programmatic access, the data can be queried from the \
REST API or\
directly from our public MySQL\
servers. Instructions on this method are available on our\
MySQL help page and on\
our blog.
\ Previous versions of this track can be found on our archive download server.\
\ \\ We would like to thank Ensembl for providing these gene annotations. For more information, please see\ Ensembl's genome annotation page.\
\ \\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T et al.\ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.\ PMID: 11752248; PMC: PMC99161\
\ genes 1 color 150,0,0\ exonNumbers on\ group genes\ longLabel Ensembl Genes\ shortLabel Ensembl Genes\ track ensGene\ type genePred ensPep\ visibility hide\ gap Gap bed 3 + Gap Locations 1 100 0 0 0 127 127 127 0 0 0\ This track shows the gaps in the Apr. 2011 ferret genome assembly.\
\\
Genome assembly procedures are covered in the NCBI\
assembly documentation.
\
NCBI also provides\
specific information about this assembly.\
\ The definition of the gaps in this assembly is from the\ AGP file delivered with the sequence. The NCBI document\ AGP Specification describes the format of the AGP file.\
\\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is supported by read pair data, \ it is a bridged gap and a white line is drawn \ through the black box representing the gap. \
\This assembly contains the following principal types of gaps:\
\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
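The windowed GC percentage can be illustrated with a short sketch. It assumes non-overlapping 5-base windows and drops a trailing partial window; the function name gc_percent_windows is hypothetical, and the exact windowing used to build the track is not stated here.

```python
def gc_percent_windows(seq, window=5):
    """Return (window_start, gc_percent) for consecutive fixed-size windows.

    Assumptions for illustration: windows do not overlap and a trailing
    partial window is skipped.
    """
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, 100.0 * gc / window))
    return result


if __name__ == "__main__":
    print(gc_percent_windows("GGCCAATTAT"))
```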
\\ This track may be configured in a variety of ways to highlight different\ aspects of the displayed information. Click the\ "Graph configuration help"\ link for an explanation of the configuration options.\ \
The data and presentation of this graph were prepared by\ Hiram Clawson.\
\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ html gc5Base\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ track gc5BaseBw\ type bigWig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 0 100 170 100 0 212 177 127 0 0 0\ This track shows predictions from the\ Genscan program\ written by Chris Burge.\ The predictions are based on transcriptional, translational and donor/acceptor\ splicing signals as well as the length and compositional distributions of exons,\ introns and intergenic regions.\
\ \\ For more information on the different gene tracks, see our Genes FAQ.
\ \\ This track follows the display conventions for\ gene prediction\ tracks.\
\ \\ The track description page offers the following filter and configuration\ options:\
\ For a description of the Genscan program and the model that underlies it,\ refer to Burge and Karlin (1997) in the References section below.\ The splice site models used are described in more detail in Burge (1998)\ below.\
\ \\ Burge C.\ Modeling Dependencies in Pre-mRNA Splicing Signals.\ In: Salzberg S, Searls D, Kasif S, editors.\ Computational Methods in Molecular Biology.\ Amsterdam: Elsevier Science; 1998. p. 127-163.\
\ \\ Burge C, Karlin S.\ \ Prediction of complete gene structures in human genomic DNA.\ J. Mol. Biol. 1997 Apr 25;268(1):78-94.\ PMID: 9149143\
\ genes 1 color 170,100,0\ group genes\ longLabel Genscan Gene Predictions\ shortLabel Genscan Genes\ track genscan\ type genePred genscanPep\ visibility hide\ ucscToINSDC INSDC bed 4 Accession at INSDC - International Nucleotide Sequence Database Collaboration 0 100 0 0 0 127 127 127 0 0 0 https://www.ncbi.nlm.nih.gov/nuccore/$$\ This track associates UCSC Genome Browser chromosome names to accession\ names from the International Nucleotide Sequence Database Collaboration (INSDC).\
\ \\ The data were downloaded from the NCBI assembly database.\
\ \The data for this track was prepared by\ Hiram Clawson.\ \ map 1 group map\ longLabel Accession at INSDC - International Nucleotide Sequence Database Collaboration\ shortLabel INSDC\ track ucscToINSDC\ type bed 4\ url https://www.ncbi.nlm.nih.gov/nuccore/$$\ urlLabel INSDC link:\ visibility hide\ nestedRepeats Interrupted Rpts bed 12 + Fragments of Interrupted Repeats Joined by RepeatMasker ID 0 100 0 0 0 127 127 127 1 0 0
\ This track shows joined fragments of interrupted repeats extracted\ from the output of the \ RepeatMasker program which screens DNA sequences\ for interspersed repeats and low complexity DNA sequences using the\ \ Repbase Update library of repeats from the\ Genetic\ Information Research Institute (GIRI). Repbase Update is described in\ Jurka (2000) in the References section below.\
\ \\ The detailed annotations from RepeatMasker are in the RepeatMasker track. This\ track shows fragments of original repeat insertions which have been interrupted\ by insertions of younger repeats or through local rearrangements. The fragments\ are joined using the ID column of RepeatMasker output.\
\ \\ In pack or full mode, each interrupted repeat is displayed as boxes\ (fragments) joined by horizontal lines, labeled with the repeat name.\ If all fragments are on the same strand, arrows are added to the\ horizontal line to indicate the strand. In dense or squish mode, labels\ and arrows are omitted and in dense mode, all items are collapsed to\ fit on a single row.\
\ \\ Items are shaded according to the average identity score of their\ fragments. Usually, the shade of an item is similar to the shades of\ its fragments unless some fragments are much more diverged than\ others. The score displayed above is the average identity score,\ clipped to a range of 50% - 100% and then mapped to the range\ 0 - 1000 for shading in the browser.\
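The clip-and-rescale step can be written out as a small sketch. It assumes a linear mapping from the clipped 50% - 100% range onto 0 - 1000 with rounding to the nearest integer; the function name identity_to_score is hypothetical.

```python
def identity_to_score(identity_pct):
    """Map an average identity percentage to a 0-1000 shading score.

    Assumes a linear rescale after clipping to [50, 100]; the rounding
    rule is an illustrative choice.
    """
    clipped = min(100.0, max(50.0, identity_pct))
    return round((clipped - 50.0) / 50.0 * 1000)


if __name__ == "__main__":
    for pct in (40, 75, 100):
        print(pct, identity_to_score(pct))
```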
\ \\ UCSC has used the most current versions of the RepeatMasker software\ and repeat libraries available to generate these data. Note that these\ versions may be newer than those that are publicly available on the Internet.\
\ \\ Data are generated using the RepeatMasker -s flag. Additional flags\ may be used for certain organisms. See the\ FAQ for more information.\
\ \\ Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and\ repeat libraries used to generate this track.\
\ \\ Smit AFA, Hubley R, Green P.\ RepeatMasker Open-3.0.\ \ http://www.repeatmasker.org. 1996-2010.\
\ \\ Repbase Update is described in:\
\ \\ Jurka J.\ \ Repbase Update: a database and an electronic journal of repetitive elements.\ Trends Genet. 2000 Sep;16(9):418-420.\ PMID: 10973072\
\ \\ For a discussion of repeats in mammalian genomes, see:\
\ \\ Smit AF.\ \ Interspersed repeats and other mementos of transposable elements in mammalian genomes.\ Curr Opin Genet Dev. 1999 Dec;9(6):657-63.\ PMID: 10607616\
\ \\ Smit AF.\ \ The origin of interspersed repeats in the human genome.\ Curr Opin Genet Dev. 1996 Dec;6(6):743-8.\ PMID: 8994846\
\ varRep 1 exonNumbers off\ group varRep\ longLabel Fragments of Interrupted Repeats Joined by RepeatMasker ID\ shortLabel Interrupted Rpts\ track nestedRepeats\ type bed 12 +\ useScore 1\ visibility hide\ microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 100 0 0 0 127 127 127 0 0 0\ This track displays regions that are likely to be useful as microsatellite\ markers. These are sequences of at least 15 perfect di-nucleotide and \ tri-nucleotide repeats and tend to be highly polymorphic in the\ population.\
\ \\ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the \ Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
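The subset rule above amounts to a simple record filter. The sketch below assumes each Simple Repeats record is a dictionary with period, copyNum, perMatch and perIndel keys; those key names and the function name is_microsatellite are assumptions made for illustration.

```python
def is_microsatellite(rep, min_copies=15):
    """Return True if a Simple Repeats record meets the microsatellite rule.

    Rule from the text: period 2 or 3, 100% identity, no indels,
    at least 15 copies. Key names are assumed for this example.
    """
    return (
        rep["period"] in (2, 3)
        and rep["perMatch"] == 100
        and rep["perIndel"] == 0
        and rep["copyNum"] >= min_copies
    )


if __name__ == "__main__":
    repeats = [
        {"period": 2, "copyNum": 18.0, "perMatch": 100, "perIndel": 0},
        {"period": 4, "copyNum": 20.0, "perMatch": 100, "perIndel": 0},
        {"period": 3, "copyNum": 15.0, "perMatch": 97, "perIndel": 1},
    ]
    print([is_microsatellite(r) for r in repeats])
```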
\ \\ Tandem Repeats Finder was written by \ Gary Benson.
\ \\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\
\ varRep 1 group varRep\ longLabel Microsatellites - Di-nucleotide and Tri-nucleotide Repeats\ shortLabel Microsatellite\ track microsat\ type bed 4\ visibility hide\ xenoMrna Other mRNAs psl xeno Non-Ferret mRNAs from GenBank 0 100 0 0 0 127 127 127 1 0 0\ This track displays translated blat alignments of vertebrate and\ invertebrate mRNA in\ \ GenBank from organisms other than ferret.\
\ \\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\
\ \\ The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching\ translated genomic sequence. Because the two orientations of a DNA\ sequence give different predicted protein sequences, there are four\ combinations. ++ is not the same as --, nor is +- the same as -+.\
\ \\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\
\ \\ To use the filter:\
\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more\ information about this option, go to the\ \ Codon and Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\
\ \\ The mRNAs were aligned against the ferret genome using translated blat.\ When a single mRNA aligned in multiple places, the alignment having the\ highest base identity was found. Only those alignments having a base\ identity level within 1% of the best and at least 25% base identity with the\ genomic sequence were kept.\
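The near-best filtering rule can be sketched as below. The example assumes "within 1% of the best" means within one percentage point of the highest base identity for the same query, and represents each alignment as a (name, identity) pair; the function name filter_near_best and both parameters are hypothetical.

```python
def filter_near_best(alignments, near_best=1.0, min_identity=25.0):
    """Keep alignments of one query that are close to its best alignment.

    alignments: list of (name, identity_pct) pairs for a single query.
    An alignment is kept if its identity is within `near_best` percentage
    points of the best and at least `min_identity`.
    """
    if not alignments:
        return []
    best = max(identity for _, identity in alignments)
    return [
        (name, identity)
        for name, identity in alignments
        if identity >= best - near_best and identity >= min_identity
    ]


if __name__ == "__main__":
    hits = [("locusA", 98.0), ("locusB", 97.5), ("locusC", 90.0)]
    print(filter_near_best(hits))
```

The thresholds are parameters because other tracks in this file describe the same scheme with different values.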
\ \\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\
\ \\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\
\ \\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\
\ \\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ rna 1 baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ longLabel Non-Ferret mRNAs from GenBank\ shortLabel Other mRNAs\ showDiffBasesAllScales .\ spectrum on\ track xenoMrna\ type psl xeno\ visibility hide\ xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-Ferret RefSeq Genes 1 100 12 12 120 133 133 187 0 0 0\ This track shows known protein-coding and non-protein-coding genes \ for organisms other than ferret, taken from the NCBI RNA reference \ sequences collection (RefSeq). The data underlying this track are \ updated weekly.
\ \\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).
\\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \
\ The RNAs were aligned against the ferret genome using blat; those\ with an alignment of less than 15% were discarded. When a single RNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\
\ \\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.
\ \\ Kent WJ.\ \ BLAT--the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ \\ Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J,\ Landrum MJ, McGarvey KM et al.\ \ RefSeq: an update on mammalian reference sequences.\ Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.\ PMID: 24259432; PMC: PMC3965018\
\ \\ Pruitt KD, Tatusova T, Maglott DR.\ \ NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4.\ PMID: 15608248; PMC: PMC539979\
\ genes 1 color 12,12,120\ group genes\ longLabel Non-Ferret RefSeq Genes\ shortLabel Other RefSeq\ track xenoRefGene\ type genePred xenoRefPep xenoRefMrna\ visibility dense\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 100 0 0 0 127 127 127 0 0 0\ This track displays simple tandem repeats (possibly imperfect repeats) located\ by Tandem Repeats\ Finder (TRF) which is specialized for this purpose. These repeats can\ occur within coding regions of genes and may be quite\ polymorphic. Repeat expansions are sometimes associated with specific\ diseases.
\ \\ For more information about the TRF program, see Benson (1999).\
\ \\ TRF was written by \ Gary Benson.
\ \\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\
\ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ intronEst Spliced ESTs psl est Ferret ESTs That Have Been Spliced 1 100 0 0 0 127 127 127 1 0 0\ This track shows alignments between ferret expressed sequence tags\ (ESTs) in \ GenBank and the genome that show signs of splicing when\ aligned against the genome. ESTs are single-read sequences, typically about\ 500 bases in length, that usually represent fragments of transcribed genes.\
\ \\ To be considered spliced, an EST must show\ evidence of at least one canonical intron (i.e., the genomic\ sequence between EST alignment blocks must be at least 32 bases in\ length and have GT/AG ends). By requiring splicing, the level\ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the\ ferret EST track.\
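The splicing test can be illustrated with a short sketch. It assumes the alignment is given as sorted, 0-based half-open genomic blocks on the forward strand, so a canonical intron reads GT...AG directly; the function name has_canonical_intron is hypothetical, and reverse-strand handling is omitted.

```python
def has_canonical_intron(genome, blocks, min_intron=32):
    """Return True if any gap between adjacent blocks looks like an intron.

    genome: genomic sequence string.
    blocks: sorted list of (start, end) alignment blocks, 0-based half-open.
    A gap qualifies if it is at least `min_intron` bases long and has
    GT/AG ends (forward strand only in this sketch).
    """
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        gap = genome[left_end:right_start].upper()
        if len(gap) >= min_intron and gap.startswith("GT") and gap.endswith("AG"):
            return True
    return False


if __name__ == "__main__":
    genome = "A" * 10 + "GT" + "C" * 30 + "AG" + "A" * 10
    print(has_canonical_intron(genome, [(0, 10), (44, 54)]))
```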
\ \\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, darker shading\ indicates a larger number of aligned ESTs.\
\ \\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\
\ \\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\
\ \\ To use the filter:\
\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those\ that differ from the genomic sequence. For more information about this option,\ go to the\ \ Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\
\ \\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.\
\ \\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the\ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.\
\ \\ To generate this track, ferret ESTs from GenBank were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very\ long introns that might otherwise align. When a single\ EST aligned in multiple places, the alignment having the\ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity\ with the genomic sequence are displayed in this track.\
\ \\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\
\ \\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\
\ \\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\
\ \\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\
\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel Ferret ESTs That Have Been Spliced\ maxItems 300\ shortLabel Spliced ESTs\ showDiffBasesAllScales .\ spectrum on\ track intronEst\ type psl est\ visibility dense\ windowmaskerSdust WM + SDust bed 3 Genomic Intervals Masked by WindowMasker + SDust 0 100 0 0 0 127 127 127 0 0 0\ This track depicts masked sequence as determined by\ WindowMasker. The\ WindowMasker tool is included in the NCBI C++ toolkit. The source code\ for the entire toolkit is available from the NCBI\ \ FTP site.\
\ \\ To create this track, WindowMasker was run with the following parameters:\
\ windowmasker -mk_counts true -input musFur1.fa -output wm_counts\ windowmasker -ustat wm_counts -sdust true -input musFur1.fa -output repeats.bed\\ The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for\ this track.\ \ \
\ Morgulis A, Gertz EM, Schäffer AA, Agarwala R.\ WindowMasker: window-based masker for sequenced genomes.\ Bioinformatics. 2006 Jan 15;22(2):134-41.\ PMID: 16287941\
\ varRep 1 group varRep\ longLabel Genomic Intervals Masked by WindowMasker + SDust\ shortLabel WM + SDust\ track windowmaskerSdust\ type bed 3\ visibility hide\