cytoBand Chromosome Band bed 4 + Chromosome Bands Localized by FISH Mapping Clones 1 1 0 0 0 150 50 50 0 0 0

Description

\

The chromosome band track represents the approximate location of bands seen on Giemsa-stained chromosomes at an 800-band resolution.

\

Methods

\

A full description of the method by which the chromosome band locations are estimated can be found in Furey, T.S., and Haussler, D., "Integration of the Cytogenetic Map with the Draft Human Genome Sequence", Hum. Mol. Gen., 12(9):1037-1044 (2003).

\ \

Barbara Trask, Vivian Cheung, Norma Nowak and others in the BAC Resource\ Consortium used fluorescent in-situ\ hybridization (FISH) to determine a cytogenetic location for \ large genomic clones on the chromosomes.\ The results from these experiments are the primary source of information used\ in estimating the chromosome band locations.\ For more information about the BAC Resource Consortium, see "Integration of cytogenetic landmarks into the draft sequence of\ the human genome", Nature, 409:953-958, Feb. 2001 and the accompanying web site\ Human BAC Resource.\

\ \

\ BAC clone placements in the human sequence are determined at UCSC using a combination of full BAC clone sequence,\ BAC end sequence, and STS marker information.\

\

Credits

\

We would like to thank all of the labs that have contributed to this resource:\

\ map 1 stsMap STS Markers bed 5 + STS Markers on Genetic (blue) and Radiation Hybrid (black) Maps 1 4 0 0 0 128 128 255 0 0 0

Description

\

This track shows locations of Sequence Tagged Site (STS) markers along the draft assembly. These markers have been mapped using either genetic mapping (Genethon, Marshfield, and deCODE maps), radiation hybrid mapping (Stanford, Whitehead RH, and GeneMap99 maps) or YAC mapping (the Whitehead YAC map) techniques. Prior to August 2001, this track also showed the approximate position of fluorescent in situ hybridization (FISH) mapped clones. In the August 2001 and later assemblies, the FISH clones are displayed in a separate track.

\

Genetic map markers are shown in blue; radiation hybrid map markers are shown \ in black. When a marker maps to multiple positions in the genome, it is shown in a \ lighter color.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a set of map data \ within the track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

\

When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

Many thanks to the researchers who worked on these\ maps, and to Greg Schuler, Arek Kasprzyk, Wonhee Jang,\ Terry Furey and Sanja Rogic for helping\ process the data. Additional data on the individual maps can be\ found at the following links:\

\

\ \ map 1 fishClones FISH Clones bed 5 + Clones Placed on Cytogenetic Map Using FISH 0 6 0 150 0 127 202 127 0 0 0

Description

\

This track shows the location of fluorescent in situ hybridization (FISH) mapped clones along the \ draft assembly sequence. The locations of these clones were\ contributed as a part of The BAC Resource Consortium's paper "\ Integration of cytogenetic landmarks into the draft sequence of the\ human genome", Nature 409:953-958, Feb. 2001.

\ \

More information about the BAC clones, including how they can be\ obtained, can be found at the \ Human BAC Resource\ and the Clone Registry\ web sites hosted by NCBI.\ To view Clone Registry information for a clone, click on the clone name at the top of the details page for that item.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude the display of a dataset from an individual lab. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

\

When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

We would like to thank all of the labs that have contributed to this resource:\

\ \ \ map 1 genMapDb GenMapDB Clones bed 6 + GenMapDB BAC Clones 0 7 0 0 0 127 127 127 0 0 0

Description

\

BAC clones from GenMapDB\ are placed on the draft sequence using BAC end sequence information\ and confirmed using STS markers by Vivian Cheung's lab at the\ Department of Pediatrics, University of Pennsylvania. Further\ information about each clone can be obtained by clicking on the clone\ name on the track detail page.\

Credits

\ Thanks to Vivian Cheung's lab \ and GenMapDB at the University of Pennsylvania for providing the data used to create this track.\ map 1 recombRate Recomb Rate bed 4 + Recombination Rate from deCODE, Marshfield, or Genethon Maps (deCODE default) 0 8 0 0 0 127 127 127 0 0 0

Description

\

The recombination rate track represents calculated sex-averaged rates of recombination based on either the deCODE, Marshfield, or Genethon genetic maps. By default, the deCODE map rates are displayed. Female- and male-specific recombination rates, as well as rates from the Marshfield and Genethon maps, can also be displayed by choosing the appropriate filter option on the track description page.

\ \

Methods

\

The deCODE genetic map was created at deCODE Genetics and is based on 5,136 microsatellite markers for 146 families with a total of 1,257 meiotic events. For more information on this map, see A. Kong et al., "A high-resolution recombination map of the human genome", Nature Genetics, 31(3), pages 241-247 (2002).

\

The Marshfield genetic map was created at the Center for Medical Genetics and is based on 8,325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see K.W. Broman et al., "Comprehensive Human Genetic Maps: Individual and Sex-Specific Variation in Recombination", American Journal of Human Genetics 63:861-869 (1998).

\

The Genethon genetic map was created at Genethon and is based on 5,264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses. For more information on this map, see Dib et al., "A Comprehensive Genetic Map of the Human Genome Based on 5,264 Microsatellites", Nature, 380, pages 152-154 (1996).

\

Each base is assigned the recombination rate calculated by\ assuming a linear genetic distance across the immediately flanking\ genetic markers. The recombination rate assigned to each 1Mb window\ is the average recombination rate of the bases contained within the\ window.\
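The calculation described above can be sketched as follows. This is a minimal illustration, not the code used to build the track: the function names, the `(position_bp, position_cM)` marker tuples, and the choice to ignore bases that are not flanked by markers on both sides are all assumptions made for the example.

```python
from bisect import bisect_right


def base_rate(pos, markers):
    """Rate in cM/Mb at one base, from the immediately flanking markers.

    markers: list of (position_bp, position_cM) tuples sorted by position_bp
    (hypothetical input layout). Returns None if pos is not flanked on both
    sides, which is an assumption of this sketch.
    """
    positions = [m[0] for m in markers]
    i = bisect_right(positions, pos)
    if i == 0 or i == len(markers):
        return None
    (bp_left, cm_left), (bp_right, cm_right) = markers[i - 1], markers[i]
    # Linear genetic distance between the two flanking markers.
    return (cm_right - cm_left) / ((bp_right - bp_left) / 1e6)


def window_rate(start, end, markers):
    """Average per-base rate over the window [start, end).

    Each marker interval contributes its constant rate, weighted by the
    number of window bases it covers; this equals averaging base by base.
    """
    total = 0.0
    covered = 0
    for (bp_l, cm_l), (bp_r, cm_r) in zip(markers, markers[1:]):
        overlap = min(end, bp_r) - max(start, bp_l)
        if overlap > 0:
            rate = (cm_r - cm_l) / ((bp_r - bp_l) / 1e6)
            total += rate * overlap
            covered += overlap
    return total / covered if covered else None
```

For a 1Mb window, `window_rate(w, w + 1_000_000, markers)` gives the value assigned to that window under these assumptions.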

\ \

Using the Filter

\

To view a particular map or gender-specific rate, select the corresponding\ option from the "Map Distances" pulldown list. By default, the browser \ displays the deCODE sex-averaged distances.

\ \

Credits

\

This track is produced at UCSC and uses data that are freely available for\ the Genethon, Marshfield, and deCODE genetic maps (see above links). Thanks\ to all who have played a part in the creation of these maps.

\ \ map 1 ctgPos Map Contigs Physical Map Contigs 0 9 150 0 0 202 127 127 0 0 0

Description

\ This track shows the locations of contigs of clones\ on the physical map. \ \

Method

\ In assembly versions prior to the August 2001\ freeze, this track was based on the Washington University accession\ map, which in turn was based on a fingerprint contig (FPC) map\ described in "A physical map of the human genome", Nature 409: 934-941. \

\

\ From the August 2001 to the Nov 2002 freeze, this track was based on\ tiling path (TPF) maps curated by the sequencing centers responsible for\ each chromosome, which were integrated into an assembly done by NCBI.\ Beginning with the April 2003 freeze, the chromosome coordinators\ at the individual sequencing centers took over complete responsibility\ for preparing the assembly of their chromosomes in AGP format. The\ files provided by these centers are checked and validated at NCBI, and\ form the basis for the definition of the physical map contigs.\

\ map 0 gold Assembly bed 3 + Assembly from Fragments 0 10 150 100 30 230 170 40 0 0 0

Description

\

This track shows the draft assembly of the $organism genome.\ This assembly merges contigs from overlapping drafts and\ finished clones into longer sequence contigs. The sequence\ contigs are ordered and oriented when possible by mRNA, EST,\ paired plasmid reads (from the SNP Consortium) and BAC end\ sequence pairs.

\

In dense mode, this track depicts the path through the draft and \ finished clones (aka the golden path) used to create the assembled sequence. \ Clone boundaries are distinguished by the use of alternating gold and brown \ coloration. Where gaps\ exist in the path, spaces are shown between the gold and brown\ blocks. If the relative order and orientation of the contigs\ between the two blocks is known, a line is drawn to bridge the\ blocks.

\

\ Clone Type Key:\

\ \ map 1 gap Gap bed 3 + Gap Locations 1 11 0 0 0 127 127 127 0 0 0

Description

\ This track depicts gaps in the assembly. These gaps - with the\ exception of intractable heterochromatic gaps - will be closed during the\ finishing process. \

\ Gaps are represented as black boxes in this track.\ If the relative order and orientation of the contigs on either side\ of the gap is known, it is a bridged gap and a white line is drawn \ through the black box representing the gap. \

\

There are four principal types of gaps:\

\ map 1 clonePos Coverage Clone Coverage/Fragment Position 0 14 0 0 0 180 180 180 0 0 0

Description

\

\ In dense display mode, this track shows the coverage level of \ the genome. Finished regions are depicted in black. Draft regions \ are shown in various shades of gray that correspond to the level of coverage. \

\ In full display mode, this track shows the position of each contig inside each \ draft or finished clone ("fragment") in the assembly. For some \ assemblies, clones in the sequencing center tiling path are displayed with\ blue rather than gray backgrounds.\

\ map 0 bacEndPairs BAC End Pairs bed 6 + BAC End Pairs 0 15 0 0 0 127 127 127 0 0 0

Description

\

Bacterial artificial chromosomes (BACs) are a key part of many large\ scale sequencing projects. A BAC typically consists of 50-300kb of\ DNA. During the early phase of a sequencing project, it is common\ to sequence a single read (approximately 500 bases) off each end of\ a large number of BACs. Later on in the project, these BAC end reads\ can be mapped to the genome sequence. \

\

This track shows these mappings\ in cases where both ends could be mapped. These BAC end pairs can\ be useful for validating the assembly over relatively long ranges. In some\ cases, the BACs are useful biological reagents. This track can also be\ used for determining which BAC contains a given gene, useful information\ for certain wet lab experiments.\ \

A valid pair of BAC end sequences must be\ at least 50Kb but no more than 600Kb away from each other. \ The orientation of the first BAC end sequence must be "+" and\ the orientation of the second BAC end sequence must be "-".
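The validity rule above can be expressed as a small check. This is only a sketch: the `EndAlignment` type, the requirement that both ends lie on the same chromosome, and the choice to measure distance from the start of the first end to the end of the second are assumptions for illustration, since the text does not define how the distance is measured.

```python
from dataclasses import dataclass


@dataclass
class EndAlignment:
    """One aligned end sequence (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    strand: str


def is_valid_pair(first, second, min_dist=50_000, max_dist=600_000):
    """Return True if two end alignments form a valid pair."""
    if first.chrom != second.chrom:  # assumption: ends must share a chromosome
        return False
    if first.strand != "+" or second.strand != "-":
        return False
    # Assumption: distance spans from the first end's start to the second's end.
    distance = second.end - first.start
    return min_dist <= distance <= max_dist
```

The same check could be reused with other bounds, for example `min_dist=32_000, max_dist=47_000` for fosmid end pairs.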

\ \

Methods

\

BAC end sequences are placed on the assembled sequence using\ Jim Kent's \ blat \ program.

\ \

Credits

\

Additional information about the clone, including how it\ can be obtained, may be found at the \ NCBI Clone Registry.\ To view the registry entry for a specific clone, open the details page for the clone and click on its name at the top of the page.\

\ map 1 exonArrows off\ fosEndPairs Fosmid End Pairs bed 6 + Fosmid End Pairs 0 18 0 0 0 127 127 127 0 0 0

Description

\

A valid pair of fosmid end sequences must be\ at least 32 kb but no more than 47 kb away from each other. \ The orientation of the first fosmid end sequence must be "+" and\ the orientation of the second fosmid end sequence must be "-".

\ \

Methods

End sequences were trimmed at the NCBI using\ ssahaCLIP written by Jim Mullikin. Trimmed fosmid end sequences were\ placed on the assembled sequence using Jim Kent's \ blat \ program.

\ \

Credits

\

Sequencing of the fosmid ends was done at the Eli & Edythe L. Broad Institute of MIT and Harvard University. Sequences and quality scores are available in the NCBI Trace Repository.

\ map 1 exonArrows off\ gcPercent GC Percent bed 4 + Percentage GC in 20,000 Base Windows 0 23 0 0 0 127 127 127 1 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in a 20,000 base window. Windows with high GC content are drawn more darkly \ than windows with low GC content. High GC content is typically associated with \ gene-rich areas.\
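As a rough illustration of the windowed calculation, the sketch below computes GC percentage over consecutive windows of a sequence string. The function name is hypothetical, and two details are assumptions: windows are taken as non-overlapping, and every character in a window (including any `N`) counts toward the denominator.

```python
def gc_percent_windows(sequence, window=20_000):
    """Return a list of (window_start, gc_percent) for consecutive windows."""
    seq = sequence.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        # The final window may be shorter than `window`.
        result.append((start, 100.0 * gc / len(chunk)))
    return result
```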

\

Credits

\

\ This track was generated at UCSC.\ map 1 knownGene Known Genes genePred refPep refMrna Known Genes Based on SWISS-PROT, TrEMBL, mRNA, and RefSeq 3 34 12 12 120 133 133 187 0 0 0

Description

\

\ The Known Genes track shows known protein coding genes based on \ proteins from SWISS-PROT, TrEMBL, and TrEMBL-NEW and their\ corresponding mRNAs from \ GenBank.\ Coding exons are displayed as thicker blocks than 5' and 3' \ untranslated regions (UTR). Connecting introns \ are one-pixel lines with hatch marks indicating direction of transcription.\ Entries which have corresponding entries in PDB are colored black.\ Entries which either have corresponding proteins in SWISS-PROT or mRNAs that are \ NCBI Reference Sequences with a "Reviewed" status are colored dark blue.\ Entries which have mRNAs that are \ NCBI Reference Sequences with a "Provisional" status are colored lighter blue.\ Everything else is colored with lightest blue.
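The coloring rules above amount to a short decision list. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes the rules are applied in the order they are listed, so that a PDB entry takes precedence.

```python
def known_gene_color(has_pdb, in_swissprot, refseq_status=None):
    """Pick the display color for a Known Genes entry."""
    if has_pdb:
        return "black"
    if in_swissprot or refseq_status == "Reviewed":
        return "dark blue"
    if refseq_status == "Provisional":
        return "lighter blue"
    return "lightest blue"
```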

\ \

Methods

\

\ All mRNAs of a species are aligned against the genome using the blat\ program. When a single mRNA aligns in multiple places, only\ the best alignments are kept. The alignments must also have \ at least 98% sequence identity to be kept. \ This set of mRNA alignments is further reduced by keeping only those mRNAs that \ are referenced by a protein in SWISS-PROT, TrEMBL, or TrEMBL-NEW.

\

Among multiple mRNAs referenced by a single protein, the best mRNA is chosen based on a quality score, which depends on its length, how well its translation matches the protein sequence, and its release date. The list of mRNA and protein pairs is further cleaned up by removing short invalid entries and consolidating entries with identical CDS regions.

\

\ Finally, RefSeq entries which are derived from DNA sequences instead of \ mRNA sequences are added. Disease annotations are from SWISS-PROT.

\ \

Credits

\

The Known Genes track is produced at UCSC based primarily on cross-references between proteins from SWISS-PROT (also including TrEMBL and TrEMBL-NEW) and mRNAs from GenBank generated by scientists worldwide. Some NCBI RefSeq data are also included in this track.

\ \

Data Use Restrictions

\

\ The SWISS-PROT entries in this annotation track are copyrighted. They are \ produced through a collaboration \ between the Swiss Institute of Bioinformatics and the EMBL Outstation - the \ European Bioinformatics Institute. There are no restrictions on their use by \ non-profit institutions as long as their content is in no way modified and this \ statement is not removed. Usage by and for commercial entities requires a \ license agreement (see \ http://www.isb-sib.ch/announce/ or send an email to \ license@isb-sib.ch).

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ genes 1 hgGene off\ refGene RefSeq Genes genePred refPep refMrna RefSeq Genes 1 35 12 12 120 133 133 187 0 0 0

Description

\

\ The RefSeq Genes track shows known protein-coding genes taken from mRNA \ reference sequences compiled at LocusLink. Coding exons are represented by \ blocks connected by horizontal lines representing introns. The 5' and 3' \ untranslated regions (UTRs) are displayed as thinner blocks on the leading \ and trailing ends of the aligning regions. In full display mode, arrowheads \ on the connecting intron lines indicate the direction of transcription.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark). \

\

\ Non-coding RNA genes have their own track in some assemblies.\

\

Method

\

RefSeq mRNAs are aligned against the genome using the blat program. When a single mRNA aligns in multiple places, only the best alignments that also have at least 98% sequence identity are kept.

\

Using the Filter

\

The track filter can be used to configure the labeling of the features within\ the track. By default, items are labeled by gene name. Click the \ appropriate Label option to display the accession name instead of the gene\ name, show both the gene and accession names, or turn off the label completely.\ After you have made your selection, click Submit to return to the tracks display\ page.\

Credits

\

\ The RefSeq Genes track is produced at UCSC from mRNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project. \

\ genes 1 vegaGene Vega Genes genePred vegaPep Vega Annotations from Sanger, Genoscope 0 37 0 100 180 127 177 217 0 0 3 chr14,chr20,chr22, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$

Description and Methods

\

\ Excerpted from the Vega \ main page:

\

\ "The \ \ Vertebrate Genome Annotation (VEGA) \ database is designed to be a central repository for manual annotation\ of different vertebrate finished genome sequence.\ In collaboration with the genome sequencing centres Vega attempts to\ present consistent high-quality curation of the published chromosome\ sequences."

\

\ "Finished genomic sequence is analysed on a clone by clone basis using\ a combination of similarity searches against DNA and protein databases\ as well as a series of ab initio gene predictions (GENSCAN, Fgenes)."

\

\ "In addition, comparative analysis using vertebrate datasets such as\ the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores\ (Evolutionary Conserved Regions) are used for novel gene discovery."

\

\ NOTE: VEGA annotations appear only on chromosomes 14, 20, and 22 in this assembly.\ \

Credits

\

\ Thanks to Steve Searle at the\ Sanger Institute \ for providing the GTF and FASTA files for the Vega annotations. Vega gene annotations are \ generated by manual annotation from the following groups:\

\ Chromosome 14: \ \ \ \ Genoscope
\ \ Relevant publication: Heilig et al., The DNA sequence and analysis of \ \ human chromosome 14. Nature 421:601-607 (2003).\

\ Chromosome 20: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas et al., The DNA sequence and \ \ comparative analysis of human chromosome 20. Nature 414:865-871 (2001).\

\ Chromosome 22: Chromosome 22 Group,\ \ \ \ Wellcome Trust Sanger Institute
\ \ Relevant publications:
\ \ -- Collins et al., Reevaluating Human Gene Annotation: \ \ A Second-Generation Analysis of Chromosome 22. Genome Research 13(1):27-36 (2003).
\ \ -- Dawson et al., A \ \ first-generation linkage disequilibrium map of \ \ human chromosome 22. Nature 418:544-548 (2002).
\ \ -- Dunham et al., The DNA sequence of human chromosome 22. Nature 402:489-495 (1999).\ \ genes 1 vegaPseudoGene Vega Pseudogenes genePred Vega Annotated Pseudogenes and Immunoglobulin Segments 0 37.1 30 130 210 142 192 232 0 0 3 chr14,chr20,chr22, http://vega.sanger.ac.uk/Homo_sapiens/geneview?transcript=$$

Description and Methods

\

\ Excerpted from the Vega \ main page:

\

\ "The \ \ Vertebrate Genome Annotation (VEGA) \ database is designed to be a central repository for manual annotation\ of different vertebrate finished genome sequence.\ In collaboration with the genome sequencing centres Vega attempts to\ present consistent high-quality curation of the published chromosome\ sequences."

\

\ "Finished genomic sequence is analysed on a clone by clone basis using\ a combination of similarity searches against DNA and protein databases\ as well as a series of ab initio gene predictions (GENSCAN, Fgenes)."

\

\ "In addition, comparative analysis using vertebrate datasets such as\ the Riken mouse cDNAs and Genoscope Tetraodon nigroviridis Ecores\ (Evolutionary Conserved Regions) are used for novel gene discovery."

\

\ NOTE: VEGA annotations appear only on chromosomes 14, 20, and 22 in this assembly.\ \

Credits

\

\ Thanks to Steve Searle at the\ Sanger Institute \ for providing the GTF and FASTA files for the Vega annotations. Vega gene annotations are \ generated by manual annotation from the following groups:\

\ Chromosome 14: \ \ \ \ Genoscope
\ \ Relevant publication: Heilig et al., The DNA sequence and analysis of \ \ human chromosome 14. Nature 421:601-607 (2003).\

\ Chromosome 20: \ \ The HAVANA Group, \ \ Wellcome Trust Sanger Institute
\ \ Relevant publication: Deloukas et al., The DNA sequence and \ \ comparative analysis of human chromosome 20. Nature 414:865-871 (2001).\

\ Chromosome 22: Chromosome 22 Group,\ \ \ \ Wellcome Trust Sanger Institute
\ \ Relevant publications:
\ \ -- Collins et al., Reevaluating Human Gene Annotation: \ \ A Second-Generation Analysis of Chromosome 22. Genome Research 13(1):27-36 (2003).
\ \ -- Dawson et al., A \ \ first-generation linkage disequilibrium map of \ \ human chromosome 22. Nature 418:544-548 (2002).
\ \ -- Dunham et al., The DNA sequence of human chromosome 22. Nature 402:489-495 (1999).\ \ genes 1 ensGene Ensembl Genes genePred ensPep Ensembl Gene Predictions 1 40 150 0 0 202 127 127 0 0 0 http://www.ensembl.org/perl/transview?transcript=$$

Description

\

\ These gene predictions are from Ensembl.

\ \

Methods

\

For a description of the methods used in Ensembl gene prediction, refer to \ "\ The Ensembl genome database project", Nucleic Acids Research, \ 2002, 30(1) 38-41.

\ \

Credits

\

\ Thanks to Ensembl for providing this annotation.

\ \ genes 1 acembly Acembly Genes genePred acemblyPep acemblyMrna AceView Gene Models With Alt-Splicing 1 41 155 0 125 205 127 190 0 0 0 http://www.ncbi.nih.gov/IEB/Research/Acembly/av.cgi?db=human&l=$$

Description

\

This track shows gene models reconstructed solely from\ mRNA and EST evidence by Danielle and Jean Thierry-Mieg\ and Vahan Simonyan using the Acembly program.

\ \

Methods

\

Acembly attempts to find the best alignment of each mRNA against the \ genome, and considers alternative splice models. If more than one gene \ model is produced that has statistical significance, all of these models \ are displayed.

\ \

Credits

\

Thanks to Jean Thierry-Mieg at NIH for \ providing this track.

\ \ \ genes 1 twinscan Twinscan genePred twinscanPep Twinscan Gene Predictions Using Mouse/Human Homology 0 45 0 100 100 127 177 177 0 0 0

Description

\

\ The Twinscan program predicts genes in a manner similar to Genscan, except that\ Twinscan takes advantage of genome comparison to improve gene prediction\ accuracy. More information and a web server can be found at\ \ http://genes.cs.wustl.edu/.\

\

Methods

\

\ The Twinscan algorithm is described in Korf I, Flicek P, Duan D, and Brent MR \ (2001), "Integrating genomic homology into gene structure prediction", \ Bioinformatics 17:S140-148.\

\

Credits

\

\ Thanks to Michael Brent's Computational Genomics Group at Washington University St. Louis for providing these data.\ genes 1 slamMouse Slam Mouse genePred Slam Gene Predictions Using Human/Mouse (Feb. 2002/mm2) Homology 0 45.5 100 50 0 175 150 128 0 0 0

Description and Credits

\

\ Slam \ predicts coding exons and conserved noncoding regions in a pair of homologous DNA sequences, incorporating both statistical sequence properties and degree of conservation in making the predictions. This particular annotation uses the Feb. 2002 (mm2) assembly of the mouse genome. The model is symmetric and the same gene structure (with possibly different exon lengths) is predicted in both sequences.

\ \

The symmetry of the model gives it a higher degree of accuracy for regions where the true underlying gene structures contain the same number of coding exons. In cases where this is not true, or when one of the sequences is of lower quality and contains in-frame stop codons, the resulting predictions tend to have lower accuracy.

\ \

\ More information on the accuracy of the predictions can be found at http://bio.math.berkeley.edu/slam/mouse. A web server for individual requests is available at http://bio.math.berkeley.edu/slam.

\ \

References

\

\ M. Alexandersson, S. Cawley, L. Pachter (2003). SLAM - Cross-species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research 13(3):496-502.

\

\ L. Pachter, M. Alexandersson, S. Cawley (2001). \ Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, \ Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001).

\

L. Pachter, M. Alexandersson, S. Cawley (2002). Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, Journal of Computational Biology 9(2):389-400.

\ \ genes 1 slamRat Slam Rat genePred Slam Gene Predictions Using Human/Rat (Nov. 2002/rn1) Homology 0 45.6 100 50 0 175 150 128 0 0 0

Description and Credits

\

\ Slam \ predicts coding exons and conserved noncoding regions in a pair of homologous DNA sequences, incorporating both statistical sequence properties and degree of conservation in making the predictions. This particular annotation uses the Nov. 2002 (rn1) assembly of the rat genome. The model is symmetric and the same gene structure (with possibly different exon lengths) is predicted in both sequences.

\

The symmetry of the model gives it a higher degree of accuracy for regions where the true underlying gene structures contain the same number of coding exons. In cases where this is not true, or when one of the sequences is of lower quality and contains in-frame stop codons, the resulting predictions tend to have lower accuracy.

\

\ More information on the accuracy of the predictions can be found at http://bio.math.berkeley.edu/slam/rat. A web server for individual requests is available at http://bio.math.berkeley.edu/slam.

\ \

References

\

\ M. Alexandersson, S. Cawley, L. Pachter (2003). SLAM - Cross-species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research 13(3):496-502.

\

\ L. Pachter, M. Alexandersson, S. Cawley (2001). \ Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, \ Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001).

\

L. Pachter, M. Alexandersson, S. Cawley (2002). Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems, Journal of Computational Biology 9(2):389-400.

\ \ genes 1 sgpGene SGP Genes genePred sgpPep SGP Gene Predictions Using Human/Mouse (Feb. 2002/mm2) Homology 0 47 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the SGP program, which is being developed at \ the Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. To predict genes in a genomic\ query, SGP combines geneid predictions with tblastx comparisons of the \ genomic query against other genomic sequences. In this particular annotation, \ the Feb. 2002 (mm2) assembly of the mouse genome was used to find homology \ evidence between the two genomes.\

\

Credits

\

\ Thanks to GRIB for providing these gene predictions.\

\ \ \ \ genes 1 softberryGene Fgenesh++ Genes genePred softberryPep Fgenesh++ Gene Predictions 0 48 0 100 0 127 177 127 0 0 0

Description

\

Fgenesh++ predictions are based on Softberry's gene finding software.

\ \

Methods

\ Fgenesh++ uses both hidden Markov models (HMMs) and protein similarity to find genes in a completely \ automated manner. For more information, see the paper Solovyev VV (2001), \ "Statistical approaches in Eukaryotic gene prediction" in the Handbook of \ Statistical Genetics (ed. Balding D. et al.), John Wiley & Sons, Ltd., p. 83-127.\ \

Credits

\

The Fgenesh++ gene predictions were produced by \ Softberry Inc. \ Commercial use of these predictions is restricted to viewing in \ this browser. Please contact Softberry Inc. to make arrangements for further commercial access.\ \ genes 1 geneid Geneid Genes genePred geneidPep Geneid Gene Predictions 0 49 0 90 100 127 172 177 0 0 0

Description

\

\ This track shows gene predictions from the geneid program developed at the \ Grup de Recerca en\ Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in \ Barcelona. \

\

Methods

\

Geneid is a program, designed with a hierarchical structure, that predicts genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov Model for coding DNA. Finally, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons.
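The last two steps can be illustrated with a deliberately simplified sketch. It is not geneid's algorithm: the function names and the `(start, end, score)` exon tuples are hypothetical, and the assembly step here only requires that chosen exons do not overlap, ignoring reading frame and the other compatibility rules a real gene finder would enforce.

```python
def exon_score(site_scores, coding_llr):
    """Exon score: sum of defining-site scores plus the coding log-likelihood ratio."""
    return sum(site_scores) + coding_llr


def best_gene_structure(exons):
    """Choose the non-overlapping exon chain with the maximum total score.

    exons: list of (start, end, score) tuples with inclusive coordinates.
    Returns (total_score, chain). Simplification: only non-overlap is checked.
    """
    exons = sorted(exons, key=lambda e: e[1])
    best = []  # best[i] = (total, chain) for the best chain ending at exon i
    for i, (start, end, score) in enumerate(exons):
        total, chain = score, [exons[i]]
        for j in range(i):
            # Exon j can precede exon i only if it ends before exon i starts.
            if exons[j][1] < start and best[j][0] + score > total:
                total = best[j][0] + score
                chain = best[j][1] + [exons[i]]
        best.append((total, chain))
    return max(best, key=lambda b: b[0]) if best else (0.0, [])
```

This quadratic dynamic program was chosen for readability rather than speed.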

\

Credits

\

\ Thanks to GRIB for providing these data.\

\ genes 1 genscan Genscan Genes genePred genscanPep Genscan Gene Predictions 1 50 170 100 0 212 177 127 0 0 0

Description

\

This track shows predictions from the \ Genscan program written by Chris Burge.\

\

Methods

\ For a description of the Genscan program and the model that underlies it, refer\ to Burge C and Karlin S (1997), \ "Prediction of Complete Gene Structures in Human Genomic DNA", \ J. Mol. Biol. 268(1):78-94. The splice site models used are described in \ more detail in Burge C (1998), "Modeling Dependencies in Pre-mRNA Splicing \ Signals" in Salzberg S, Searls D, and Kasif S, eds. \ \ Computational Methods in Molecular Biology, Elsevier Science, Amsterdam, \ 127-163. \ \

Credits

\ Thanks to Chris Burge for providing these data.\ genes 1 rnaGene RNA Genes bed 6 + Non-coding RNA Genes (dark) and Pseudogenes (light) 0 52 170 80 0 230 180 130 0 0 0

Description

\

\ This track shows the location of non-protein coding RNA genes and\ pseudogenes. \

\ Feature types include:\

\

\ \

Methods

\ \

\ Eddy-tRNAscanSE (tRNA genes, Sean Eddy):
\ tRNAscan-SE 1.23 with default parameters.\ Score field contains tRNAscan-SE bit score; >20 is good, >50 is great.

\

\ Eddy-BLAST-tRNAlib (tRNA pseudogenes, Sean Eddy):
\ Wublast 2.0, with options "-kap wordmask=seg B=50000 W=8 cpus=1".\ Score field contains % identity in blast-aligned region.\ Used each of 602 tRNAs and pseudogenes predicted by tRNAscan-SE\ in the human oo27 assembly as queries. Kept all nonoverlapping\ regions that hit one or more of these with P <= 0.001.

\

\ Eddy-BLAST-snornalib (known snoRNAs and snoRNA pseudogenes, Steve Johnson):
\ Wublastn 2.0, with options "-V=25 -hspmax=5000 -kap wordmask=seg \ B=5000 W=8 cpus=1".\ Score field contains blast score.\ Used each of 104 unique snoRNAs in snorna.lib as a query.\ Any hit >=95% full length and >=90% identity is annotated as a\ "true gene".\ Any other hit with P <= 0.001 is annotated as a "related sequence" \ and interpreted as a putative pseudogene.
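The snoRNA annotation thresholds above can be written as a small classifier. This is a sketch under assumptions: the function name and its arguments (query coverage and identity as percentages, plus the BLAST P value) are hypothetical, and hits that meet neither rule are assumed to be left unannotated.

```python
def classify_snorna_hit(coverage_pct, identity_pct, p_value):
    """Classify a BLAST hit against the snoRNA library.

    coverage_pct: percentage of the full query length covered by the hit.
    identity_pct: percent identity over the aligned region.
    Returns "true gene", "related sequence", or None (not annotated).
    """
    if coverage_pct >= 95 and identity_pct >= 90:
        return "true gene"
    if p_value <= 0.001:
        return "related sequence"
    return None
```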

\

\ Eddy-BLAST-otherrnalib \ (non-tRNA, non-snoRNA noncoding RNAs with GenBank entries\ for the human gene.):
\ Wublastn 2.0 [15 Apr 2002]\ with options: "-kap -cpus=1 -wordmask=seg -W=8 -E=0.01 -hspmax=0\ -B=50000 -Z=3000000000". Exceptions to this are:\

\

The score field contains the blastn score. Used 41 unique miRNAs and 29 other ncRNAs as queries. Any hit >=95% full length and >=95% identity is annotated as a "true gene". Any other hit with P <= 0.001 and >= 65% identity is annotated as a "related sequence". The exception is miRNAs: all miRNAs consist of 16-26 bp sequences in GenBank and are only annotated if 100% full length and 100% identity. The miRNAs consist of Let-7 from Pasquinelli et al., Nature (2000) 408:86, and 40 from Mourelatos et al., Genes & Dev (2002) 16:720.

\

Credits

\

\ These data were kindly provided by Sean Eddy at Washington University.

\ genes 1 superfamily Superfamily bed 4 + Superfamily/SCOP: Proteins Having Homologs with Known Structure/Function 0 53 150 0 0 202 127 127 0 0 0 http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gene.cgi?genome=

Description

\

\ The \ Superfamily \ track shows proteins having homologs with known structures or functions.

\

\ Each entry on the track shows the coding region of a gene (based on Ensembl gene predictions).\ In full display mode, the label for an entry consists of the names of \ all known protein domains coded by this gene. This \ usually contains structural and/or function descriptions that provide valuable information to help users get a quick grasp of the biological significance of the gene.

\

Method

\

\ Data are downloaded from the Superfamily server.\ Using the cross-reference between Superfamily entries and Ensembl gene prediction entries\ and their alignment to the appropriate genome, the associated data are processed to generate \ a simple BED format track.

\

Credits

\

\ Superfamily is developed by\ Julian\ Gough at the MRC Laboratory\ of Molecular Biology, Cambridge.

\

\ Gough, J., Karplus, K., Hughey, R. and\ Chothia, C. (2001). "Assignment of Homology to Genome Sequences using a\ Library of Hidden Markov Models that Represent all Proteins of Known Structure". \ J. Mol. Biol., 313(4), 903-919.

\ \ genes 1 mrna $Organism mRNAs psl . $Organism mRNAs from GenBank 3 54 0 0 0 127 127 127 1 0 0

Description

\

\ The $Organism mRNA track shows alignments between $organism mRNAs\ in GenBank and the genome. Aligning regions (usually exons)\ are shown as black boxes connected by lines for gaps (spliced-out introns, \ usually). In full display, arrows on the introns\ indicate the direction of transcription.

\ \

Method

\

\ GenBank $organism mRNAs are aligned against the genome using the \ blat\ program. When a single mRNA aligns in multiple places, \ the alignment having the highest base identity is found. \ Only alignments that have a base identity level within 1% of\ the best are kept. Alignments must also have at least 95%\ base identity to be kept.
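The identity filter described above can be sketched as follows. This is a minimal illustration, not the UCSC pipeline code: the function name is hypothetical, identities are given as percentages, and "within 1% of the best" is assumed to mean within one percentage point.

```python
def near_best_filter(identities, min_identity=95.0, window=1.0):
    """Keep alignments close to the best-scoring alignment of one mRNA.

    identities   -- percent base identity of each alignment of the same mRNA
    min_identity -- absolute floor (95% for this track)
    window       -- allowed distance below the best identity, in percentage
                    points (assumed reading of "within 1% of the best")
    """
    if not identities:
        return []
    best = max(identities)
    return [x for x in identities
            if x >= min_identity and x >= best - window]
```

With alignments at 99.5%, 98.7%, 96.0%, and 94.0% identity, only the first two survive: 96.0% is more than one point below the best, and 94.0% also falls under the 95% floor.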

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the mRNA display. For\ example, to apply the filter to all liver mRNAs, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all of the filter criteria will\ be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display mRNAs that match the filter criteria. If "include" is selected, the browser \ will display only those mRNAs that match the filter criteria.\

\

\ When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

\ The $Organism mRNA track is produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 cdsDrawOptions enabled\ intronEst Spliced ESTs psl est $Organism ESTs That Have Been Spliced 1 56 0 0 0 127 127 127 1 0 0

Description

\

The Spliced EST track displays Expressed Sequence Tags \ (ESTs) from GenBank that show signs of splicing when\ aligned against the genome. To be considered spliced, an EST must show \ evidence of at least one canonical intron, i.e. one that is at least\ 32 bases in length and has GT/AG ends. By requiring splicing, the level \ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the \ $Organism EST track.
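The splicing criterion above (an intron of at least 32 bases with GT/AG ends) can be illustrated with a short check. This is a sketch under stated assumptions: the function name and block representation are hypothetical, coordinates are half-open, and only forward-strand introns are tested (a reverse-strand intron would appear as CT...AC in the forward sequence).

```python
MIN_INTRON = 32  # minimum intron length given in the track description


def has_canonical_intron(genome, blocks, min_intron=MIN_INTRON):
    """Return True if any gap between consecutive aligned blocks is at least
    min_intron bases long, starts with GT, and ends with AG.

    genome -- forward-strand genomic sequence (str)
    blocks -- sorted list of half-open (start, end) aligned blocks
    """
    genome = genome.upper()
    # Pair each block with its successor; the gap between them is the
    # candidate intron.
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        intron = genome[left_end:right_start]
        if (len(intron) >= min_intron
                and intron.startswith("GT")
                and intron.endswith("AG")):
            return True
    return False
```

An alignment with a single block has no gaps and is therefore treated as unspliced.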

\ \

Expressed sequence tags are single-read (typically\ approximately 500 base) sequences which usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced-out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ taken by looking at the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the Genbank annotations, which frequently were inaccurate.

\ \

Strand information provided for ESTs (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\ \

Method

\

To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read taken from the 5'\ and/or 3' primer. For most - but not all - ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\ \

In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.

\ \

To generate this track, $organism ESTs from Genbank are aligned \ against the genome using the \ blat program. Note that the maximum intron length\ allowed by blat is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 0.1% of the best are kept. \ Alignments must also have at least 98% base identity to be kept.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\

\

When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

\ The Spliced EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 intronGap 30\ est $Organism ESTs psl est $Organism ESTs Including Unspliced 0 57 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows alignments between $organism Expressed\ Sequence Tags (ESTs) in GenBank and the genome.

\ \

Expressed sequence tags are single-read (typically\ approximately 500 base) sequences which usually\ represent fragments of transcribed genes. Aligning \ regions (usually exons) are shown as black boxes \ connected by lines for gaps (usually spliced-out introns). \ In full display mode, arrows on the introns\ indicate the direction of transcription. In the\ December 2001 assembly and later, this direction is\ taken by looking at the splice sites. In previous\ assemblies, the direction of transcription was taken from \ the Genbank annotations, which frequently were inaccurate.

\ \

Strand information provided for ESTs (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\ \

Method

\

To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector, and a read taken from the 5'\ and/or 3' primer. For most - but not all - ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.

\ \

In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries are starting to hit transcription start\ reasonably well. Before the cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to get sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ (Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination.) Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism. However, because the $organism 3' UTRs are quite\ long, the splicing requirement does eliminate many genuine 3'\ ESTs.

\ \

To generate this track, $organism ESTs from Genbank are aligned \ against the genome using the \ blat \ program. Note that the maximum intron length\ allowed by blat is 500,000 bases, which may eliminate some ESTs with very \ long introns that might otherwise align. When a single \ EST aligns in multiple places, the alignment having the \ highest base identity is found. Only alignments that have \ a base identity level within 0.1% of the best are kept. \ Alignments must also have at least 98% base identity to be kept.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\

\

When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

\ The $Organism EST track is produced at UCSC from EST sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 intronGap 30\ xenoMrna Non-$Organism mRNAs psl xeno Non-$Organism mRNAs from GenBank 0 63 0 0 0 127 127 127 1 0 0

Description

\

\ This track displays translated \ blat\ alignments of\ non-$organism vertebrate and invertebrate mRNA from \ GenBank.

\ \

The strand information (+/-) for this track is in two parts. The\ first + indicates the orientation of the query sequence whose\ translated protein produced the match (here always 5' to 3', hence +).\ The second + or - indicates the orientation of the matching \ translated genomic sequence. Because the two orientations of a DNA \ sequence give different predicted protein sequences, there are four \ combinations. ++ is not the same as --; nor is +- the same as -+.\ \ \

Method

\

\ The alignments were passed through a near-best-in-genome filter.

\ \

Using the Filter

\

The track filter can be used to color, include, or exclude a subset of individual \ items within a track. This is helpful when many items are shown in the track\ display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the mRNA display. For\ example, to apply the filter to all brain mRNAs, type "brain" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all of the filter criteria will\ be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that will be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display mRNAs that match the filter criteria. If "include" is selected, the browser \ will display only those mRNAs that match the filter criteria.\

\

When you have finished configuring the filter, click the Submit button.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 cdsDrawOptions enabled\ xenoEst Non-$Organism ESTs psl xeno Non-$Organism ESTs from GenBank 0 65 0 0 0 127 127 127 1 0 0 http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$

Description

\

\ This track displays translated \ blat\ alignments of non-$organism vertebrate ESTs from \ GenBank.

\ \

The strand information (+/-) for this track is in two parts. The\ first + or - indicates the orientation of the query sequence whose\ translated protein produced the match. The second + or - indicates the\ orientation of the matching translated genomic sequence. Because the two\ orientations of a DNA sequence give different predicted protein sequences,\ there are four combinations. ++ is not the same as --; nor is +- the same\ as -+.\ \

Method

\

\ To generate this track, the ESTs are aligned against the genome using the blat\ program. The alignments are passed through a piecewise near-best-in-genome\ filter.

\ \

Using the Filter

\

The track filter can be used to change the color or include/exclude a subset of \ individual items within a track. This is helpful when many items are shown in the \ track display, especially when only some are relevant to the current task. To use the\ filter:\

    \
  1. Enter a value in one or more of the text boxes to filter the EST display. For\ example, to apply the filter to all ESTs expressed in the liver, type "liver" in the \ tissue box. For a list of permissible filter values, consult the non-positional table in\ the Table Browser that corresponds to the factor on which you wish to filter. For\ example, the non-positional table "tissue" contains all of the types of tissues\ that can be entered into the tissue text box. Wildcards can also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all of the filter criteria will\ be highlighted. If "or" is selected, ESTs that match any one of the filter criteria\ will be highlighted.\
  3. Choose the color or display characteristic that should be used to highlight or\ include/exclude the filtered items. If "exclude" is chosen, the browser will not \ display ESTs that match the filter criteria. If "include" is selected, the browser \ will display only those ESTs that match the filter criteria.\

\

When you have finished configuring the filter, click the Submit button.

\ \

Credits

\

\ This track is produced at UCSC from EST sequence data submitted to the\ international public sequence databases by scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 tigrGeneIndex TIGR Gene Index genePred Alignment of TIGR Gene Index TCs Against the $Organism Genome 0 68 100 0 0 177 127 127 0 0 0 http://www.tigr.org/tigr-scripts/tgi/tc_report.pl?$$

Description

\

This track displays alignments of the TIGR Gene Index (TGI)\ against the $organism genome. The TIGR Gene Index is based\ largely on assemblies of EST sequences in the public databases.\ See \ www.tigr.org for more information about TIGR and the Gene Index.

\

Credits

\

Thanks to Foo Cheung and Razvan Sultana of The Institute for Genomic Research for converting these data into a track for the browser.

\ rna 1 uniGene_2 UniGene bed 12 UniGene Hs 159 Alignments and SAGEmap Info 0 69 0 0 0 127 127 127 1 0 0 \

Description

\

\ Serial Analysis of Gene Expression (SAGE)\ is a quantitative measurement of gene expression. Data are presented for\ every cluster contained in the browser window and the selected cluster\ name is highlighted in red. All data are from the repository at the SageMap project\ built on UniGene version Hs 159. \ \ Selecting the UniGene cluster name will display SageMap's page for\ that cluster.

\ \

Methods

\

\ SAGE counts are produced by sequencing small "tags" of DNA believed to\ be associated with a gene. These tags are produced by attaching\ poly-A RNA to oligo-dT beads. After synthesis of double-stranded cDNA,\ transcripts are cleaved by an anchoring enzyme (usually NlaIII). Then\ small tags are produced by ligation with a linker containing a type\ IIS restriction enzyme site and cleavage with the tagging enzyme\ (usually BsmFI). The tags are then concatenated together and\ sequenced. The frequency of each tag is counted and used to infer\ expression level of transcripts that can be matched to that tag. \
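The tag extraction and counting steps above can be sketched in a few lines. This is an illustration only: the recognition site CATG for the anchoring enzyme and the 10-base tag length are assumptions not stated in this description, and both function names are hypothetical.

```python
from collections import Counter


def extract_tag(cdna, site="CATG", tag_len=10):
    """Return the tag following the 3'-most anchoring-enzyme site, or None.

    cdna    -- transcript sequence written 5' to 3'
    site    -- anchoring enzyme recognition site (assumed: CATG)
    tag_len -- number of bases kept after the site (assumed: 10)
    """
    cdna = cdna.upper()
    pos = cdna.rfind(site)  # the 3'-most occurrence of the site
    if pos == -1:
        return None
    tag = cdna[pos + len(site):pos + len(site) + tag_len]
    # Discard truncated tags that run off the end of the transcript.
    return tag if len(tag) == tag_len else None


def count_tags(tags):
    """Count how often each sequenced tag was observed."""
    return Counter(tags)
```

The count for a tag can then be compared against the tag predicted by `extract_tag` for a given transcript to infer that transcript's relative expression.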

\

Credits

\

\ All\ SAGE data presented here were mapped to UniGene transcripts by the SageMap project at NCBI.\

\


\ rna 1 rnaCluster Gene Bounds bed 12 Gene Boundaries as Defined by RNA and Spliced EST Clusters 0 71 200 0 50 227 127 152 0 0 0

Description

\

\ This track shows the boundaries of genes and the direction of\ transcription as deduced from clustering spliced ESTs and mRNAs\ against the genome. When there are many spliced variants\ of the same gene, this track shows the variant that\ spans the greatest distance in the genome.

\ \

Method

\

\ ESTs and mRNAs from \ GenBank are aligned against the genome with the \ blat\ program, and filtered to keep only those alignments\ that have at least 97.5% base identity within the \ aligning blocks. When multiple alignments occur, only the\ alignments with a percentage identity within 0.2% of the\ best alignment are kept. ESTs that align without any\ introns are discarded. Blocks that are less than 130 bases\ and are not next to an intron are discarded. Blocks smaller\ than 10 bases are discarded. The orientations of the \ ESTs and mRNAs are deduced from the GT/AG splice sites\ at the introns, and ESTs and mRNAs with overlapping blocks\ on the same strand are merged into clusters. Only the\ extent and orientation of the clusters are shown here.\
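The final merging step, in which alignments with overlapping blocks on the same strand are joined into clusters, can be sketched with a sweep over sorted blocks and a union-find structure. This is a simplified illustration, not the original program: the function name and input layout are hypothetical, coordinates are half-open, and the earlier identity and block-size filters are assumed to have been applied already.

```python
from collections import defaultdict


def cluster_alignments(alignments):
    """Merge alignments whose blocks overlap on the same strand.

    alignments -- list of (strand, blocks); blocks is a non-empty list of
                  half-open (start, end) intervals
    Returns a sorted list of (strand, start, end) cluster extents.
    """
    parent = list(range(len(alignments)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Collect every block per strand, remembering its source alignment.
    by_strand = defaultdict(list)
    for idx, (strand, blocks) in enumerate(alignments):
        for start, end in blocks:
            by_strand[strand].append((start, end, idx))

    # Sweep the sorted blocks; a block that starts before the running end
    # overlaps the current run, so its alignment joins that run's cluster.
    for blocks in by_strand.values():
        blocks.sort()
        run_end, run_idx = -1, None
        for start, end, idx in blocks:
            if run_idx is not None and start < run_end:
                parent[find(idx)] = find(run_idx)
                run_end = max(run_end, end)
            else:
                run_end, run_idx = end, idx

    # Report only the extent and strand of each cluster.
    extents = {}
    for idx, (strand, blocks) in enumerate(alignments):
        root = find(idx)
        lo = min(s for s, _ in blocks)
        hi = max(e for _, e in blocks)
        if root in extents:
            _, a, b = extents[root]
            lo, hi = min(lo, a), max(hi, b)
        extents[root] = (strand, lo, hi)
    return sorted(extents.values())
```

Note that an alignment lying entirely inside another alignment's intron is not merged, because clustering is driven by block overlap rather than by overlap of the full extents.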

\

Credits

\

\ This track -- which was originally developed by Jim Kent --\ was generated at UCSC and uses data submitted to GenBank by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004)\ GenBank: update. \ Nucleic Acids Res. 32 Database issue:D23-6.\ rna 1 affyTranscriptome Transcriptome sample Affymetrix Experimentally Derived Transcriptome 0 88 100 50 0 0 0 255 0 0 2 chr22,chr21,

Description

\

\ Transcriptome data for chromosomes 21 and 22 from Affymetrix, as described in \ "Large-Scale Transcriptional Activity in Chromosomes 21 and 22",\ Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg,\ R. L., Fodor, S.P.A. and Gingeras, T.R. In general, the data presented\ are the perfect match minus mismatch values. Different experiments were\ normalized by setting the average value to be the same for each\ chip. Replicates for different cell types were averaged together to\ produce the data seen in "full" mode for each cell type. In dense\ mode, or at the top of the track in full mode, "Transcriptome" displays\ the maximum value over all experiments for that probe, the idea being\ to paint as many transcribed regions as possible. \

\ To present a more\ interpretable display when zoomed out, averages have been precalculated\ over the chromosome at two different resolutions in addition to the\ raw data. For example, when zoomed out, there may appear to be a peak at\ the center of a gene rather than a signal at every exon. Zooming in\ will reveal the "raw" data for that region.\

\ NOTE: Affymetrix transcriptome annotations appear only on chromosomes 21 and 22.\ \

Credits

\

\ Thanks to Affymetrix for providing these data. Questions/Comments? Email \ sugnet@soe.ucsc.edu. \ \ regulation 0 cpgIsland CpG Islands bed 4 + CpG Islands (Islands < 300 Bases are Light Green) 0 90 0 100 0 128 228 128 0 0 0

Description

\

\ CpG islands are associated with genes, particularly housekeeping\ genes, in vertebrates. CpG islands are typically common near\ transcription start sites, and may be associated with promoter\ regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in\ vertebrate DNA because the C's in such an arrangement tend to be\ methylated. This methylation helps distinguish the newly synthesized\ DNA strand from the parent strand, which aids in the final stages of\ DNA proofreading after duplication. However, over evolutionary time\ methylated C's tend to turn into T's because of spontaneous\ deamination. The result is that CpG's are relatively rare unless\ there is selective pressure to keep them or a region is not methylated\ for some reason, perhaps having to do with the regulation of gene\ expression. CpG islands are regions where CpG's are present at\ significantly higher levels than is typical for the genome as a whole.\

\ \

Method

\

\ CpG islands are predicted by searching the sequence one base at a\ time, scoring each dinucleotide (+17 for CG and -1 for others) and\ identifying maximally scoring segments. Each segment is then\ evaluated to determine GC content (roughly >= 50%), length (> 200), and ratio of\ observed number of CG dinucleotides to the expected number on\ the basis of the GC content of the segment (> 0.6). \
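The scoring scan described above can be approximated with a running-sum pass over the dinucleotides. This is a minimal sketch that assumes a simple "reset when the score drops below zero" rule for delimiting segments; the actual program may delimit maximal segments differently, and the function name is hypothetical.

```python
def max_scoring_segments(seq, cg_score=17, other_score=-1):
    """Return (start, end, score) for high-scoring CG-rich segments.

    Each dinucleotide adds cg_score if it is CG, otherwise other_score.
    A segment ends at the position where its running score peaked; the scan
    restarts whenever the running score falls below zero (assumed rule).
    Coordinates are half-open offsets into seq.
    """
    seq = seq.upper()
    segments = []
    score = best = 0
    start = best_end = 0
    for i in range(len(seq) - 1):
        score += cg_score if seq[i:i + 2] == "CG" else other_score
        if score > best:
            best, best_end = score, i + 2
        if score < 0:
            if best > 0:
                segments.append((start, best_end, best))
            score = best = 0
            start = i + 1
    if best > 0:
        segments.append((start, best_end, best))
    return segments
```

Each returned segment would then be evaluated against the GC content, length, and observed/expected criteria before being reported as an island.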

\

\ The CpG count is the number of CG dinucleotides in the island. \ The Percentage CpG is the ratio of CpG nucleotide bases\ (twice the CpG count) to the length.\
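The per-island statistics and thresholds can be illustrated as follows. This is a sketch, not the program used to build the track: the function name is hypothetical, and the expected CpG count is assumed to follow the common formula (number of C x number of G) / length, which this description does not spell out.

```python
def cpg_island_stats(seq):
    """Compute CpG statistics for a candidate segment.

    Returns a dict with the CpG count, percentage CpG (twice the CpG count
    relative to the length), GC fraction, observed/expected CpG ratio, and
    whether the segment meets the thresholds quoted in the description.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return {"cpg_count": 0, "per_cpg": 0.0, "gc_frac": 0.0,
                "obs_exp": 0.0, "is_island": False}
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_frac = (c + g) / length
    expected = c * g / length  # assumed formula for expected CpG count
    obs_exp = cpg / expected if expected else 0.0
    return {
        "cpg_count": cpg,
        "per_cpg": 100.0 * 2 * cpg / length,
        "gc_frac": gc_frac,
        "obs_exp": obs_exp,
        # Thresholds from the description: GC >= 50%, length > 200,
        # observed/expected > 0.6.
        "is_island": gc_frac >= 0.5 and length > 200 and obs_exp > 0.6,
    }
```

A 300-base run of alternating C and G, for instance, has a CpG count of 150, a percentage CpG of 100, and passes all three thresholds.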

\ \

Credits

\

\ This track was generated \ using a\ modification of a program developed by G. Miklem and L. Hillier. \

\ \ regulation 1 promoterStanford Stanford Promoters bed 6 Stanford Promoters 0 92 0 0 0 127 127 127 0 0 0

Description

\ \

The Stanford Human Promoters dataset was generated by\ Rick Myers' Lab\ at Stanford University and is described thoroughly in\ \ "Identification and Functional Analysis of Human\ Transcriptional Promoters."\

Briefly, full-length human cDNAs were aligned to the human genome\ sequence to predict putative starts of transcription. Roughly 500 bp\ of putative promoter sequence was then cloned into a reporter\ construct for 150 random promoters. More than 90% of these fragments\ had significant promoter activity in at least one of the cell types\ we tested.\

The promoters highlighted in black have been experimentally confirmed as functional promoters, and the ones highlighted in gray are predicted and are in the process of being tested.\ \

For questions or comments, email \ nathant@stanford.edu\ \ or\ shelleyj@stanford.edu. \ \ regulation 1 snpMap SNPs bed 4 . Simple Nucleotide Polymorphisms (SNPs) 0 144 0 0 0 127 127 127 0 0 0

Description

\

\ This track consolidates all the Simple Nucleotide Polymorphisms\ into a single track. It is the union of the Overlap SNPs,\ Random SNPs, Affymetrix 120K SNP, and Affymetrix 10K SNP tracks that\ previously existed in the Genome Browser.

\ \ \ Variant Sources\ \ \ \ Variant Types\ \ \

Filtering

\

\ The SNPs in this track include all known polymorphisms that\ can be mapped against the current assembly. These include known point\ mutations (Single Nucleotide Polymorphisms), insertions, deletions,\ and segmental mutations from the current build of \ dbSnp, which is \ shown in the Genome Browser release log.\

\

\ There are three major cases that are not mapped and/or annotated:\

\

\

The heuristics for the non-SNP variations (i.e. named elements and\ STRs) are quite conservative; therefore, some of these are probably lost. This\ approach was chosen to avoid false annotation of variation in\ inappropriate locations.

\ \

Supporting Details

\

\ Positional information can be found in the annotations section\ of the Genome Browser \ downloads page, \ which is organized by species and assembly. Non-positional information\ can be found in the \ shared\ data section of the same page, where it is split into tables by\ organism: \ dbSnpRsHg for Human, \ dbSnpRsMm for Mouse, and \ dbSnpRsRn for Rat.\ \

Credits

\

\ Thanks to the SNP\ Consortium and NIH for providing the public data, which are available from \ dbSnp at \ NCBI.

\

\ Thanks to Perlegen Sciences, \ Inc. for providing additional SNPs from their database.\ Additional information about the Perlegen SNP discovery process can be\ found in Patil, N. (2001) \ \ Blocks of Limited Haplotype Diversity Revealed by High-Resolution\ Scanning of Human Chromosome 21. Science 294:1719-1723.\

\

\ Thanks to Affymetrix, Inc. for developing the genotyping\ arrays. For more details on this genotyping assay, please see the\ supplemental information on the \ Affymetrix 10K SNP and \ Affymetrix 120K SNP products. Additional information, \ including genotyping data, is available from the details pages for the \ Affymetrix 120K SNP and Affymetrix 10K SNP tracks.

\ \

Terms of Use for the Affymetrix data

\

Please see the Terms and Conditions page on the Affymetrix website for \ restrictions on the use of their data. \ \ \ \ varRep 1 perlegen Perlegen Haplotypes bed 12 Perlegen Common High-Resolution Haplotype Blocks 0 145.21 0 0 0 127 127 127 1 0 1 chr21, http://www.perlegen.com/haplotype/blk/$$.html

Description

\

\ Haplotype blocks derived from common single nucleotide polymorphisms (SNPs) on Chromosome 21 by\ Perlegen Sciences, as described\ in Patil N et al. (2001), \ "\ Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning", \ Science 294:1719-1723.\

\

\ The location of each haplotype block is represented by\ a blue horizontal line with tall vertical blue bars at the first and\ last SNPs of the block. Blocks are displayed as starting at the first\ SNP and ending at the last SNP of the block. This is slightly\ different from the representation on the Perlegen web site in which blocks are \ stretched until they abut each other. The shade of the blue indicates the minimum\ number of SNPs required to discriminate between haplotype patterns\ that account for at least 80% of genotyped chromosomes. Darker colors\ indicate that fewer SNPs are necessary. Individual SNPs are denoted by\ smaller black vertical bars. At multi-megabase resolution in dense\ display mode, clusters of tall blue bars may indicate hotspots for\ recombination. \

\

For more information on a particular block, click "Outside Link" \ on the item's details page. General information on the\ blocks is available from Perlegen's\ \ Chromosome 21 Haplotype Browser.\

\ NOTE: Perlegen annotations appear only on chromosome 21.\

\

Credits

\ Thanks to Perlegen Sciences for making these data available.\ varRep 1 haplotype Haplotype Blocks bed 12 Common Haplotype Blocks 0 145.22 0 0 0 127 127 127 1 0 1 chr22,

Description

\

\ Haplotype blocks on Chromosome 22 from \ The University of Oxford and \ The Wellcome Trust Sanger Institute, \ as described in Dawson E. et al. (2002), \ "A first-generation linkage disequilibrium map of human chromosome 22", Nature 418:544-8. \

\ The location of each haplotype block is represented by\ a blue horizontal line with tall vertical blue bars at the first and\ last SNPs of the block. Blocks are displayed as starting at the first\ SNP and ending at the last SNP of the block. Individual SNPs are denoted by\ smaller black vertical bars. At multi-megabase resolution in dense\ display mode, clusters of tall blue bars may indicate hotspots for\ recombination. \

\ NOTE: Haplotype block annotations appear only on chromosome 22.\ \

Credits

\ Thanks to The University of Oxford and the Sanger Institute for providing these data.\ varRep 1 genomicSuperDups Segmental Dups bed 6 Duplications of >1000 Bases of Non-RepeatMasked Sequence 0 146 0 0 0 127 127 127 0 0 0

Description

\ \

This region was detected as a putative genomic duplication within the golden path.\ Orange, yellow, and dark-to-light gray represent similarities of >99%, 98-99%, and 90-98%, \ respectively. Duplications of greater than 98% similarity that lack sufficient SDD \ evidence (likely missed overlaps) are shown in red. Cutoff values were at least \ 1 kb of total sequence aligned (containing at least 500 bp of non-RepeatMasked sequence) \ and at least 90% sequence identity. \ \

Methods

\ For a description of the 'fuguization' detection method see Bailey, JA, et al., \ (2001). "Segmental duplications: organization and impact within the current human genome project assembly."\ Genome Res 11:1005-17. \ \

Credits

\ The data were provided by \ Jeff Bailey \ \ and Evan Eichler.\

\ varRep 1 rmsk RepeatMasker rmsk Repeating Elements by RepeatMasker 1 149.1 0 0 0 127 127 127 1 0 0

Description

\

\ This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences \ for interspersed repeats and low complexity DNA sequences. The program\ outputs a detailed annotation of the repeats that are present in the \ query sequence, as well as a modified version of the query sequence \ in which all the annotated repeats have been masked. RepeatMasker uses \ the RepBase library of repeats from the \ Genetic \ Information Research Institute (GIRI). RepBase is described in \ Jurka, J. "Repbase Update: a database and an electronic journal of \ repetitive elements". Trends Genet. 16(9):418-420 (2000).\

\ In full display mode, this track displays nine different classes of repeats:\