The GENCODE track is composed of all the gene models in the GENCODE v32 release. By default, only the basic gene set is displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts that GENCODE believes will be useful to the majority of users.
The track includes protein-coding genes, non-coding RNA genes, and pseudogenes, though pseudogenes are not displayed by default. It contains annotations on the reference chromosomes as well as assembly patches and alternative loci (haplotypes).
The following table provides statistics for the v32 release, derived from the GTF file that contains annotations only on the main chromosomes. More information on how these statistics were generated can be found on the GENCODE site.
GENCODE v32 Release Stats

| Genes | Observed | Transcripts | Observed |
|---|---|---|---|
| Protein-coding genes | 19,965 | Protein-coding transcripts | 83,986 |
| Long non-coding RNA genes | 17,910 | - full length protein-coding | 57,935 |
| Small non-coding RNA genes | 7,576 | - partial length protein-coding | 26,051 |
| Pseudogenes | 14,749 | Nonsense mediated decay transcripts | 15,811 |
| Immunoglobulin/T-cell receptor gene segments | 645 | Long non-coding RNA loci transcripts | 48,351 |
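Counts of this kind can be approximated by tallying records in a GENCODE GTF file. The sketch below is a minimal illustration, not the procedure GENCODE used: it assumes the standard nine-column GTF layout with `gene_type` and `transcript_type` attributes, and it counts raw biotype values rather than the grouped categories shown in the table. The function name `count_by_type` and the sample records are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_by_type(gtf_lines, feature="gene", key="gene_type"):
    """Count records of one feature type, grouped by an attribute value."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 holds the feature type, column 9 the attributes.
        if len(fields) < 9 or fields[2] != feature:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        counts[attrs.get(key, "unknown")] += 1
    return counts


# Hypothetical toy records, not real GENCODE annotations.
SAMPLE_GTF = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1.1"; gene_type "protein_coding"; gene_name "AAA";',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1.1"; transcript_id "T1.1"; gene_type "protein_coding"; transcript_type "protein_coding";',
    'chr1\tHAVANA\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2.1"; gene_type "lncRNA"; gene_name "BBB";',
    'chr1\tHAVANA\tgene\t3000\t3400\t.\t+\t.\tgene_id "G3.1"; gene_type "protein_coding"; gene_name "CCC";',
]

gene_counts = count_by_type(SAMPLE_GTF)
transcript_counts = count_by_type(SAMPLE_GTF, "transcript", "transcript_type")
print(gene_counts)
print(transcript_counts)
```

For a real file, the same function can be given an open text handle, for example one returned by `gzip.open(path, "rt")`, since it only iterates over lines.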
For more information on the different gene tracks, see our Genes FAQ.
By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes. It includes options to display the comprehensive GENCODE set and pseudogenes. To customize these options, check or uncheck the respective boxes at the top of this description page. Our FAQ includes examples of how to display a single transcript per gene and how to switch between the basic and comprehensive gene sets.
This track also includes a variety of labels which identify the transcripts when visibility is set to "full" or "pack". Gene symbols (e.g. NIPA1) are displayed by default, but additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID (uc001yve.4), UniProt Display ID (Q7RTP0) and OMIM ID (608145). Additional information about gene and transcript names can be found in our FAQ.
This track, in general, follows the display conventions for gene prediction tracks. The exons for putative non-coding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. The following color key is used:
This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. There is also an option to display the data as a density graph, which can be helpful for visualizing the distribution of items over a region.
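The thin and thick block convention described above can be illustrated by splitting each exon at the coding-region boundaries. This is a minimal sketch, not the browser's drawing code: it assumes genePred-style 0-based, half-open coordinates and treats a transcript whose coding start equals its coding end as non-coding. The function name `split_exons` and all coordinates are hypothetical.

```python
def split_exons(exons, cds_start, cds_end):
    """Split exons into (start, end, style) blocks.

    style is "thick" for the part inside the coding region and
    "thin" for untranslated or non-coding parts.
    """
    blocks = []
    for start, end in exons:
        thick_start = max(start, cds_start)
        thick_end = min(end, cds_end)
        # No coding region at all, or this exon lies outside it.
        if cds_start >= cds_end or thick_start >= thick_end:
            blocks.append((start, end, "thin"))
            continue
        if start < thick_start:
            blocks.append((start, thick_start, "thin"))   # UTR before the CDS
        blocks.append((thick_start, thick_end, "thick"))  # coding part
        if thick_end < end:
            blocks.append((thick_end, end, "thin"))       # UTR after the CDS
    return blocks


# Hypothetical transcript with three exons and a coding region of 150-550.
coding = split_exons([(100, 200), (300, 400), (500, 600)], 150, 550)

# Hypothetical non-coding transcript: coding start equals coding end.
noncoding = split_exons([(100, 200), (300, 400)], 100, 100)

for block in coding:
    print(block)
```

In this example the first and last exons are each split into a thin and a thick piece, while the middle exon is entirely thick.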
The GENCODE v32 track was built from the GENCODE downloads file gencode.v32.chr_patch_hapl_scaff.annotation.gff3.gz. Data from other sources were correlated with the GENCODE data to build the knownTo tables.
The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a downloadable file. These include tables that link GENCODE Genes transcripts to external datasets (such as knownToLocusLink, which maps GENCODE Genes transcripts to Entrez identifiers, previously known as Locus Link identifiers), and tables that detail some property of GENCODE Genes transcript sequences (such as knownToPfam, which identifies any Pfam domains found in the GENCODE Genes protein-coding transcripts).
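As an illustration of how a linking table such as knownToLocusLink might be used after download, the sketch below loads a tab-separated dump into a lookup from transcript ID to external IDs. It assumes, without confirmation from this page, that each row holds a transcript ID in the first column and an external identifier in the second. The function name `load_known_to` and the sample rows are hypothetical.

```python
from collections import defaultdict


def load_known_to(lines):
    """Build {transcript_id: [external_id, ...]} from tab-separated rows."""
    mapping = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed rows with fewer than two columns.
        if len(fields) < 2:
            continue
        transcript_id, external_id = fields[0], fields[1]
        # A transcript may link to several external records
        # (for example, several protein domains), so keep a list.
        mapping[transcript_id].append(external_id)
    return dict(mapping)


# Hypothetical rows; the IDs are placeholders, not real annotations.
SAMPLE_ROWS = [
    "TX_A.1\t1001\n",
    "TX_B.2\t1002\n",
    "TX_B.2\t1003\n",
    "\n",
]

links = load_known_to(SAMPLE_ROWS)
print(links.get("TX_B.2", []))
```

Keeping a list per transcript avoids silently dropping rows when one transcript maps to more than one external identifier.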
One can see a full list of the associated tables in the Table Browser by selecting GENCODE Genes from the track menu; this list is then available on the table menu. Note that some of these tables refer to GENCODE Genes by its former name of Known Genes, sometimes abbreviated as known or kg. While the complete set of annotation tables is too long to describe, some of the more important tables are described below.
GENCODE Genes and its associated tables can be explored interactively using the REST API, the Table Browser, or the Data Integrator. The genePred format files for hg38 are available from our downloads directory as knownGene.txt.gz or in our GTF download directory as hg38.knownGene.gtf.gz. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.
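A request to the REST API for this track might be assembled as in the sketch below. It assumes the `getData/track` endpoint at api.genome.ucsc.edu with semicolon-separated `genome`, `track`, `chrom`, `start`, and `end` parameters, and that the track is named `knownGene`; these details should be checked against the REST API help page. The helper names and the coordinates are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the REST API.
API_BASE = "https://api.genome.ucsc.edu"


def track_url(genome, track, chrom, start, end):
    """Build a getData/track request URL for one genomic region."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(genome, track, chrom, start, end):
    """Fetch the region and decode the JSON response (needs network access)."""
    with urlopen(track_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


# Arbitrary example region; no request is sent here.
url = track_url("hg38", "knownGene", "chr15", 1000, 2000)
print(url)
```

Separating URL construction from the network call keeps the request easy to inspect before any data is downloaded.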
The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a computational pipeline developed by Jim Kent and Brian Raney.
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012 Sep;22(9):1760-74. PMID: 22955987; PMC: PMC3431492
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. PMID: 16925838; PMC: PMC1810553
A full list of GENCODE publications is available at The GENCODE Project web site.
GENCODE data are available for use without restrictions.