This directory contains GTF files for the main gene transcript sets where available. They are
sourced from the following gene model tables: ncbiRefSeq, refGene, ensGene, knownGene.

Not all files are available for every assembly. For more information on the source tables 
see the respective data track description page in the assembly. For example:

Information on the different gene models can also be found in our genes FAQ:

- The "knownGene" track is the current version of GENCODE gene transcript models. For the exact
  version, see the GENCODE track on the hg38 genome browser.
- The "ncbiRefSeq" track shows the RefSeq transcripts as aligned by NCBI, the "official" placement.
- The "refGene" track contains the RefSeq transcripts as aligned by UCSC. Where the UCSC alignment
  differs from NCBI's, the case may be worth a manual investigation: such differences often
  indicate transcripts that are hard to align, where short-read mapping may also run into
  problems and long reads or more cDNA data may be needed.
- The "ensGene" track contains the Ensembl annotations before the GENCODE project. This track exists
  only for record-keeping and reproducibility. The ensGene.gtf.gz file has not been updated on hg38
  since 2014 and has been removed from our download server.
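
As a starting point for the manual investigation mentioned for "refGene" above, the
following is a minimal sketch that lists transcripts whose exon coordinates differ
between two of the GTF files. The function names gtf_exons and gtf_diff_transcripts are
hypothetical helpers, not part of any UCSC utility. The sketch assumes gzip-compressed
input and that transcript IDs in the two files may differ only by a trailing ".N"
version suffix, which it strips; check that assumption against the actual files.

```shell
# Print one line per exon: transcript ID, chromosome, start, end, strand.
# Assumption: a trailing ".N" version suffix is stripped so that versioned and
# unversioned IDs can be matched; remove the sub() call if that is not wanted.
gtf_exons() {
    gzip -cd "$1" |
    awk -F '\t' '$3 == "exon" && match($9, /transcript_id "[^"]+"/) {
        # 15 = length of the prefix: transcript_id, a space and the opening quote
        id = substr($9, RSTART + 15, RLENGTH - 16)
        sub(/\.[0-9]+$/, "", id)
        print id "\t" $1 "\t" $4 "\t" $5 "\t" $7
    }' | LC_ALL=C sort -u
}

# Print the IDs of transcripts that have at least one exon line present in
# only one of the two files (this includes transcripts missing from one file).
# The body runs in a subshell so that the temporary variables stay local.
gtf_diff_transcripts() (
    a=$(mktemp) && b=$(mktemp) || exit 1
    gtf_exons "$1" > "$a"
    gtf_exons "$2" > "$b"
    { LC_ALL=C comm -23 "$a" "$b"; LC_ALL=C comm -13 "$a" "$b"; } |
        cut -f1 | LC_ALL=C sort -u
    rm -f "$a" "$b"
)
```

With both files downloaded into the current directory, a call could look like this:
    gtf_diff_transcripts hg38.refGene.gtf.gz hg38.ncbiRefSeq.gtf.gz > differing_transcripts.txt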


The files are created using the genePredToGtf utility with the additional -utr flag. Utilities
can be found in the following directory:

An example command is as follows:
    genePredToGtf -utr hg38 ncbiRefSeq hg38.ncbiRefSeq.gtf
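
To get a quick overview of what one of these files contains, the feature types in
column 3 can be tallied. The sketch below is only an illustration: gtf_feature_counts
is a hypothetical helper name, and it assumes gzip-compressed, tab-separated input with
the usual nine GTF columns. Because the files are generated with -utr, UTR features
would be expected to appear next to exon and CDS lines.

```shell
# Count how often each feature type (GTF column 3) occurs in a gzipped GTF file.
# Comment lines starting with "#" and lines with fewer than 9 fields are skipped.
gtf_feature_counts() {
    gzip -cd "$1" |
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (f in n) print f "\t" n[f] }' |
    LC_ALL=C sort
}
```

For example:
    gtf_feature_counts hg38.ncbiRefSeq.gtf.gz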

Additional Resources

Information on GTF format and how it is related to GFF format:

Information about the different gene models available in the Genome Browser:

More information on how the files were generated:

Name                      Last modified      Size
hg38.ensGene.gtf.gz       2020-01-10 09:33    27M
hg38.knownGene.gtf.gz     2023-06-28 17:13    37M
hg38.ncbiRefSeq.gtf.gz    2022-10-28 16:35    40M
hg38.refGene.gtf.gz       2020-01-10 09:33    23M
md5sum.txt                2023-01-06 14:43    221
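
The md5sum.txt file in the listing can be used to check downloaded files. The sketch
below assumes that md5sum.txt is in the standard "checksum  filename" format written by
the md5sum tool and that GNU coreutils is installed (the --ignore-missing option is
specific to it). verify_gtf_downloads is a hypothetical helper name. Note that
md5sum.txt in the listing is older than hg38.knownGene.gtf.gz, so a mismatch for that
file may only mean that the checksum list was not regenerated.

```shell
# Check every file named in md5sum.txt that is present in the given directory.
# --ignore-missing (GNU coreutils) skips listed files that were not downloaded.
# The body runs in a subshell so the caller's working directory is unchanged.
verify_gtf_downloads() (
    cd "$1" && md5sum --check --ignore-missing md5sum.txt
)
```

For example, to check the files in the current directory:
    verify_gtf_downloads .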