bigGenePred Track Format
 

The bigGenePred format stores annotation items that are a linked collection of exons, much as BED files do, but bigGenePred has additional information about the coding frames and other gene specific information. BigGenePred files are created initially from BED type files with some extra fields, using the program bedToBigBed with a special autosql file that defines the fields of the bigGenePred. The resulting bigBed files are in an indexed binary format. The main advantage of the bigBed files is that only the portions of the files needed to display a particular region are transferred to UCSC, so for large data sets bigBed is considerably faster than regular BED files. The bigBed file remains on your web accessible server (http, https, or ftp), not on the UCSC server. Only the portion that is needed for the chromosomal position you are currently viewing is locally cached as a "sparse file".

Big Gene Predictions

The following definition is used for bigGenePred gene prediction files. In alternative-splicing situations, each transcript has a row in this table.

table bigGenePred
"bigGenePred gene models"
   (
   string chrom;       "Reference sequence chromosome or scaffold"
   uint   chromStart;  "Start position in chromosome"
   uint   chromEnd;    "End position in chromosome"
   string name;        "Name or ID of item, ideally both human readable and unique"
   uint score;         "Score (0-1000)"
   char[1] strand;     "+ or - for strand"
   uint thickStart;    "Start of where display should be thick (start codon)"
   uint thickEnd;      "End of where display should be thick (stop codon)"
   uint reserved;       "RGB value (use R,G,B string in input file)"
   int blockCount;     "Number of blocks"
   int[blockCount] blockSizes; "Comma separated list of block sizes"
   int[blockCount] chromStarts; "Start positions relative to chromStart"
   string name2;       "Alternative/human readable name"
   string cdsStartStat; "enum('none','unk','incmpl','cmpl')"
   string cdsEndStat;   "enum('none','unk','incmpl','cmpl')"
   int[blockCount] exonFrames; "Exon frame {0,1,2}, or -1 if no frame for exon"
   string type;        "Transcript type"
   string geneName;    "Primary identifier for gene"
   string geneName2;   "Alternative/human readable gene name"
   string geneType;    "Gene type"
   )
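
As a rough illustration of how the twenty fields fit together, the following Python sketch splits one tab-separated input row into named fields and checks that the three per-block lists agree with blockCount. The helper names and the example row are hypothetical and made up for this illustration; they are not part of any UCSC tool.

```python
# Field names in the order given by the autoSql definition above.
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "type", "geneName", "geneName2", "geneType",
]


def parse_int_list(text):
    """Parse a comma-separated integer list; a trailing comma is tolerated."""
    return [int(x) for x in text.split(",") if x != ""]


def parse_big_gene_pred_line(line):
    """Hypothetical helper: split one row into a dict and run basic checks."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = dict(zip(FIELDS, values))
    for key in ("chromStart", "chromEnd", "score",
                "thickStart", "thickEnd", "blockCount"):
        row[key] = int(row[key])
    for key in ("blockSizes", "chromStarts", "exonFrames"):
        row[key] = parse_int_list(row[key])
        if len(row[key]) != row["blockCount"]:
            raise ValueError(
                f"{key} has {len(row[key])} entries, "
                f"blockCount is {row['blockCount']}"
            )
    if any(frame not in (-1, 0, 1, 2) for frame in row["exonFrames"]):
        raise ValueError("exonFrames values must be -1, 0, 1, or 2")
    return row


# Made-up example row (not taken from any real annotation).
example = "\t".join([
    "chr21", "1000", "2000", "tx1", "0", "+", "1100", "1900", "0", "2",
    "300,400,", "0,600,", "TX1", "cmpl", "cmpl", "0,2,", "protein_coding",
    "gene1", "GENE1", "protein_coding",
])
row = parse_big_gene_pred_line(example)
print(row["name"], row["blockCount"], row["exonFrames"])
```

A check like this can catch a malformed row before the file is handed to bedToBigBed, though it does not replace the validation that bedToBigBed itself performs.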

See this page for help in selecting a graphing track data format that is most appropriate for the type of data you have.

Note that the bedToBigBed utility uses a substantial amount of memory: roughly one quarter more RAM than the size of the uncompressed BED input file.

To create a bigGenePred track, follow these steps:

  1. Create a bigGenePred format file whose first twelve fields are those of a standard BED file, as described here.
    • Your bigGenePred file must have the extra eight fields described in the AutoSQL file above.
    • Your bigGenePred file must be sorted by chrom then chromStart. You can use the UNIX sort command to do this: sort -k1,1 -k2,2n unsorted.bed > input.bed
  2. Download the bedToBigBed program from the directory of binary utilities.
  3. Use the fetchChromSizes script from the same directory to create the chrom.sizes file for the UCSC database you are working with (e.g. hg19).
  4. Create the bigBed file from your sorted bigGenePred input file using the bedToBigBed utility like so: bedToBigBed -as=bigGenePred.as -type=bed12+8 bigGenePred.txt chrom.sizes myBigGenePred.bb
  5. Move the newly created bigBed file (myBigGenePred.bb) to a http, https, or ftp location.
  6. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the "track" line will look something like this:
    track type=bigGenePred name="My Big GenePred" description="A Gene Set Built from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigGenePred.bb
  7. Paste this custom track line into the text box in the custom track management page.
The bedToBigBed program can also be run with several additional options. A full list of the available options can be seen by running bedToBigBed by itself with no arguments to display the usage message.
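
The sorting requirement in step 1 can also be met without the UNIX sort command. The Python sketch below orders rows by chrom as text and then by chromStart as a number. It assumes tab-separated input and compares chromosome names by code point (similar to running sort under the C locale); the function name and sample rows are made up for illustration.

```python
def sort_bed_lines(lines):
    """Sort rows by column 1 as text, then by column 2 as an integer."""
    def sort_key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[1]))
    return sorted(lines, key=sort_key)


# Made-up rows, truncated to four columns to keep the example short.
unsorted_lines = [
    "chr21\t500\t900\ttxB",
    "chr2\t300\t400\ttxC",
    "chr21\t100\t200\ttxA",
]
for line in sort_bed_lines(unsorted_lines):
    print(line)
```

Sorting chromStart numerically matters: a plain text sort would place 1000 before 200.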

Example One

In this example, you will use an existing bigGenePred file to create a bigGenePred custom track. A bigGenePred file that contains data on chromosome 21 on the hg19 assembly has been placed on our http server. You can create a custom track using this bigGenePred file by constructing a "track" line that references this file like so:

track type=bigGenePred name="bigGenePred Example One" description="A bigGenePred file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb

Include the following "browser" line to ensure that the custom track opens at the correct position:

browser position chr21:33,031,597-33,041,570

Paste the "browser" line and "track" line into the custom track management page for the human assembly hg19 (Feb. 2009), then press the submit button. On the following page, press the chr21 link in the custom track listing to view the bigBed track in the Genome Browser.
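
If you build custom tracks for several files, it can be convenient to generate the text programmatically. The following Python sketch assembles the "browser" and "track" lines from this example; the make_track_line helper is hypothetical, and it does no escaping, so it assumes the name and description contain no double quotes.

```python
def make_track_line(name, description, big_data_url):
    """Hypothetical helper: format a bigGenePred custom track line."""
    return (
        f'track type=bigGenePred name="{name}" '
        f'description="{description}" bigDataUrl={big_data_url}'
    )


custom_track_text = "\n".join([
    "browser position chr21:33,031,597-33,041,570",
    make_track_line(
        "bigGenePred Example One",
        "A bigGenePred file",
        "http://genome.ucsc.edu/goldenPath/help/examples/bigGenePred.bb",
    ),
])
print(custom_track_text)
```

The printed text can be pasted into the custom track management page as described above.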

Example Two

In this example, you will create your own bigGenePred file from an existing bigGenePred input file.

  • Save this bigGenePred input file to your machine (this satisfies step 1 above).
  • Save this text file to your machine (the autoSql definition passed as bigGenePred.as in the command below).
  • Save this text file to your machine. It contains the chrom.sizes for the human (hg19) assembly (this satisfies step 3 above).
  • Download the bedToBigBed utility (see step 2).
  • Run the utility to create the bigBed output file (see step 4):
    bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePred.txt hg19.chrom.sizes bigGenePred.bb
  • Place the bigBed file you just created (bigGenePred.bb) on a web-accessible server (see step 5).
  • Construct a "track" line that points to your bigGenePred file (see step 6).
  • Create the custom track on the human assembly hg19 (Feb. 2009), and view it in the Genome Browser (see step 7). Note that the original bigGenePred.txt file contains data on only chromosome 21.
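
The conversion command from this example can also be launched from a script. The Python sketch below builds the same argument list and runs it only if bedToBigBed is found on the PATH. The wrapper function is hypothetical, and the file names are the ones used in this example, assumed to be in the current directory.

```python
import shutil
import subprocess


def bed_to_big_bed_command(as_file, bed_file, chrom_sizes, out_file):
    """Hypothetical helper: build the argument list used in Example Two."""
    return [
        "bedToBigBed", "-type=bed12+8", "-tab", f"-as={as_file}",
        bed_file, chrom_sizes, out_file,
    ]


cmd = bed_to_big_bed_command(
    "bigGenePred.as", "bigGenePred.txt", "hg19.chrom.sizes", "bigGenePred.bb"
)
print(" ".join(cmd))

# Only attempt the conversion when the utility is actually installed.
if shutil.which(cmd[0]) is not None:
    completed = subprocess.run(cmd, capture_output=True, text=True)
    # A non-zero return code means the conversion failed; stderr says why.
    print(completed.returncode, completed.stderr.strip())
```

Passing the arguments as a list avoids shell quoting problems if a file path contains spaces.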

Sharing Your Data with Others

If you would like to share your bigGenePred data track with a colleague, learn how to create a URL by looking at Example 11 on this page.

Extracting Data from the bigBed Format

Because the bigGenePred files are an extension of bigBed files, they are indexed binary files, which can be difficult to extract data from. Consequently, we have developed the following three programs, all of which are available from the directory of binary utilities.

  • bigBedToBed — this program converts a bigBed file to ASCII BED format.
  • bigBedSummary — this program extracts summary information from a bigBed file.
  • bigBedInfo — this program prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.

Troubleshooting

If you get an error when you run the bedToBigBed program, it may be because your input bigGenePred file has data off the end of a chromosome. In this case, use the bedClip program here before the bedToBigBed program. It will remove the row(s) in your input BED file that are off the end of a chromosome.
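
To make the idea concrete, here is a minimal Python sketch of that kind of filtering. It is not the bedClip implementation: the exact rules (dropping rows on unknown chromosomes, and rows whose coordinates fall outside 0..size) are assumptions made for this illustration, and the chromosome size shown is made up rather than taken from a real chrom.sizes file.

```python
def drop_off_chromosome_rows(bed_lines, chrom_sizes):
    """Keep only rows whose start and end lie within the chromosome."""
    kept = []
    for line in bed_lines:
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        # Assumption: rows on chromosomes missing from chrom_sizes are dropped.
        if size is not None and 0 <= start <= end <= size:
            kept.append(line)
    return kept


# Illustrative size only; real values come from the chrom.sizes file.
chrom_sizes = {"chr21": 5000}
bed_lines = [
    "chr21\t100\t200\tinside",
    "chr21\t4900\t5100\tpast_end",
    "chrUn\t10\t20\tunknown_chrom",
]
for line in drop_off_chromosome_rows(bed_lines, chrom_sizes):
    print(line)
```

Only the first row survives here, because the second extends past the assumed chromosome end and the third names a chromosome that has no size entry.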