Description

This track shows the tandem repeats found within the consensus sequence of each repeat element, as identified by TRF (Tandem Repeats Finder).

Display Conventions and Configuration

Tandem repeats and their sequences are marked on the Repeat Browser. These annotations may be useful for repeats such as SVA (where V stands for VNTR, variable number tandem repeat). The sequence of the repeat can be seen in full mode. If the repeat sequence is longer than 256 characters (the maximum length of the name field), a truncated version is stored as the name in the bigBed file; the full extent of the repeat is still annotated.
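The naming rule above can be sketched as a small shell helper. This is only an illustration: the function name truncate_name is hypothetical, and the 206-character prefix and the "_truncated" suffix are taken from the gawk command shown under Methods rather than from any documented bigBed convention.

```shell
# Hypothetical helper: apply the name-length rule to a single repeat sequence.
# Names longer than 256 characters are cut to their first 206 characters
# and given a "_truncated" suffix; shorter names pass through unchanged.
truncate_name() {
    awk -v name="$1" 'BEGIN {
        if (length(name) > 256)
            name = substr(name, 1, 206) "_truncated"
        print name
    }'
}
```

Under these assumptions, a 300-base sequence is stored as a 216-character name, while a short sequence such as ACGT is stored as is.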

Methods

Tandem Repeats Finder was run on the collection of consensus sequences making up hg38reps.fa.

trfBig ../hg38reps/hg38reps.fa trf.out -bedAt=trf.raw

cat trf.tab | gawk 'BEGIN {FS="\t"; OFS="\t"} {$4=$16; if (length($4) > 256){$4=substr($4,1,206)"_truncated"}; print}' | cut -f-15 > trf.bed

bedSort trf.bed trf.bed

bedToBigBed trf.bed ../hg38reps/hg38reps.sizes trf.bb -type=bed4+11
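Before running bedToBigBed, it can help to sanity-check the intermediate BED file. The sketch below is not part of the original pipeline; the function name check_trf_bed is hypothetical. It assumes each row should have the 15 tab-separated columns implied by -type=bed4+11 and a name field of at most 256 characters.

```shell
# Hypothetical check: report rows of a BED file that do not have exactly
# 15 tab-separated fields, or whose name (column 4) exceeds 256 characters.
# Returns a non-zero exit status if any such row is found.
check_trf_bed() {
    awk -F'\t' '
        NF != 15         { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        length($4) > 256 { printf "line %d: name too long\n", NR; bad = 1 }
        END              { exit bad + 0 }
    ' "$1"
}
```

It could be called as check_trf_bed trf.bed after the bedSort step; no output and a zero exit status mean every row passed both checks.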

References

See more information on the TRF Unix Help web page: https://tandem.bu.edu/trf/trf.unix.help.html

G. Benson, "Tandem repeats finder: a program to analyze DNA sequences" Nucleic Acids Research (1999) Vol. 27, No. 2, pp. 573-580.

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences. In order to use the program, the user submits a sequence in FASTA format. There is no need to specify the pattern, the size of the pattern or any other parameter. The output consists of two files: a repeat table file and an alignment file. The repeat table contains information about each repeat, including its location, size, number of copies and nucleotide content. Clicking on the location indices for one of the table entries opens a second web browser that shows an alignment of the copies against a consensus pattern. The program is very fast, analyzing sequences on the order of .5Mb in just a few seconds. Submitted sequences may be of arbitrary length. Repeats with pattern size in the range from 1 to 2000 bases are detected. Sequence information sent to the server is confidential and deleted after program execution.

Email: markd@ucsc.edu or mhaeussl@ucsc.edu