TOGA (Tool to infer Orthologs from Genome Alignments) is a homology-based method that integrates gene annotation, inferring orthologs and classifying genes as intact or lost.
This track has 60,090 items, covering 492,175,044 bases, which is 18.91% of the total sequence size of 2,602,275,310 nucleotides.
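The coverage percentage follows directly from the two base counts quoted above. A minimal check of that arithmetic (the variable names are illustrative only):

```python
# Counts quoted in the track description.
covered_bases = 492_175_044
genome_size = 2_602_275_310

# Fraction of the assembly covered by track items, as a percentage.
coverage_pct = 100 * covered_bases / genome_size
print(f"{coverage_pct:.2f}% of the assembly is covered")  # prints "18.91% of the assembly is covered"
```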
As input, TOGA uses a gene annotation of a reference species (human/hg38 for mammals, chicken/galGal6 for birds) and a whole genome alignment between the reference and query genome.
TOGA implements a novel paradigm that relies on alignments of intronic and intergenic regions and uses machine learning to accurately distinguish orthologs from paralogs or processed pseudogenes.
To annotate genes, CESAR 2.0 is used to determine the positions and boundaries of coding exons of a reference transcript in the orthologous genomic locus in the query species.
Each annotated transcript is shown in a color-coded classification as
Clicking on a transcript provides additional information about the orthology classification, inactivating mutations, the protein sequence and protein/exon alignments.
The data for this track are stored in a bigBed file and can be extracted with the command-line tool bigBedToBed, which is available from the utilities download directory hgdownload.soe.ucsc.edu/admin/exe/linux_x86_64.
To extract from the bigBed file:
bigBedToBed "https://hgdownload.soe.ucsc.edu/hubs/GCA/004/024/605/GCA_004024605.1/bbi/HLTOGAannotVsHg38v1.bb" togaData.bed
The result is written to the togaData.bed file.
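Once the BED file has been extracted, its contents can be summarized with a short script. The following is a minimal sketch, not part of TOGA or the UCSC tools: the function name bed_coverage is hypothetical, and it relies only on the three standard leading BED columns (chrom, 0-based chromStart, exclusive chromEnd). It counts the items and the number of bases they cover after merging overlapping intervals. Whether the covered-base total matches the figure quoted for the track depends on how that statistic was computed (for example, whole transcript spans versus exon blocks), which the description does not state.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def bed_coverage(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (item_count, covered_bases) for BED-formatted lines.

    Only the first three BED columns are read; any further columns
    are ignored. Overlapping items on one sequence are merged so
    that each base is counted once.
    """
    intervals = defaultdict(list)
    items = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = line.split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
        items += 1

    covered = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current run.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return items, covered


# Made-up example lines, for illustration only (not real track data).
demo = ["chr1\t100\t200\n", "chr1\t150\t300\n", "chr2\t0\t50\n"]
print(bed_coverage(demo))  # prints (3, 250)
```

To run it on the extracted data, pass an open file handle instead of the demo list, for example the handle returned by open("togaData.bed").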
These data were prepared by the Michael Hiller Lab.
The TOGA software is available from github.com/hillerlab/TOGA.
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A, Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M. TOGA integrates gene annotation with orthology inference at scale. bioRxiv preprint, September 2022.