Description

Ensembl gene annotations of the HPRC assemblies, version 2022_08, on the 24 May 2021 Homo sapiens/GCA_018473295.1_HG03540.pri.mat.f1_v2 genome assembly.
Gene count: 233,386; Bases covered: 1,730,907,884 (149,987,925 bases in exons only)

Methods

Ensembl annotation of the human assemblies has been produced via a new mapping pipeline:

A subset of the GENCODE 38 genes and transcripts has been annotated on each of the haploid assemblies. The subset excludes readthrough genes and genes on patches or haplotypes. For each gene, anchor sequences built from the surrounding region were used to locate the most likely corresponding region(s) in the target genome. A pairwise alignment of the reference and target regions was then carried out and used to map the exon coordinates and other features of the gene. In addition to the primary mapping, potential recent duplications and collapsed paralogues were identified by aligning canonical transcripts across the entire genome and searching for new mappings that did not overlap existing annotations. For more details on the annotation process, please refer to the preprint publication listed under References (see its "Methods" section: "Ensembl Mapping Pipeline for Assembly Annotation").
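To illustrate the coordinate-mapping idea described above, the following is a minimal sketch and not the Ensembl pipeline's code. It assumes the pairwise alignment has already been reduced to ungapped, forward-strand blocks of the form (reference start, target start, length), and all names (project_position, project_exon, overlaps_any) and coordinates are hypothetical. The last helper mirrors the filtering step in which new mappings are kept only if they do not overlap existing annotations.

```python
from typing import List, Optional, Tuple

# One ungapped aligned block: (ref_start, target_start, length).
# Coordinates are 0-based and half-open; forward strand only (an assumption).
Block = Tuple[int, int, int]
Interval = Tuple[int, int]


def project_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a reference position to the target, or None if it is unaligned."""
    for ref_start, target_start, length in blocks:
        if ref_start <= pos < ref_start + length:
            return target_start + (pos - ref_start)
    return None


def project_exon(start: int, end: int, blocks: List[Block]) -> Optional[Interval]:
    """Map a half-open exon [start, end) by projecting its first and last base."""
    new_start = project_position(start, blocks)
    new_last = project_position(end - 1, blocks)
    if new_start is None or new_last is None:
        return None  # an exon boundary falls outside every aligned block
    return new_start, new_last + 1


def overlaps_any(interval: Interval, annotated: List[Interval]) -> bool:
    """Return True if the interval overlaps any already-annotated interval."""
    start, end = interval
    return any(start < a_end and a_start < end for a_start, a_end in annotated)


# Made-up alignment: the target carries a 10-base insertion between the blocks.
blocks = [(100, 1000, 50), (150, 1060, 40)]

exon_a = project_exon(120, 140, blocks)  # lies entirely in the first block
exon_b = project_exon(140, 160, blocks)  # spans the insertion in the target
exon_c = project_exon(195, 200, blocks)  # beyond the aligned region
```

In this toy case exon_b is 10 bases longer in the target than in the reference because it spans the insertion, which is the kind of difference a real mapping would have to evaluate rather than accept blindly.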

Data availability

Ensembl Human Pangenome Reference Consortium: https://projects.ensembl.org/hprc/

The bigGenePred file in this assembly hub can be obtained from: https://hgdownload.soe.ucsc.edu/hubs/GCA/018/473/295/GCA_018473295.1/bbi/GCA_018473295.1_HG03540.pri.mat.f1_v2.ebiGene.bb
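Once the file has been downloaded, it can be converted to tab-separated text with the UCSC bigBedToBed utility. The sketch below shows one way to recover exon intervals from a single line of that output. It relies only on the twelve standard BED12 columns that begin each bigGenePred row and ignores the additional gene-level columns. The function name and the example row are made up for illustration and are not taken from this track.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    strand: str
    exons: List[Tuple[int, int]]


def parse_bed12_fields(line: str) -> Transcript:
    """Parse the leading BED12 columns of one tab-separated bigGenePred row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")

    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # blockSizes and chromStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(offsets) == block_count):
        raise ValueError("blockCount does not match blockSizes/chromStarts")

    # chromStarts are offsets relative to chromStart; intervals are half-open.
    exons = [
        (chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(offsets, sizes)
    ]
    return Transcript(
        chrom=fields[0],
        start=chrom_start,
        end=int(fields[2]),
        name=fields[3],
        strand=fields[5],
        exons=exons,
    )


# Made-up example row (not real data from this track).
row = "\t".join([
    "contig_example", "1000", "2000", "TRANSCRIPT_EXAMPLE", "0", "+",
    "1100", "1900", "0", "3", "100,200,300,", "0,400,700,",
])
record = parse_bed12_fields(row)
exonic_bases = sum(end - start for start, end in record.exons)
```

Summing exon lengths per transcript, as in the last line, counts overlapping exons of different transcripts more than once, so it is not directly comparable to the "bases in exons" figure in the Description.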

References

A Draft Human Pangenome Reference
Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian K. Lucas, Jean Monlong, Haley J. Abel, Silvia Buonaiuto, Xian H. Chang, Haoyu Cheng, Justin Chu, Vincenza Colonna, Jordan M. Eizenga, Xiaowen Feng, Christian Fischer, Robert S. Fulton, Shilpa Garg, Cristian Groza, Andrea Guarracino, William T Harvey, Simon Heumos, Kerstin Howe, Miten Jain, Tsung-Yu Lu, Charles Markello, Fergal J. Martin, Matthew W. Mitchell, Katherine M. Munson, Moses Njagi Mwaniki, Adam M. Novak, Hugh E. Olsen, Trevor Pesout, David Porubsky, Pjotr Prins, Jonas A. Sibbesen, Chad Tomlinson, Flavia Villani, Mitchell R. Vollger, Human Pangenome Reference Consortium, Guillaume Bourque, Mark JP Chaisson, Paul Flicek, Adam M. Phillippy, Justin M. Zook, Evan E. Eichler, David Haussler, Erich D. Jarvis, Karen H. Miga, Ting Wang, Erik Garrison, Tobias Marschall, Ira Hall, Heng Li, Benedict Paten
bioRxiv: 2022.07.09.499321; doi: https://doi.org/10.1101/2022.07.09.499321

Contact

For inquiries, please contact:

Contact Ensembl

Credits

Ensembl