The Genome in a Bottle (GIAB) Problematic Regions tracks provide stratifications of the genome for evaluating variant calls in complex regions. They are designed for use with Global Alliance for Genomics and Health (GA4GH) benchmarking tools such as hap.py and include regions with low complexity, segmental duplications, functional regions, and difficult-to-sequence areas. Developed in collaboration by GA4GH, the GIAB consortium, and the Telomere-to-Telomere (T2T) Consortium, the dataset aims to standardize the analysis of genetic variation by offering predefined BED files for stratifying true and false positives, supporting accurate assessment of variant calls in complex areas of the genome.
The GIAB Problematic Regions tracks are generated by a pipeline and configuration that produce stratification BED files. Each file categorizes genomic regions by a specific challenge, such as low complexity or difficult mapping, so that variant calls can be benchmarked separately within each category. For more information on the pipeline and configuration, see https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/v3.5/README.md.
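To illustrate what stratifying by a BED file means in practice, the following is a minimal Python sketch, not the GIAB pipeline and not how hap.py is implemented. It assumes the BED intervals are non-overlapping (merged), that variant positions are 1-based as in VCF, and that each call already carries a TP or FP label from a benchmarking run. The function names, the example intervals, and the example calls are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]}.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return dict(regions)


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position falls inside any interval.

    Assumption: intervals on each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    pos0 = pos - 1  # convert the 1-based VCF position to 0-based
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def stratify(calls, regions):
    """Count call labels (e.g. TP/FP) inside and outside the regions."""
    counts = {"inside": Counter(), "outside": Counter()}
    for chrom, pos, label in calls:
        key = "inside" if in_regions(regions, chrom, pos) else "outside"
        counts[key][label] += 1
    return counts


# Hypothetical stratification intervals and labeled calls.
example_bed = ["chr1\t100\t200", "chr1\t500\t600"]
example_calls = [
    ("chr1", 150, "TP"),
    ("chr1", 200, "FP"),  # 0-based 199, last base of [100, 200)
    ("chr1", 201, "FP"),  # 0-based 200, just past the interval
    ("chr2", 10, "TP"),   # chromosome has no intervals
]
example_counts = stratify(example_calls, parse_bed(example_bed))
```

Comparing the inside and outside counts for each stratification file shows whether a caller's errors concentrate in a particular class of difficult region.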
If you have questions or comments, please write to Justin Zook (jzook@nist.gov).
Dwarshuis N, Kalra D, McDaniel J, Sanio P, Alvarez Jerez P, Jadhav B, Huang WE, Mondal R, Busby B, Olson ND et al. The GIAB genomic stratifications resource for human reference genomes. Nat Commun. 2024 Oct 19;15(1):9029. PMID: 39424793; PMC: PMC11489684
Krusche P, Trigg L, Boutros PC, Mason CE, De La Vega FM, Moore BL, Gonzalez-Porta M, Eberle MA, Tezak Z, Lababidi S et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol. 2019 May;37(5):555-560. PMID: 30858580; PMC: PMC6699627