Description

This track shows the percentage of bases that form simple-sequence (2-mer) repeat patterns. Sequences composed of GA, TC, GC, or AT bases are counted if one of the bases repeats (e.g., AAAATTTTAAATT is counted as 13 AT bases). Patterns are computed in non-overlapping 128 bp windows. The displayed value is the percentage of bases in each window that form the specific repeat type, with a maximum of 100. These sequence patterns are useful for predicting HiFi or ONT coverage biases, as described in Nurk et al., 2022 (Fig. 3) and Mc Cartney et al., 2022.
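
The windowed calculation described above can be illustrated with a minimal Python sketch. This is not the Seqrequester implementation. The function name pattern_percent and its parameters are hypothetical, and the sketch assumes one reading of "one of the bases repeats": a run made only of the two bases is counted when some base occurs at least twice in it. Runs that cross a window boundary are simply split at the boundary here, which may differ from the actual tool.

```python
import re

# The four 2-mer patterns named in the track description.
PAIRS = ("GA", "TC", "GC", "AT")

def pattern_percent(seq, pair, window=128):
    """Return, for each non-overlapping window, the percent of bases
    that fall in runs composed only of the two bases in `pair`.

    Hypothetical helper for illustration; not part of Seqrequester.
    """
    seq = seq.upper()
    run_re = re.compile(f"[{pair}]+")  # maximal runs of the two bases
    percents = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        counted = 0
        for match in run_re.finditer(chunk):
            run = match.group()
            # Assumption: "one of the bases repeats" means some base
            # occurs at least twice within the run.
            if len(run) > len(set(run)):
                counted += len(run)
        percents.append(100.0 * counted / len(chunk))
    return percents

# The example from the description: 13 AT bases in one 128 bp window.
example = "AAAATTTTAAATT" + "C" * 115
print(pattern_percent(example, "AT"))  # [10.15625]
```

In this sketch the example sequence from the description contributes 13 of 128 bases to the AT pattern, and a window made entirely of such runs would reach the maximum value of 100.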

Display Conventions and Configuration

Methods

The track was generated using Seqrequester, a utility module of Meryl and Canu, with the seqrequester microsatellite command. The code to generate these patterns can be found at T2T-Polish/pattern.

Release history

Credits

For inquiries, please contact us at Seqrequester or T2T-Polish.

References

  1. Nurk S, Koren S, Rhie A, Rautiainen M et al. The complete sequence of a human genome. Science (2022) doi: 10.1126/science.abj6987
  2. Mc Cartney AM, Shafin K, Alonge M et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat. Methods (2022) doi: 10.1038/s41592-022-01440-3