RepeatModeler Version 2.0.7
===========================
Using output directory = /dev/shm/rModeler.b55xnB/RM_3197284.MonMar92116382026
Search Engine = rmblast 2.14.1+
Threads = 32
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.7, RepeatMasker 4.2.1, RepeatAfterMe 0.0.7
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1773116197
Database = /dev/shm/rModeler.b55xnB/GCA_963969275.1_fRhaChi2.1_alternate_haplotype
  - Sequences = 2406
  - Bases = 738725802
  - N50 = 1057267
  - Contig Histogram:
  Size(bp)                                                          Count
  -----------------------------------------------------------------------
  6034480-6465237 |                                                   [ 1 ]
  5603724-6034480 |                                                   [ ]
  5172968-5603724 |                                                   [ ]
  4742211-5172967 |                                                   [ ]
  4311455-4742211 |                                                   [ 1 ]
  3880699-4311455 |                                                   [ 3 ]
  3449943-3880699 |                                                   [ 6 ]
  3019186-3449942 |                                                   [ 5 ]
  2588430-3019186 |                                                   [ 8 ]
  2157674-2588430 |                                                   [ 13 ]
  1726918-2157674 |                                                   [ 30 ]
  1296161-1726917 |**                                                 [ 77 ]
  865405-1296161  |***                                                [ 138 ]
  434649-865405   |******                                             [ 231 ]
  3893-434649     |************************************************** [ 1893 ]
Storage Throughput = excellent ( 1159.13 MB/s )

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 40038058 bp ( 40038058 non ambiguous )
   - Num Contigs Represented = 617
   - Sequence extraction : 00:00:05 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: Running build_lmer_table ( l = 14, min = 10 )..
   - RepeatScout: Running RepeatScout.. : 592 raw families identified
   - RepeatScout: Running filtering stage.. 549 families remaining
   - RepeatScout: 00:06:53 (hh:mm:ss) Elapsed Time
   - Collecting repeat instances...
   - Refining 517 families... 01:01:44 (hh:mm:ss) Elapsed Time
   - Redundant Families and Large Satellite Filtering.. : 13 satellite(s), 172 contained, found in 00:00:08 (hh:mm:ss) Elapsed Time
Family Refinement: 00:00:09 (hh:mm:ss) Elapsed Time
Round Time: 01:09:00 (hh:mm:ss) Elapsed Time : 331 families discovered.

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 10000000 bp
   - Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:45 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   7774 repeats masked totaling 2139495 bp(s).
   - TE Masking time 00:00:23 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
    Sample Size 10032383 bp
    Num Contigs Represented = 224
    Non ambiguous bp:
      Initial: 10032383 bp
      After Masking: 7462770 bp
      Masked: 25.61 %
 -- Input Database Coverage: 10032383 bp out of 738725802 bp ( 1.36 % )
Sampling Time: 00:01:11 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 34191
Comparison Time: 00:05:24 (hh:mm:ss) Elapsed Time, 20300 HSPs Collected
Number of families returned by RECON: 1198
Round Time: 00:07:21 (hh:mm:ss) Elapsed Time : 40 families discovered.

RepeatModeler Round # 3
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 30000000 bp
   - Sequence extraction : 00:00:04 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:02:33 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   29037 repeats masked totaling 7823543 bp(s).
   - TE Masking time 00:01:14 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
    Sample Size 30005610 bp
    Num Contigs Represented = 516
    Non ambiguous bp:
      Initial: 30005610 bp
      After Masking: 20626080 bp
      Masked: 31.26 %
 -- Input Database Coverage: 40037993 bp out of 738725802 bp ( 5.42 % )
Sampling Time: 00:03:54 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 310078
Comparison Time: 00:29:12 (hh:mm:ss) Elapsed Time, 56110 HSPs Collected
Number of families returned by RECON: 3680
Round Time: 00:34:26 (hh:mm:ss) Elapsed Time : 83 families discovered.

RepeatModeler Round # 4
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 90000000 bp
   - Sequence extraction : 00:00:12 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:05:53 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   99392 repeats masked totaling 25445521 bp(s).
   - TE Masking time 00:04:08 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
    Sample Size 90013018 bp
    Num Contigs Represented = 961
    Non ambiguous bp:
      Initial: 90013018 bp
      After Masking: 60211741 bp
      Masked: 33.11 %
 -- Input Database Coverage: 130051011 bp out of 738725802 bp ( 17.60 % )
Sampling Time: 00:10:24 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 2881200
Comparison Time: 03:30:09 (hh:mm:ss) Elapsed Time, 199607 HSPs Collected
Number of families returned by RECON: 11790
Round Time: 03:49:06 (hh:mm:ss) Elapsed Time : 368 families discovered.

RepeatModeler Round # 5
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 270000000 bp
   - Sequence extraction : 00:00:35 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:35:07 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   370142 repeats masked totaling 90900597 bp(s).
   - TE Masking time 00:24:31 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
    Sample Size 270002539 bp
    Num Contigs Represented = 1557
    Non ambiguous bp:
      Initial: 270002539 bp
      After Masking: 166921988 bp
      Masked: 38.18 %
 -- Input Database Coverage: 400053550 bp out of 738725802 bp ( 54.15 % )
Sampling Time: 01:00:45 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 25779790
Comparison Time: 26:29:26 (hh:mm:ss) Elapsed Time, 576991 HSPs Collected
Number of families returned by RECON: 41832
Round Time: 28:23:48 (hh:mm:ss) Elapsed Time : 840 families discovered.

RepeatScout/RECON discovery complete: 1662 families found

#
# RepeatClassifier
#
# Version 2.0.7
# Threads: 32
# Current Working Directory: /dev/shm/rModeler.b55xnB/RM_3197284.MonMar92116382026
# Protein Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatPeps.lib
#   - 18011 proteins
# Consensi Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatMasker.lib
#   - 26292 consensus sequences
  - Looking for simple/tandem and low complexity sequences..
  - Looking for similarity to known repeat proteins..
  - Looking for similarity to known repeat consensi..
Classification Time: 00:14:51 (hh:mm:ss) Elapsed Time
Program Time: 34:18:32 (hh:mm:ss) Elapsed Time