When running RepeatModeler, I consistently get the same error at the clustering step of the LTR pipeline. The error message is essentially the same as in #241. However, shortening the sequence identifiers to fewer than 13 characters, as described in #241, does not help (I have also tried shortening the genome name and the database name); I still get exactly the same error.
I have tried three different genomes, all of which give the same error. The RECON/RepeatScout pipeline seems to be working fine, and I am getting a -families.fa file containing the consensus families, excluding LTRs.
This is the error report I am getting in the stderr file: LTRPipeline : Error - could not open /home/sjd028/RepeatModelerTesting/AterTest/RM_1178777.SatOct51620362024/LTR_2708924.WedOct91432322024/clusters.dat! at /opt/RepeatModeler/LTRPipeline line 333.
This is the error I am getting in the stdout file:
LTR Structural Analysis
Running LtrHarvest... : 00:35:17 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:43:56 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:04:37 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results.
: 00:00:00 (hh:mm:ss) Elapsed Time
LTRPipeline Time: 01:23:53 (hh:mm:ss) Elapsed Time
Reproduction steps
I ran RepeatModeler in a Singularity container on a computing cluster, giving the job 8 cores at 16 GB per core. This is the command I used: singularity run $dfam RepeatModeler -database AterDbTest1 -threads 20 -LTRStruct
Host system
This was run on a computing cluster running Linux. More info:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.9 (Green Obsidian)
Release: 8.9
Codename: GreenObsidian
Singularity version: apptainer version 1.3.1-1.el8
The singularity container was downloaded on July 2, 2024
First of all, you are allocating 8 cores for your job but telling RepeatModeler it has access to 20 threads. I am surprised the job wasn't killed sooner, while it was running rmblast, but it could be that MAFFT is overallocating cores and the job is getting killed at that point. MAFFT is also memory intensive, so I would double-check that the job is actually getting 8 x 16 GB; that should be adequate, but perhaps it is being given less than that? Finally, you can rerun the LTR analysis separately for testing purposes like so: "./LTRPipeline -debug -threads # genome.fa" (NOTE: for this command you give it the original genome in FASTA format). This will generate more on-screen logging of what it is doing at each stage and keep additional files in the LTR_######## output directory.
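One way to avoid the core mismatch described above is to take the thread count from the scheduler instead of hard-coding it. The following is only a sketch, assuming the job is submitted through Slurm with --cpus-per-task, so that SLURM_CPUS_PER_TASK is set. The helper names pick_threads and show_job_usage are hypothetical, the fallback of 1 thread is an assumption, and the singularity exec line assumes LTRPipeline lives at the /opt/RepeatModeler path shown in the stderr message.

```shell
#!/usr/bin/env bash
# Sketch only: keep RepeatModeler's thread count in line with the Slurm allocation.

# Print the number of threads to use. SLURM_CPUS_PER_TASK is set by Slurm when
# the job requests --cpus-per-task; the fallback of 1 is an assumption for
# runs outside Slurm.
pick_threads() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print allocated CPUs, requested memory and peak memory for a job
# (hypothetical helper; pass the job ID that Slurm assigned to the run).
show_job_usage() {
  sacct -j "$1" --format=JobID,AllocCPUS,ReqMem,MaxRSS,State
}

threads="$(pick_threads)"
echo "Using ${threads} threads"

# Usage inside the batch script (commented out so the sketch runs anywhere):
# singularity run "$dfam" RepeatModeler -database AterDbTest1 -threads "$threads" -LTRStruct
# singularity exec "$dfam" /opt/RepeatModeler/LTRPipeline -debug -threads "$threads" genome.fa
```

With 8 cores requested, this would pass -threads 8 rather than 20. Comparing ReqMem against MaxRSS in the sacct output is one way to check whether the job really had the memory it was meant to have.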
I tried three different genomes:
Drosophila melanogaster: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000778455.1/
Abscondita terminalis (firefly): https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013368085.1/
Lamprigera yunanna (firefly): https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013368075.1/
Log output
File structure output:
AterDbTest1-rmod.log:
AterDbTest1-rmod.log
slurm (computing cluster job manager) output file:
slurm.hpc-4.272297.stdout.txt