Clustering step of LTR pipeline fails #260

Open
sjd028 opened this issue Oct 23, 2024 · 2 comments

sjd028 commented Oct 23, 2024

Describe the issue

When running RepeatModeler, I consistently hit the same failure at the clustering step of the LTR pipeline. The error message is essentially the same as in #241. However, shortening the sequence identifiers to fewer than 13 characters, as described in #241, does not fix it (I have also tried shortening the genome name and the database name).

I have tried three different genomes, and all of them give the same error. The RECON/RepeatScout pipeline seems to work fine, and I get a -families.fa file containing the consensus families, excluding LTRs.

This is the error report I am getting in the stderr file:
LTRPipeline : Error - could not open /home/sjd028/RepeatModelerTesting/AterTest/RM_1178777.SatOct51620362024/LTR_2708924.WedOct91432322024/clusters.dat! at /opt/RepeatModeler/LTRPipeline line 333.

This is the error I am getting in the stdout file:
LTR Structural Analysis

Running LtrHarvest... : 00:35:17 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:43:56 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:04:37 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results.
: 00:00:00 (hh:mm:ss) Elapsed Time
LTRPipeline Time: 01:23:53 (hh:mm:ss) Elapsed Time

Reproduction steps
I ran RepeatModeler from a Singularity container on a computing cluster, giving the job 8 cores at 16 GB per core. This is the command I used:
singularity run $dfam RepeatModeler -database AterDbTest1 -threads 20 -LTRStruct

I tried three different genomes:
Drosophila melanogaster: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000778455.1/
Abscondita terminalis (firefly): https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013368085.1/
Lamprigera yunnana (firefly): https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013368075.1/

Log output
File structure output:
image
AterDbTest1-rmod.log:
AterDbTest1-rmod.log
slurm (computing cluster job manager) output file:
slurm.hpc-4.272297.stdout.txt

Host system
This was run on a computing cluster running Linux. More info:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.9 (Green Obsidian)
Release: 8.9
Codename: GreenObsidian

Singularity version: apptainer version 1.3.1-1.el8
The singularity container was downloaded on July 2, 2024

sjd028 added the bug label Oct 23, 2024
Author

sjd028 commented Nov 5, 2024

Additional info about host system:

The Dfam TETools container was installed using singularity. The version of RepeatModeler is 2.0.5. The version of the TETools package is 1.88.

@rmhubley
Member

First of all, you are allocating 8 cores for your job but telling RepeatModeler it has access to 20 (-threads 20). I am surprised your job wasn't killed sooner, while it was running rmblast, but it could be that MAFFT is overallocating cores and the job is getting killed at that point. MAFFT is also memory intensive, so I would double-check that you are actually giving your job 8x16GB. That should be adequate, but perhaps it is getting less than that? Finally, you can rerun the LTR analysis separately for testing purposes like so: "./LTRPipeline -debug -threads # genome.fa" (NOTE: for this command you give it the original genome in FASTA format). This will generate more screen logging of what it is doing at each stage and keep additional files in the LTR_######## output directory.
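
The thread-count mismatch described above can be guarded against in the job script. The following is a minimal sketch, not the maintainers' method: `cap_threads` is a hypothetical helper, it assumes the job was submitted with Slurm's `--cpus-per-task` so that `SLURM_CPUS_PER_TASK` is set, and it assumes LTRPipeline can be reached through `singularity exec` at the `/opt/RepeatModeler` path that appears in the error message. It only prints the commands (a dry run), so nothing is launched until they are copied into the job.

```shell
# Hypothetical helper: cap a requested thread count at the cores the scheduler granted.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is used;
# outside a Slurm job we fall back to nproc (or 1 if nproc is unavailable).
cap_threads() {
  ct_requested="$1"
  ct_allocated="${SLURM_CPUS_PER_TASK:-$(nproc 2>/dev/null || echo 1)}"
  if [ "$ct_requested" -gt "$ct_allocated" ]; then
    echo "$ct_allocated"
  else
    echo "$ct_requested"
  fi
}

# Dry run: print the commands instead of executing them.
# $dfam (container image), AterDbTest1 and genome.fa are taken from the thread.
threads="$(cap_threads 20)"
echo "singularity run \$dfam RepeatModeler -database AterDbTest1 -threads ${threads} -LTRStruct"
echo "singularity exec \$dfam /opt/RepeatModeler/LTRPipeline -debug -threads ${threads} genome.fa"
```

With an 8-core allocation, both printed commands use `-threads 8` rather than 20. To check the memory side, `sacct -j <jobid> --format=JobID,AllocCPUS,ReqMem,MaxRSS,State` should show the requested memory and the peak usage of a finished job, assuming job accounting is enabled on the cluster.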
