eleredef - long runtime #72
Unfortunately not - RECON is not actively maintained, and it would take some knowledge of the underlying algorithm to know if skipping that step is possible or to add time estimates. But it is unusual for it to take this long.
500 million HSPs seems very high to me, even in round 5 - for comparison, a run we have done on hg38 found 2 million HSPs in round 5. Is your genome particularly repeat-rich or otherwise interesting? Did other rounds find similarly high numbers of HSPs, or only this one? It is also possible you were unlucky and got a "bad" (overly complex to analyze) portion of the genome during sampling and that a new run with a different seed would be just fine.
Thanks for the reply! Yep, it's repeat-rich. One run that did complete on another assembly of the same genome found 80% repeats, and it's a 1.2 GB genome. For that run I used RepeatModeler / RepeatMasker that I had manually installed in a Singularity container, rather than using …. I don't know how many HSPs there were in round 4, because it's a …
You can currently count the number of lines in the file ….
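One way to do that line count from the shell; the file name `msps.sample` below is a stand-in created just so the snippet runs anywhere - in a real run, point `wc` at the HSP/MSP file inside your `RM_*/round-5` directory (the exact filename may differ):

```shell
# Sketch: estimate the HSP count by counting lines in the MSP file.
# "msps.sample" is a stand-in file created for this demo; substitute
# the real file from your round-5 output directory.
printf 'hsp line 1\nhsp line 2\nhsp line 3\n' > msps.sample
wc -l < msps.sample
```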
Right, it's much lower in round 4.
Can you post the rest of the log file (whatever you do have)? And another thought - since you are running the TETools container, can you provide the command line you used to start the container as well?
It's launched via snakemake, which activates the singularity image. Ends up being a pretty complicated command:
The singularity container is just a local image pulled from ….
Here's the log (minus a few thousand lines showing the progress)
...
I was worried about TRF, but it looks like you took care of it. @rmhubley might have a better idea, but all I can think of now is to try it again and see if you have better luck with different samples being chosen from the genome.
Hi @jebrosen, just an update - I've tried this 14 times now with variations of this genome and I always get > 500 M HSPs at round 5. In the latest batch of attempts I filtered out contigs < 50 kb, in case the large number of short contigs was causing the issue, but I'm still seeing the same thing.
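For anyone wanting to apply the same contig-length filter, here is a plain-awk sketch; the file names are examples only, and dedicated tools can do the same job (e.g. `seqkit seq -m 50000`):

```shell
# Sketch: drop contigs shorter than 50 kb from a FASTA file before
# running RepeatModeler. A small sample FASTA is generated here so the
# snippet is self-contained; swap in your real genome file.
printf '>ctg1\n%s\n>ctg2\nACGT\n' "$(head -c 60000 /dev/zero | tr '\0' A)" > genome.sample.fa
awk -v min=50000 '
    /^>/ { if (hdr != "" && length(seq) >= min) printf "%s\n%s\n", hdr, seq
           hdr = $0; seq = ""; next }
         { seq = seq $0 }
    END  { if (hdr != "" && length(seq) >= min) printf "%s\n%s\n", hdr, seq }
' genome.sample.fa > genome.min50kb.fa
grep -c '^>' genome.min50kb.fa
```

Only `ctg1` (60 kb) survives the cutoff; the 4 bp `ctg2` is dropped.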
This step has been running for five days on my data and still has not completed! RepeatModeler should be updated to address this.
Hi,
I'm wondering if there's anything I can do with a run of RepeatModeler that is taking a long time with eleredef. Is there a way to skip this step? Or to tell how long it's going to take, or work out how far through it is?
I'm running RepeatModeler from the Dfam TE Tools Container (`dfam/tetools:1.1`) on a 1.2 GB genome like this:

…

But eleredef has been running on seqnames for about two weeks, using >100 GB of RAM, e.g.
(That's 17.8 thousand minutes, i.e. 1.8 weeks.)
Here's the output from RepeatModeler:
I don't really know what eleredef is doing, but seqnames is 12,315 lines and there are 6622 batch-*.fa files in the round-5 folder. There are currently 143,068 files in round-5/ele_redef_res.
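For others trying to gauge how far along eleredef is, counting the files in `round-5/ele_redef_res` over time gives at least a rough progress rate. A stand-in directory with five files is created below so the snippet runs anywhere; against a live run you would only use the final command:

```shell
# Sketch: track eleredef progress by counting result files.
# The directory and file names here are stand-ins for the demo; in a
# real run, point find at the actual round-5/ele_redef_res directory.
mkdir -p round-5/ele_redef_res
for i in 1 2 3 4 5; do touch "round-5/ele_redef_res/res-$i"; done
find round-5/ele_redef_res -type f | wc -l
```

Re-running the count periodically (e.g. hourly) shows whether the file count is still growing and how quickly.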
Thanks!