makeblastdb file output appears to be too big #122
Comments
That is not normal. Can you share the FASTA file? If it is private, you could email it to my Google account.
It is happening with any standard genome FASTA file; it doesn't appear to be file specific.
The issue is with version 2.10.0+. I also have an older version, 2.2.3+, that doesn't produce these big files. Both are the Windows 64-bit versions of BLAST+. Version 2.10.0+ is creating .ndb and .ntf files that are 297 GB in size.
Ah. I wonder if this is due to the new v5 BLAST database format? It would be surprising, but not impossible, that it is optimised for larger databases. The Galaxy wrappers and the provided BLAST database datatype don't actually know about the new file extensions, but that is a separate problem: https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatypes/blast.py#L244 I have not made time to explore this yet, and have limited time this week due to childcare.
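One way to test the v5 hypothesis would be to build the same database in both formats and compare the output. The sketch below only assembles and prints the two command lines; it does not run them. It assumes your makeblastdb accepts a `-blastdb_version` option taking 4 or 5, which should be confirmed with `makeblastdb -help` before relying on it. The helper name `makeblastdb_cmd` and the `seqdb_v4` / `seqdb_v5` output prefixes are hypothetical.

```python
def makeblastdb_cmd(fasta, out_prefix, db_version):
    """Assemble a makeblastdb command line for a given database format version.

    Mirrors the command from the report, plus -blastdb_version
    (assumption: check that your makeblastdb build supports this option).
    """
    return [
        "makeblastdb",
        "-in", fasta,
        "-out", out_prefix,
        "-parse_seqids",
        "-dbtype", "nucl",
        "-blastdb_version", str(db_version),
    ]


if __name__ == "__main__":
    # Print one command per format so they can be run by hand and compared.
    for version in (4, 5):
        cmd = makeblastdb_cmd("seq-contigs.fasta", f"seqdb_v{version}", version)
        print(" ".join(cmd))
```

If the v4 build stays small while the v5 build balloons, that would point at the new format (or how it is written on Windows) rather than at the input file.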
Duly noted. From a resources perspective, we will stick to the prior version for the time being.
Command used was:
makeblastdb -in seq-contigs.fasta -out seqdb -parse_seqids -dbtype nucl
From a 4 MB FASTA file, this is creating BLAST databases of 500 GB+ in size. Is this normal? What could be wrong with what I am doing?
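To narrow down which of the generated files account for the size, a small script can list every file sharing the database prefix along with its size. This is a minimal sketch using only the Python standard library; the function name `db_file_sizes` is hypothetical, and the `seqdb` prefix is taken from the command above. Note that it reports the apparent file size, which may differ from the space actually allocated on disk.

```python
from pathlib import Path


def db_file_sizes(prefix):
    """Return {filename: size_in_bytes} for files named '<prefix>.<anything>'."""
    prefix_path = Path(prefix)
    sizes = {}
    # Look next to the prefix for files such as seqdb.nsq, seqdb.ndb, seqdb.ntf
    for path in prefix_path.parent.glob(prefix_path.name + ".*"):
        if path.is_file():
            sizes[path.name] = path.stat().st_size
    return sizes


if __name__ == "__main__":
    sizes = db_file_sizes("seqdb")
    # Largest files first, sizes shown in MB
    for name, size in sorted(sizes.items(), key=lambda item: -item[1]):
        print(f"{name}\t{size / 1e6:.1f} MB")
```

Including this per-file listing in a report makes it easier to see whether a single extension is responsible for nearly all of the space.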