
makeblastdb file output appears to be too big #122

Open
balags1 opened this issue Mar 17, 2020 · 5 comments

Comments

balags1 commented Mar 17, 2020

The command used was:
makeblastdb -in seq-contigs.fasta -out seqdb -parse_seqids -dbtype nucl

From a 4 MB FASTA file, this creates BLAST databases of 500+ GB. Is this normal? What am I doing wrong?
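For scale, taking decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), the reported output is roughly a hundred thousand times larger than the input:

```latex
\frac{500\ \text{GB}}{4\ \text{MB}}
  = \frac{500 \times 10^{9}\ \text{bytes}}{4 \times 10^{6}\ \text{bytes}}
  = 1.25 \times 10^{5}
```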

peterjc (Owner) commented Mar 17, 2020

That is not normal. Can you share the FASTA file? If it is private, you could email it to my Google account.

balags1 (Author) commented Mar 17, 2020

It happens with any standard genome FASTA file; it doesn't appear to be file-specific.

balags1 (Author) commented Mar 17, 2020

NZ_CP015724.1.fasta.txt

The issue is with version 2.10.0+. I also have an older version, 2.2.3+, that doesn't produce these large files. Both are the Windows 64-bit builds of BLAST+. Version 2.10.0+ creates .ndb and .ntf files that are 297 GB in size.
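To see which files account for the space, one option is to total the on-disk size of every file that shares the database prefix, grouped by extension, and compare the 2.10.0+ output with the 2.2.3+ output. Below is a minimal Python sketch, assuming the database was written with `-out seqdb` into the current directory. The helpers `sizes_by_extension` and `human` are hypothetical illustrations, not part of BLAST+ or any library.

```python
from collections import defaultdict
from pathlib import Path


def sizes_by_extension(directory, prefix):
    """Sum file sizes (bytes) for files named <prefix>.<...>, grouped by final extension.

    Hypothetical helper: it only inspects file names, so it works the same way
    for older and newer BLAST database layouts (e.g. seqdb.nsq, seqdb.ndb).
    """
    totals = defaultdict(int)
    for path in Path(directory).iterdir():
        if path.is_file() and path.name.startswith(prefix + "."):
            # path.suffix is the last extension, e.g. ".ndb" for "seqdb.ndb"
            totals[path.suffix] += path.stat().st_size
    return dict(totals)


def human(n_bytes):
    """Format a byte count using decimal units (B, kB, MB, GB, TB)."""
    for unit in ("B", "kB", "MB", "GB"):
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} TB"


if __name__ == "__main__":
    totals = sizes_by_extension(".", "seqdb")
    # Largest files first, so any oversized extension stands out
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:6} {human(size)}")
```

Running it once in the 2.10.0+ output directory and once in the 2.2.3+ output directory gives a side-by-side view of which extensions differ.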

peterjc (Owner) commented Mar 17, 2020

Ah. I wonder if this is due to the new v5 BLAST database format. It would be surprising, but not impossible, if it were optimised for larger databases.

The BLAST database datatype provided with the Galaxy wrappers doesn't actually know about the new file extensions, but that is a separate problem:

https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatypes/blast.py#L244
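Purely to illustrate the extension problem mentioned above, here is a hypothetical Python helper that guesses whether a nucleotide BLAST database is in the older (v4) or newer (v5) layout from the files present. It is not the Galaxy datatype API. It assumes that `.nhr`, `.nin` and `.nsq` are core files in both layouts, and that `.ndb` or `.ntf` (the files reported in this thread) indicate v5 output.

```python
from pathlib import Path

# Assumption: core nucleotide database files present in both v4 and v5 layouts.
CORE_EXTENSIONS = {".nhr", ".nin", ".nsq"}
# Assumption: extensions seen in this thread's v5 output, treated as v5 markers.
V5_MARKERS = {".ndb", ".ntf"}


def guess_blastdb_version(directory, prefix):
    """Return 5, 4, or None if <prefix> is not a recognisable nucleotide database.

    Hypothetical helper for illustration only.
    """
    present = {
        p.suffix
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix + ".")
    }
    # Without the core files, this is not a usable database at all
    if not CORE_EXTENSIONS <= present:
        return None
    return 5 if present & V5_MARKERS else 4
```

A datatype that only checks for the core files would still accept a v5 database, but it would not know to copy or account for the extra `.ndb`/`.ntf` files.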

I have not had time to explore this yet, and I have limited time this week due to childcare.

balags1 (Author) commented Mar 17, 2020

Duly noted. Given the disk space involved, we will stick with the older version for the time being.
