Add BLASTDBv5 datatype (for blast >=2.8.1) #9939
Do you have any thoughts on whether we need to update this and the datatypes on the Tool Shed, just Galaxy itself, or both? If we do it on the Tool Shed, we could extend the existing data package or add a second one. I don't think that will make much difference, but using the Tool Shed will be required for rolling this out to older versions of Galaxy. So I think we should do the Tool Shed first, then add them to Galaxy itself once they are stable and proven. Update: or both at once.
I'd prefer to maintain the datatypes only within the Galaxy code, just as you proposed in peterjc/galaxy_blast#124 (comment). That's also what the IUC standards say. I'm not sure what to do for older Galaxy versions... I guess the usegalaxy.* instances and other big instances are usually up to date with the latest releases, so it's probably not a problem in most cases? This kind of low-impact PR could probably be backported to 20.05, and maybe to earlier releases too if needed.
If the Galaxy team are willing to accept backports to older releases, that would help minimise the need to do this via the Tool Shed.
Do the newer BLAST versions still work with old databases? Otherwise the data table / data manager might need an update: https://github.com/peterjc/galaxy_blast/tree/master/data_managers/ncbi_blastdb
Yes, old databases should be usable by newer versions.
Coincidentally, I'm struggling with the blastdb datatypes (#9885). It seems impossible to create a test for basic composite datatypes. From a quick look at the NCBI tool wrappers, it seems this is untested. Are there any plans to add such tests? Does this work in practice? (I don't use BLAST for my own work.)
I've commented on #9885. As far as I know, uploading BLAST DB files never worked, with the knock-on effect of preventing their direct use in tests with planemo. One workaround is testing a workflow instead. Another workaround, which the BLAST+ wrapper tool suite uses, is using test databases via a
Resuming work on peterjc/galaxy_blast#123.
I'm catching up on some BLAST+ wrapper work (prompted by @abretaud et al.), and in peterjc/galaxy_blast#129 I will declare the old BLAST datatype definitions on the Tool Shed obsolete and stop using them. Having the NCBI BLAST Database v5 format directly in Galaxy is preferable (i.e. this pull request or one like it). The same goes for adding the NCBI BLAST XML v2 format (peterjc/galaxy_blast#65), but that can be done separately.
Hi,
Sample FASTA file (with the UniProt header: >db|UniqueIdentifier|EntryName)
Blastdb (version 4):
Search with UniqueIdentifier:
Search with EntryName:
Both Q8I6R7 and ACN2_ACAGO work.
Blastdb (version 5):
Search with UniqueIdentifier:
Search with EntryName:
Using the v5 indexes, we are now unable to use EntryName as the argument for -entry. You need to be aware of this; it was very convenient to be able to use both EntryName and UniqueIdentifier.
Thanks @FredericBGA, that does look like an NCBI BLAST+ bug; hopefully it's something that can be fixed and not a design limitation of the v5 DB format.
Yep, I don't think it's blocking this PR; let's hope they fix it in a future version.
If you use
People will still be able to use older tool versions if they want, and the v4 datatypes will still be there, so this should not break anything for most people.
It has crossed my mind that we could tweak the class definition and use the same datatype for both v4 and v5 databases, on the assumption that most tools will eventually transition. There would be pain during the transition, though...
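To make the single-datatype idea a little more concrete, here is a minimal sketch of how one class might tell the two formats apart by the files present in a database directory. It assumes, based on our reading of the NCBI BLAST+ documentation, that v5 databases add an LMDB index file (`.pdb` for protein, `.ndb` for nucleotide) while sharing the core `.phr/.pin/.psq` (or `.nhr/.nin/.nsq`) files with v4. The helper `guess_blastdb_version` is hypothetical and is not part of Galaxy's datatype API.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumption: only v5 databases carry the LMDB index file.
V5_MARKERS = {".pdb", ".ndb"}
# Core files assumed present in both v4 and v5 databases.
CORE_PROTEIN = {".phr", ".pin", ".psq"}
CORE_NUCLEOTIDE = {".nhr", ".nin", ".nsq"}


def guess_blastdb_version(filenames: Iterable[str]) -> Optional[int]:
    """Return 5 or 4 for a BLAST DB file listing, or None if unrecognised.

    Hypothetical helper for illustration only.
    """
    suffixes = {Path(name).suffix for name in filenames}
    # Require a complete set of core protein or nucleotide files.
    if not (CORE_PROTEIN <= suffixes or CORE_NUCLEOTIDE <= suffixes):
        return None
    # The LMDB file is taken as the v5 signal.
    return 5 if suffixes & V5_MARKERS else 4


if __name__ == "__main__":
    print(guess_blastdb_version(["blastdb.phr", "blastdb.pin", "blastdb.psq"]))
    print(guess_blastdb_version(["db.pdb", "db.phr", "db.pin", "db.psq", "db.pot"]))
```

Keying on the LMDB file alone keeps the check simple; a real shared datatype would also have to decide which composite files are required and which are optional for each version, which is where the transition pain would show up.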
I've got an answer from NCBI:
Hi,
Thanks for following up. As far as I can tell, this is expected behavior for dbV5 since the defline parsing is limited to the first string. Because of this, it is not aware of the pipe-separated locus_id field since that field is not indexed and cannot retrieve records given the locus_tag input. The following is my test on a custom-generated test db as well as our swissprot production database.
$ efetch -db protein -id p12345 -format fasta
$ efetch -db protein -id p12345 -format fasta | makeblastdb -dbtype prot -parse_seqids -out x2 -title "p12345 stdin" -in -
$ blastdbcmd -db x2 -entry AATM_RABIT
$ blastdbcmd -db swissprot -entry AATM_RABIT
$ blastdbcmd -db swissprot -entry p12345
$ blastdbcmd -db x2 -entry p12345
Regards,
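Given NCBI's answer, one possible workaround is to translate EntryName values into accessions before querying a v5 database. The sketch below assumes UniProt-style headers of the form `>db|UniqueIdentifier|EntryName description` in the FASTA used to build the database, and it only builds the `blastdbcmd -db ... -entry ...` argument list (the same flags as in the commands above) rather than running it. Both helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def entry_name_index(fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Map UniProt EntryName -> UniqueIdentifier (accession) from FASTA headers.

    Assumes headers like '>sp|P12345|AATM_RABIT some description'.
    """
    index: Dict[str, str] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # empty header
        parts = tokens[0].split("|")  # only the first token holds the IDs
        if len(parts) >= 3:
            accession, entry_name = parts[1], parts[2]
            index[entry_name] = accession
    return index


def blastdbcmd_args(db: str, query: str, index: Dict[str, str]) -> List[str]:
    """Build a blastdbcmd call, swapping an EntryName for its accession if known."""
    entry = index.get(query, query)  # accessions and unknown names pass through unchanged
    return ["blastdbcmd", "-db", db, "-entry", entry]


if __name__ == "__main__":
    idx = entry_name_index([">sp|P12345|AATM_RABIT example header", "ACDEFGHIK"])  # dummy sequence line
    print(blastdbcmd_args("x2", "AATM_RABIT", idx))
```

The resulting list could then be passed to `subprocess.run(args, check=True)`; keeping the lookup outside BLAST+ means it works the same for v4 and v5 databases.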
Awesome, thanks for the discussion, all. This should be good regardless of whether the BLAST issues get resolved, right?
Thanks, all. I'm still trying to catch up on the BLAST+ wrapper backlog, but having this in the next Galaxy release will help a lot later.
Need to wait for new v5 BLAST DB datatypes to be in a released version of Galaxy before using them. See galaxyproject/galaxy#9939
In preparation for wrapping NCBI BLAST+ 2.10, which adds support for setting the preferred DB version. Need to wait for new v5 BLAST DB datatypes to be in a released version of Galaxy before using them. See galaxyproject/galaxy#9939
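For illustration, a wrapper that exposes the preferred DB version might assemble its `makeblastdb` call along the lines below. This is a sketch, not the BLAST+ wrapper's actual code: it assumes `makeblastdb` accepts `-blastdb_version` with the values 4 or 5 (as we understand BLAST+ 2.10 does, with 5 as the default), reuses the `-in`, `-dbtype`, `-out`, `-title` and `-parse_seqids` options from the NCBI commands quoted earlier, and the function name `makeblastdb_args` is hypothetical.

```python
from typing import List


def makeblastdb_args(fasta: str, out: str, dbtype: str = "prot",
                     blastdb_version: int = 5, parse_seqids: bool = True,
                     title: str = "") -> List[str]:
    """Assemble a makeblastdb command line (hypothetical helper; nothing is executed)."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    if blastdb_version not in (4, 5):
        raise ValueError("blastdb_version must be 4 or 5")
    args = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out,
            "-blastdb_version", str(blastdb_version)]
    if title:
        args += ["-title", title]
    if parse_seqids:
        args.append("-parse_seqids")  # needed for -entry lookups by ID
    return args


if __name__ == "__main__":
    print(makeblastdb_args("p12345.fasta", "x2", title="p12345"))
    print(makeblastdb_args("db.fa", "db", dbtype="nucl", blastdb_version=4))
```

Keeping the version explicit would let a tool still build v4 databases for older tools and datatypes during the transition.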
Thanks for the merge, @jmchilton!
Add support for the new version (v5) of BLAST databases, introduced in BLAST+ 2.8.1 and now the default in 2.10.0 (https://www.ncbi.nlm.nih.gov/books/NBK131777/).
Should fix peterjc/galaxy_blast#124 (ping @peterjc)