Support for -remote #39

Open
peterjc opened this issue Mar 19, 2014 · 8 comments

Comments

@peterjc
Owner

peterjc commented Mar 19, 2014

Filing an overdue issue for this previously discussed enhancement. See early work by @jj-umn as part of Galaxy-P on this branch (checked in by @jmchilton): https://bitbucket.org/galaxyp/galaxyp-toolshed-blast/commits/branch/default

Using -remote makes several 'new' options available including -entrez_query which can be used to filter by taxonomy etc, but also removes other options.

Given the number of options which change, and the concerns about potential abuse of the NCBI servers (which could lead to entire Galaxy instances being blacklisted), my preference is for a sister set of tools, i.e. we'd have the current (local) BLASTP as one tool, and a new sister tool for remote BLASTP (run at the NCBI).
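
To illustrate the option switch described above, here is a minimal Python sketch that assembles a remote `blastp` command line and rejects a few options that the BLAST+ help text lists as incompatible with `-remote`. The function name, the `LOCAL_ONLY` set (a partial list, not exhaustive) and the example Entrez query are assumptions for illustration only; this is not the wrapper's actual implementation.

```python
# Options that BLAST+ reports as incompatible with -remote.
# Assumption: this is only a partial list chosen for illustration.
LOCAL_ONLY = {"-num_threads", "-gilist", "-seqidlist", "-negative_gilist"}


def build_remote_blastp(query, db="nr", entrez_query=None, extra=()):
    """Return an argv list for a blastp search run at the NCBI (hypothetical helper)."""
    clash = LOCAL_ONLY.intersection(extra)
    if clash:
        raise ValueError("not usable with -remote: " + ", ".join(sorted(clash)))
    cmd = ["blastp", "-remote", "-db", db, "-query", query, "-outfmt", "6"]
    if entrez_query:
        # -entrez_query is only accepted together with -remote.
        cmd += ["-entrez_query", entrez_query]
    cmd += list(extra)
    return cmd


if __name__ == "__main__":
    # Example: restrict hits to human sequences (illustrative Entrez query).
    print(build_remote_blastp("query.fasta", entrez_query="txid9606[Organism]"))
```

The returned list could be passed to `subprocess.run(cmd, check=True)`; keeping the command construction separate makes it easy to check which options a remote tool exposes without contacting the NCBI servers.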

@jj-umn
Contributor

jj-umn commented Mar 19, 2014

I agree that the remote options should be in separate tools.
We should be able to maintain consistency by using macros for common sections.

Thanks,

JJ

James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota

@peterjc
Owner Author

peterjc commented Mar 19, 2014

Do you agree the remote wrappers (which will be a subset since all the database and masking tools won't apply) should be a separate suite on the ToolShed?

If we are careful they can use the same ncbi_macros.xml file (via a symlink if the remote tools get a separate folder under git).
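
As a rough sketch of the symlink idea, the following Python snippet creates a relative symlink so that a second tool folder reuses the same `ncbi_macros.xml`. The helper name and the folder layout are hypothetical; only the file name `ncbi_macros.xml` comes from the discussion above.

```python
import os


def link_shared_macros(local_dir, remote_dir, name="ncbi_macros.xml"):
    """Make remote_dir/name a relative symlink to local_dir/name (hypothetical helper)."""
    target = os.path.join(local_dir, name)
    link = os.path.join(remote_dir, name)
    # A relative target keeps the link valid when the checkout is moved or cloned.
    rel_target = os.path.relpath(target, start=remote_dir)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link rather than failing
    os.symlink(rel_target, link)
    return link
```

Git stores the symlink itself, so both folders stay in step as long as the platform checking out the repository supports symlinks.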

@jj-umn
Contributor

jj-umn commented Mar 19, 2014

That sounds workable.

James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota

@bgruening
Contributor

I'm also for a separate repository. @jj-umn how do you control the maximum number of requests to the NCBI server? I'm a little worried about whether the effort is worth the results. Are there really so many users who are not able to set up their own BLAST database?

@jj-umn
Contributor

jj-umn commented Mar 19, 2014

Pratik is the researcher for whom the remote blastp was developed.
If I remember correctly, the primary reason for the remote option was to get blast results for particular organisms in order to search for novel proteins.

Pratik, is this functionality still needed?

Thanks,

JJ

James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota

@jj-umn
Contributor

jj-umn commented Mar 19, 2014

Hello JJ,

Yes - we need this tool. I do use it within Galaxy-P for BLAST searches for proteogenomics work and metaproteomics work.

Thanks,

Pratik

Pratik Jagtap,
Managing Director,
Center for Mass Spectrometry and Proteomics,
43 Gortner Laboratory
1479 Gortner Avenue
St. Paul, MN 55108
Phone: 612-624-9275

@peterjc
Owner Author

peterjc commented Mar 19, 2014

Thanks @jj-umn - good to know there is a clear motivation for this, and the -entrez_query feature in particular.

I do appreciate that species filtering is an important use case, and it seems like there ought to be a neat way to do this with a local database (other than using the tabular output and filtering on the taxonomy as a post-processing step). Note we've got issue #36 for filtering by taxonomy, which may be possible for blastn (only) via -window_masker_taxid. But that isn't very general.
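
For reference, the post-processing approach mentioned above could look roughly like this in Python. It assumes the search was run with a tabular format that appends the subject taxonomy IDs as a 13th column (for example `-outfmt "6 std staxids"`, which needs the NCBI taxonomy database files installed alongside the BLAST database); the function name and column position are assumptions, not part of the existing wrappers.

```python
def filter_by_taxid(lines, wanted, column=12):
    """Yield tabular BLAST lines whose taxid column matches any ID in `wanted`.

    `column` is zero-based; 12 assumes the 12 standard columns plus staxids.
    """
    wanted = {str(taxid) for taxid in wanted}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # no taxonomy column on this row
        # A subject can carry several taxids separated by semicolons.
        if wanted.intersection(fields[column].split(";")):
            yield line
```

Note that this matches the listed IDs exactly and does not expand a taxon to its descendants, so filtering on a genus or higher rank would need a separate lookup of child taxa.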

We typically solve/avoid this with custom organism-specific BLAST databases, often for draft genomes which have not yet been published. As another example, I used Entrez to build a complete virus database: http://blastedbio.blogspot.co.uk/2013/11/entrez-trouble-with-chimeras.html

@peterjc
Owner Author

peterjc commented Oct 2, 2014
