snakemake #8

Closed
aryarm opened this issue Oct 17, 2018 · 6 comments

@aryarm
Member

aryarm commented Oct 17, 2018

Once we have a working pipeline, it might be nice to convert it to a snakemake pipeline so that it can be run in parallel on a cluster.

@aryarm aryarm assigned azwildcat and aryarm and unassigned azwildcat Oct 19, 2018
@aryarm aryarm added the future label Oct 19, 2018
@aryarm
Member Author

aryarm commented Apr 26, 2019

Before we do this, we should make sure to split the pipeline up into smaller steps, since snakemake likes to manage every small step in the pipeline as its own job. This will probably require breaking up each of the python scripts.

Some of the python code simply calls terminal commands. So we might consider converting some of the python code to bash scripts, which will be easier to terminate and control from snakemake.
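
A minimal sketch of the kind of thin wrapper described above, assuming a script whose only job is to launch one external command. `run_step` is a hypothetical name and is not taken from this repository. When a step amounts to no more than this, the command could live directly in a rule's `shell:` section instead.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one external command and return its stdout.

    check=True raises CalledProcessError if the command exits non-zero,
    so a failure is not silently ignored.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; a real step would
    # call its own command-line tool here.
    print(run_step([sys.executable, "-c", "print('step finished')"]), end="")
```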

@aryarm aryarm removed their assignment May 3, 2019
@JarredAllen JarredAllen self-assigned this Jun 6, 2019
@JarredAllen
Contributor

JarredAllen commented Jun 6, 2019

I think I created a snakemake pipeline for this with commit 085775e. However, Snakemake requires using python3, so this is on hold until I can get the code onto a newer linux server.

@aryarm
Member Author

aryarm commented Jun 6, 2019

Nice! It looks good. You'll probably need a config file eventually, so you can fill in the wildcards in the Snakefile (otherwise, snakemake won't know which values to use for the wildcards when it calls your rules). I usually use the config file to specify the paths to the pipeline's inputs. Additionally, config variables can be overridden from the command line, so it's super easy to switch out inputs (or use a subset of them, if you write some code in your Snakefile to do it).
Here's an example Snakemake pipeline I've worked with, if that helps.
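
To make the config idea concrete, here is a rough sketch in plain Python of the logic a Snakefile might contain. Everything in it is an assumption for illustration: the `config` keys, the sample names, the paths, and the helpers `select_samples` and `final_targets` are hypothetical. In a real Snakefile the dict would be loaded with the `configfile:` directive, and individual values could be overridden with snakemake's `--config` option.

```python
# Hypothetical config; in a Snakefile this dict would come from a YAML
# file loaded with the `configfile:` directive.
config = {
    "out": "out",
    "samples": {
        "sampleA": "inputs/sampleA.dat",
        "sampleB": "inputs/sampleB.dat",
        "sampleC": "inputs/sampleC.dat",
    },
}


def select_samples(config, subset=None):
    """Return the sample names to run, optionally restricted to a subset.

    `subset` may be a list of names or a comma-separated string, such as
    a value supplied on the command line.
    """
    names = sorted(config["samples"])
    if subset is None:
        return names
    if isinstance(subset, str):
        subset = [s.strip() for s in subset.split(",") if s.strip()]
    unknown = sorted(set(subset) - set(names))
    if unknown:
        raise KeyError(f"samples not in config: {unknown}")
    return [name for name in names if name in subset]


def final_targets(config, subset=None):
    """Build the output paths that a top-level rule would request."""
    return [
        f"{config['out']}/{name}/result.txt"
        for name in select_samples(config, subset)
    ]


if __name__ == "__main__":
    print(final_targets(config))
    print(final_targets(config, "sampleC, sampleA"))
```

Because the list of targets is derived from the config, swapping inputs or running a subset only means changing config values, not the rules.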

One nice thing about snakemake is that you can run separate parts of the pipeline in different conda environments (i.e., you can call the entire pipeline from a python3 environment but have it execute its rules in a python2 environment). I started trying to create an environment file in f73028b. Also see issue #6.

@JarredAllen
Contributor

The steps which are completed have been written into a Snakefile and the pipeline is working.

Future steps to do:

  • extract the command-line options to a config file, so that they can be changed more easily
  • automatically detect the number of ROIs in the Snakefile, instead of requiring the user to check
  • automate more steps (recombine tracks at the end, detect ROIs, etc.)
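
One possible way to detect the ROIs automatically, sketched under a loud assumption: that an earlier step writes one file per ROI, named `roi_<n>.txt`, into a single directory. That naming scheme and the helper `detect_rois` are hypothetical and not taken from this repository. Inside a Snakefile, snakemake's `glob_wildcards` function could play a similar role.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: one file per ROI, e.g. roi_0.txt, roi_1.txt
ROI_PATTERN = re.compile(r"roi_(\d+)\.txt")


def detect_rois(roi_dir):
    """Return the sorted ROI indices found in roi_dir."""
    ids = []
    for path in Path(roi_dir).iterdir():
        match = ROI_PATTERN.fullmatch(path.name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with three ROI files and one
    # unrelated file that should be ignored.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["roi_2.txt", "roi_0.txt", "roi_1.txt", "notes.md"]:
            (Path(tmp) / name).touch()
        rois = detect_rois(tmp)
        print(len(rois), rois)
```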

@JarredAllen
Contributor

The last two steps have been completed, so the only thing left to do with the pipeline is to extract the command-line options and other things which may be tweaked into a separate configuration file. In the current setup, one is a global constant defined in the pipeline and the others are forced to be the default values.

@JarredAllen
Contributor

I moved that last step into a new issue because it's a separate thing from making a pipeline.

The issue is here:
#20

@JarredAllen JarredAllen removed their assignment Jul 9, 2019