snakemake #8

aryarm · 2018-10-17T00:01:52Z

Once we have a working pipeline, it might be nice to convert it to a snakemake pipeline so that it can be run in parallel on a cluster machine.

aryarm · 2019-04-26T22:33:31Z

Before we do this, we should make sure to split the pipeline up into smaller steps, since snakemake works best when it controls every small step in the pipeline, so that it can schedule and manage the jobs. This will probably require breaking up each of the python scripts.

Some of the python code simply calls terminal commands. So we might consider converting some of the python code to bash scripts, which will be easier to terminate and control from snakemake.

JarredAllen · 2019-06-06T20:42:40Z

I think I created a snakemake pipeline for this with commit 085775e. However, Snakemake requires using python3, so this is on hold until I can get the code onto a newer linux server.

aryarm · 2019-06-06T23:41:28Z

Nice! It looks good. You'll probably need a config file eventually, so that you can fill in the wildcards in the Snakefile (otherwise, snakemake won't know which wildcard values to use when it calls your rules). I usually use the config file to specify the paths to the pipeline's inputs. Config variables can also be overridden from the command line, so it's super easy to switch out inputs (or to use a subset of them, if you write some code in your Snakefile to do so).
Here's an example Snakemake pipeline I've worked with, if that helps.
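A hedged, plain-Python sketch of the config idea above: the pipeline's inputs live in a config mapping, a command-line style `KEY=VALUE` override swaps them out, and a pattern containing `{wildcards}` is filled with every combination of values to name the target files. `DEFAULT_CONFIG`, the `out/{sample}_roi{roi}.bed` pattern, and the helpers `apply_overrides` and `expand_pattern` are hypothetical; inside a real Snakefile you would use the `configfile:` directive, `snakemake --config`, and Snakemake's own `expand()` instead.

```python
from itertools import product

# Hypothetical pipeline inputs, standing in for a config.yaml loaded by the Snakefile.
DEFAULT_CONFIG = {"samples": ["A", "B"], "rois": [1, 2, 3]}


def apply_overrides(config, overrides):
    """Return a copy of config with KEY=VALUE overrides applied.

    Loosely imitates `snakemake --config KEY=VALUE`; splitting VALUE on commas
    into a list is a simplification chosen for this sketch, not Snakemake's parsing.
    """
    merged = dict(config)  # shallow copy so the defaults stay untouched
    for item in overrides:
        key, _, value = item.partition("=")
        merged[key] = value.split(",")
    return merged


def expand_pattern(pattern, **wildcards):
    """Fill every {wildcard} in pattern with each combination of values,
    similar in spirit to Snakemake's expand() with its default product combinator."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


if __name__ == "__main__":
    # Use only sample "C" for a quick test run, keeping the default ROIs.
    config = apply_overrides(DEFAULT_CONFIG, ["samples=C"])
    targets = expand_pattern("out/{sample}_roi{roi}.bed",
                             sample=config["samples"], roi=config["rois"])
    print(targets)  # ['out/C_roi1.bed', 'out/C_roi2.bed', 'out/C_roi3.bed']
```

Keeping the defaults in one mapping and layering overrides on top means a subset run only changes the command line, not the Snakefile.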

One nice thing about snakemake is that you can run separate parts of the pipeline in different conda environments (i.e., you can call the entire pipeline from a python3 environment but have it execute its rules in a python2 environment). I started trying to create an environment file in f73028b. Also see issue #6.

JarredAllen · 2019-06-11T22:18:42Z

The completed steps have been written into a Snakefile, and the pipeline is working.

Future steps to do:

- extract the command-line options to a config file, so that they can be changed more easily
- automatically detect the number of ROIs in the Snakefile, instead of requiring the user to check
- automate more steps (recombine tracks at the end, detect ROIs, etc.)
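A hedged sketch of the ROI-detection step listed above: instead of asking the user how many ROIs there are, the pipeline can scan a directory and collect the ROI numbers from the file names. The `roi_<N>.bed` naming scheme and the helpers `roi_indices` and `detect_rois` are hypothetical; Snakemake also provides a `glob_wildcards()` helper that recovers wildcard values from existing file names, which may be the more natural choice inside a Snakefile.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: one file per ROI, e.g. rois/roi_1.bed, rois/roi_2.bed, ...
ROI_PATTERN = re.compile(r"^roi_(\d+)\.bed$")


def roi_indices(filenames):
    """Return the ROI numbers found among filenames, sorted numerically; other files are ignored."""
    indices = []
    for name in filenames:
        match = ROI_PATTERN.match(name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def detect_rois(directory):
    """Scan directory so the number of ROIs never has to be counted by hand."""
    return roi_indices(path.name for path in Path(directory).iterdir())


if __name__ == "__main__":
    rois = roi_indices(["roi_2.bed", "roi_10.bed", "notes.txt", "roi_1.bed"])
    print(rois, len(rois))  # [1, 2, 10] 3
```

Sorting the parsed integers, rather than the raw file names, keeps `roi_10` after `roi_2`.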

JarredAllen · 2019-06-24T17:30:34Z

The last two steps have been completed, so the only thing left to do with the pipeline is to extract the command-line options, and other settings that may need tweaking, into a separate configuration file. In the current setup, one of them is a global constant defined in the pipeline and the others are forced to their default values.

JarredAllen · 2019-06-24T17:39:29Z

I moved that last step into a new issue because it's a separate thing from making a pipeline.

The issue is here:
#20

aryarm assigned azwildcat and aryarm and unassigned azwildcat Oct 19, 2018

aryarm added the future label Oct 19, 2018

aryarm removed their assignment May 3, 2019

JarredAllen self-assigned this Jun 6, 2019

JarredAllen closed this as completed Jun 24, 2019

JarredAllen removed their assignment Jul 9, 2019
