This is based on the EBI Gene Expression Group repository https://github.com/ebi-gene-expression-group/galaxy-workflow-executor

A sample invocation would be:

python run_galaxy_workflow.py -C galaxy_credentials.yml -i input_files.yaml -W GC-lite-workflow.ga -k -H testCLI -P wf-parameters.json -G usegalaxy_eu

For detailed instructions and technical issues regarding CLI execution, please refer to these repositories: https://github.com/ebi-gene-expression-group/galaxy-workflow-executor and https://github.com/ebi-gene-expression-group/scxa-workflows

Galaxy workflow executor

This setup uses bioblend to run a Galaxy workflow through the CLI:

  • Inputs:
    • Galaxy workflow as JSON file (from share workflow -> download).
    • Parameters dictionary as JSON
    • Input files defined in YAML
    • Steps with allowed errors in YAML (optional)
    • History name (optional)

Galaxy workflow

The workflow should be annotated with labels, ideally for all steps, but at least for the steps whose parameters you want to set through the parameters dictionary. The workflow file should be the JSON file obtained through Workflows (upper menu) -> Share workflow (in the drop-down menu of the workflow, in the workflow list) -> Download (on the following screen).
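As a quick check before running, the downloaded workflow file can be scanned for steps that lack a label. The sketch below assumes the downloaded JSON has a top-level `steps` mapping whose entries carry a `label` field; the function name `unlabeled_steps` and the trimmed example workflow are hypothetical and only serve as illustration.

```python
import json


def unlabeled_steps(workflow: dict) -> list[str]:
    """Return the keys of workflow steps that have no (or an empty) label.

    Assumes a top-level "steps" mapping with a "label" field per step.
    """
    steps = workflow.get("steps", {})
    return [key for key, step in steps.items() if not step.get("label")]


# Hypothetical, heavily trimmed workflow document for illustration only.
example_workflow = json.loads("""
{
  "name": "example",
  "steps": {
    "0": {"label": "matrix", "type": "data_input"},
    "1": {"label": null, "type": "tool"},
    "2": {"label": "filter_cells", "type": "tool"}
  }
}
""")

missing = unlabeled_steps(example_workflow)
print("Steps without a label:", missing)
```

For a real file, load it with `json.load(open("GC-lite-workflow.ga"))` instead of the inline example.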

Parameters JSON

The parameters file should have the following structure:

{
    "step_label_x": {
        "param_name": "value",
        ....
        "nested_param_name": {
            "n_param_name": "n_value",
            ....
            "x_param_name": "x_value"
        }

    },
    "step_label_x2": {
        ....
    },
    ....
    "other_galaxy_setup_params": { ... }
}
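One way to produce such a file is to build the dictionary in Python and serialize it. This is only a sketch: the step label `filter_cells`, the parameter names, and the helper `keys_not_in_labels` are hypothetical, and the helper simply reports top-level keys that do not match a known workflow label (non-step keys will show up there too, which may be intentional).

```python
import json

# Hypothetical step label and parameter names, for illustration only.
params = {
    "filter_cells": {
        "min_genes": "200",
        "advanced": {
            "method": "fast",
        },
    },
}


def keys_not_in_labels(params: dict, workflow_labels: set[str]) -> list[str]:
    """Return top-level keys of params that match no workflow step label."""
    return sorted(key for key in params if key not in workflow_labels)


# Serialize with indentation so the file stays readable and diff-friendly.
text = json.dumps(params, indent=4)
with open("wf-parameters.json", "w") as handle:
    handle.write(text)

# Labels would normally come from the workflow file; hard-coded here.
unmatched = keys_not_in_labels(params, {"filter_cells", "matrix"})
print("Keys that match no step label:", unmatched)
```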

Input files in YAML

The inputs file should point to the files in the file system, and give each one a name (which must match a workflow input label) and a file type (among those recognized by Galaxy).

The structure of the YAML file for inputs is:

matrix:
  path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx
  type: txt
genes:
  path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_rows
  type: tsv
barcodes:
  path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_cols
  type: tsv
gtf:
  dataset_id: fe139k21xsak

In this example, the Galaxy workflow should have inputs labelled matrix, genes, barcodes and gtf. If path is set for an input, that path must exist in the local file system. If the file is already on the Galaxy instance, its dataset_id can be given instead of a path, as shown for gtf here.
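The rules above can be checked before submitting anything to Galaxy. The following is a minimal sketch, not the executor's own validation: `check_inputs` is a hypothetical helper that works on the already-parsed mapping (for example the result of `yaml.safe_load` from PyYAML), and treating `type` as required for every `path` entry is an assumption drawn from the example.

```python
import os
import tempfile


def check_inputs(inputs: dict) -> list[str]:
    """Return a list of human-readable problems found in the inputs mapping."""
    problems = []
    for label, spec in inputs.items():
        if "dataset_id" in spec:
            # Already on the Galaxy instance; nothing to check locally.
            continue
        path = spec.get("path")
        if path is None:
            problems.append(f"{label}: needs either 'path' or 'dataset_id'")
        elif not os.path.exists(path):
            problems.append(f"{label}: path does not exist: {path}")
        elif "type" not in spec:
            problems.append(f"{label}: missing 'type'")
    return problems


# A temporary file stands in for a real matrix file in this example.
with tempfile.NamedTemporaryFile(suffix=".mtx") as tmp:
    inputs = {
        "matrix": {"path": tmp.name, "type": "txt"},
        "genes": {"path": "/no/such/file.mtx_rows", "type": "tsv"},
        "gtf": {"dataset_id": "fe139k21xsak"},
    }
    problems = check_inputs(inputs)

for problem in problems:
    print(problem)
```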

Steps with allowed errors

This optional YAML file tells the executor which steps are allowed to fail without the overall execution being considered failed, so that result files are retrieved anyway. This accommodates the fact that, in a production setup, edge cases in datasets may produce acceptable failures.

The structure of the file relies on the step labels used in the workflow and parameters files:

step_label_x:
  - any
step_label_z:
  - 1
  - 43

The above example means that the step with label step_label_x can fail with any error code, whereas the step with label step_label_z will only be allowed to fail with codes 1 or 43.
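The matching rule described above can be sketched in a few lines. This is an illustration of the semantics only, not the executor's actual code; `failure_is_allowed` is a hypothetical name, and the mapping is written out as the Python structure that parsing the YAML example would yield.

```python
def failure_is_allowed(allowed: dict, step_label: str, exit_code: int) -> bool:
    """Return True if a failure of step_label with exit_code is tolerated.

    Steps that are not listed at all are never allowed to fail.
    """
    codes = allowed.get(step_label, [])
    return "any" in codes or exit_code in codes


# Parsed form of the YAML example above.
allowed = {
    "step_label_x": ["any"],
    "step_label_z": [1, 43],
}

print(failure_is_allowed(allowed, "step_label_x", 7))
print(failure_is_allowed(allowed, "step_label_z", 43))
print(failure_is_allowed(allowed, "step_label_z", 2))
```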