This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Merge pull request #280 from vib-singlecell-nf/develop
Develop
cflerin authored Dec 14, 2020
2 parents 6beddf1 + f8b01a9 commit 91e5724
Showing 8 changed files with 97 additions and 152 deletions.
31 changes: 31 additions & 0 deletions .gitignore
@@ -0,0 +1,31 @@
*checkpoint.ipynb
*checkpoint*
*checkpoint.py
*.test.ipynb
*.csv
*.loom
*.pickle
*.pyc
*.html
*egg*
.vscode
.nextflow
.nextflow*
data
refdata
work
out/notebooks
src/scenic/out
src/scenic/notebooks
src/scenic/data
refdata
data/10x/tiny
work/
out/
tests/
debug/
*.swp
*.swo
docs/_build/
src/*/.git

4 changes: 3 additions & 1 deletion README.rst
Expand Up @@ -8,7 +8,8 @@ A repository of pipelines for single-cell data analysis in Nextflow DSL2.

**Full documentation** is available on `Read the Docs <https://vsn-pipelines.readthedocs.io/en/latest/>`_, or take a look at the `Quick Start <https://vsn-pipelines.readthedocs.io/en/latest/getting-started.html#quick-start>`_ guide.

This main repo contains multiple workflows for analyzing single cell transcriptomics data, and depends on a number of tools, which are organized into submodules within the VIB-Singlecell-NF_ organization.
This main repo contains multiple workflows for analyzing single cell transcriptomics data, and depends on a number of tools, which are organized into subfolders within the ``src/`` directory.
The VIB-Singlecell-NF_ organization contains this main repo along with a collection of example runs (`VSN-Pipelines-examples <https://vsn-pipelines-examples.readthedocs.io/en/latest/>`_).
Currently available workflows are listed below.

If VSN-Pipelines is useful for your research, consider citing:
Expand Down Expand Up @@ -109,6 +110,7 @@ Sample Aggregation Workflows


---

In addition, the pySCENIC_ implementation of the SCENIC_ workflow is integrated here and can be run in conjunction with any of the above workflows.
The output of each of the main workflows is a loom_-format file, which is ready for import into the interactive single-cell web visualization tool SCope_.
In addition, data is also output in h5ad format, and reports are generated for the major pipeline steps.
82 changes: 42 additions & 40 deletions docs/development.rst
Expand Up @@ -4,6 +4,8 @@ Development
Create module
-------------

Tool-based modules are located in ``src/<tool-name>``, and each module has a specific structure for scripts and Nextflow processes (see `Repository structure`_ below).

Case study: Add `Harmony`
*************************

Expand All @@ -19,40 +21,42 @@ Links:

Steps:

#. Ask the `VIB-SingleCell-NF` administrators to create a new repository (in this case: ``harmony``) or create one on your GitHub account that could be brought into the `VIB-SingleCell-NF` organization.

When using your own repo, you MUST start from the `template repository`_ in the vib-singlecell-nf organisation. Click the green "Use this template" button and provide a name for your new repo. Make sure the "Include all branches" checkbox is checked.

.. _`template repository`: https://github.com/vib-singlecell-nf/template

#. Create a new issue on ``vsn-pipelines`` GitHub repository explaining which module you are going to add (e.g.: `Add Harmony batch correction method`).


#. `Fork the`_ ``vsn-pipelines`` repository to your own GitHub account.
#. `Fork the`_ ``vsn-pipelines`` repository to your own GitHub account (if you are an external collaborator).

.. _`Fork the`: https://help.github.com/en/github/getting-started-with-github/fork-a-repo

#. From your ``vsn-pipelines`` GitHub repository, create a new branch called ``feature/[github-issue-id]-[description]``.
#. From your local copy of ``vsn-pipelines`` GitHub repository, create a new branch called ``feature/[github-issue-id]-[description]``.

In this case,

- ``[github-issue-id] = 115``
- ``[description] = add_harmony_batch_correction_method``

It is highly recommended to start from the ``develop`` branch:

.. code:: bash
git checkout develop
git fetch
git pull
git checkout -b feature/115-add_harmony_batch_correction_method
#. From within the ``src`` directory of the ``vsn-pipelines`` repo, run the ``add_new_submodule.sh`` script.
#. Use the `template repository`_ in the vib-singlecell-nf organisation to create the framework for the new module in ``src/<tool-name>``:

.. code:: bash
./add_new_submodule.sh [git-repo-url] -d
git clone --depth=1 https://github.com/vib-singlecell-nf/template.git src/harmony
``[git-repo-url]`` = https://github.com/vib-singlecell-nf/harmony.git (Git Repository URL from `VSN-SingleCell-NF` or from your GitHub account)
``-d`` tracks the develop branch of the new repository, which is where you should work until the module is working.
.. _`template repository`: https://github.com/vib-singlecell-nf/template

If you are using VSCode and you don't see the new submodule appearing in ``SOURCE CONTROL PROVIDERS``, open any file from ``src/harmony`` (e.g.: LICENSE)
#. Now, you can start to edit file in the tool module that is now located in ``src/<tool-name>``.
Optionally, you can delete the ``.git`` directory in the new module to avoid confusion in future local development:

.. code:: bash
rm -rf src/harmony/.git
#. Create the Dockerfile recipe
Expand Down Expand Up @@ -81,11 +85,11 @@ Steps:
apt-get clean
#. Update the ``nextflow.config`` file to create the ``harmony.config`` configuration file.
#. Rename the ``nextflow.config`` file to create the ``harmony.config`` configuration file.

* Each process's options should be in their own level. With a single proccess, you do not need one extra level.
* Each process's options should be in their own level. With a single process, you do not need one extra level.

.. code:: dockerfile
.. code:: groovy
params {
sc {
Expand Down Expand Up @@ -225,7 +229,7 @@ Steps:
#. Create the Nextflow process that will run the Harmony R script defined in 7.
#. Create the Nextflow process that will run the Harmony R script defined in the previous step.

.. code:: groovy
Expand Down Expand Up @@ -260,9 +264,9 @@ Steps:
}
#. Create a Nextflow module that will call the Nextflow process defined in 8. and perform some other tasks (dimensionality reduction, cluster identification, marker genes identification and report generation)
#. Create a Nextflow "subworkflow" that will call the Nextflow process defined in the previous step and perform some other tasks (dimensionality reduction, cluster identification, marker genes identification and report generation)

This step is not required. However it this step is skipped, the code would still need to added into the main ``harmony`` workflow (`workflows/harmony.nf`, see step 10)
This step is not required. However it this step is skipped, the code would still need to added into the main ``harmony`` workflow (`workflows/harmony.nf`, see the next step)

.. code:: groovy
Expand Down Expand Up @@ -408,7 +412,7 @@ Steps:
}
#. In the ``vsn-pipelines``, create a new main workflow called ``harmony.nf`` under ``workflows``
#. In the ``vsn-pipelines``, create a new main workflow called ``harmony.nf`` under ``workflows/``:

.. code:: groovy
Expand Down Expand Up @@ -599,7 +603,20 @@ Steps:
#. Add a new Nextflow profile in ``nextflow.config`` of the ``vsn-pipelines`` repository
#. Add a new Nextflow profile in the ``profiles`` section of the main ``nextflow.config`` of the ``vsn-pipelines`` repository:

.. code:: groovy
profiles {
harmony {
includeConfig 'src/scanpy/scanpy.config'
includeConfig 'src/harmony/harmony.config'
}
...
}
#. Finally add a new entry in ``main.nf`` of the ``vsn-pipelines`` repository

.. code:: groovy
Expand All @@ -624,29 +641,14 @@ Steps:
}
#. Finally add a new entry in main.nf of the ``vsn-pipelines`` repository
You should now be able to configure (``nextflow config ...``) and run the ``harmony`` pipeline (``nextflow run ...``).

.. code:: groovy
harmony {
includeConfig 'src/scanpy/scanpy.config'
includeConfig 'src/harmony/harmony.config'
}
You should now be able to configure (``nextflow config``) and run the ``harmony`` pipeline (``nextflow run``).

#. After confirming that your module is functional, you should merge your changes in the tool repo into the ``master`` branch.
#. After confirming that your module is functional, you should create a pull request to merge your changes into the ``develop`` branch.

- Make sure you have removed all references to ``TEMPLATE`` in your repository
- Include some basic documentation for your module so people know what it does and how to use it.

#. Once merged into ``master`` you should update the submodule in the ``vsn-pipelines`` repo to point to the correct branch

.. code:: bash
git submodule set-branch --default src/harmony
#. Finally, add your new and updated files alongside the updated ``.gitmodules`` file and ``src/harmony`` files to a new commit and submit a pull request on the ``vsn-pipelines`` repo to have your new module integrated.
The pull request will be reviewed and accepted once it is confirmed to be working. Once the ``develop`` branch is merged into ``master``, the new tool will be part of the new release of VSN Pipelines!

Repository structure
--------------------
28 changes: 14 additions & 14 deletions docs/features.rst
Expand Up @@ -55,14 +55,14 @@ Finally run the pipeline,
Set the seed
------------
Some steps in the pipelines are nondeterministic. In order to have reproducible results, a seed is set by default to:
Some steps in the pipelines are non-deterministic. In order to have reproducible results, a seed is set by default to:

.. code:: groovy
workflow.manifest.version.replaceAll("\\.","").toInteger()
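
As an illustration of how the default seed is derived, the same transformation can be sketched in shell. This is only a sketch mirroring the Groovy expression above; the function name ``version_to_seed`` is hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: mimic workflow.manifest.version.replaceAll("\\.","").toInteger()
# version_to_seed is a hypothetical helper, not part of vsn-pipelines.
version_to_seed() {
    local version="$1"
    local digits="${version//./}"   # remove every dot, e.g. 0.24.0 -> 0240
    echo "$(( 10#$digits ))"        # force base 10 so the leading zero is not read as octal
}

version_to_seed "0.24.0"   # prints 240
```

With the version bumped from ``0.23.0`` to ``0.24.0`` in this commit, the default seed would change from 230 to 240, so setting the seed explicitly may be worthwhile when comparing runs across releases.
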
The seed is a number derived from the the version of the pipeline used at the time of the analysis run.
To override the seed (integer) you have edit the nextflow.config file with:
The seed is a number derived from the version of the pipeline used at the time of the analysis run.
To override the seed (integer) you have edit the ``nextflow.config`` file with:

.. code:: groovy
Expand Down Expand Up @@ -154,19 +154,19 @@ Two methods (``params.sc.cell_annotate.method``) are available:

If you have a single file containing the metadata information of all your samples, use ``aio`` method otherwise use ``obo``.

For both methods, here are the mandatory params to set:
For both methods, here are the mandatory parameters to set:

- ``off`` should be set to ``h5ad``
- ``method`` choose either ``obo`` or ``aio``
- ``annotationColumnNames`` is an array of columns names from ``cellMetaDataFilePath`` containing different annotation metadata to add.

If ``aio`` used, the following additional params are required:
If ``aio`` used, the following additional parameters are required:

- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 2 columns: a column containing all the cell IDs and an annotation column.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **can** have unique values; if it's not the case, it's important that the combination of the values from the ``indexColumnName`` and the ``sampleColumnName`` are unique.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sure that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.

If ``obo`` is used, the following params are required:
If ``obo`` is used, the following parameters are required:

- ``cellMetaDataFilePath``

Expand Down Expand Up @@ -267,7 +267,7 @@ Two methods (``params.sc.cell_filter.method``) are available:

If you have a single file containing the metadata information of all your samples, use ``external`` method otherwise use ``internal``.

For both methods, here are the mandatory params to set:
For both methods, here are the mandatory parameters to set:

- ``off`` should be set to ``h5ad``
- ``method`` choose either ``internal`` or ``external``
Expand All @@ -276,20 +276,20 @@ For both methods, here are the mandatory params to set:
- ``id`` is a short identifier for the filter
- ``valuesToKeepFromFilterColumn`` is array of values from the ``filterColumnName`` that should be kept (other values will be filtered out).

If ``internal`` used, the following additional params are required:
If ``internal`` used, the following additional parameters are required:

- ``filters`` is a List of Maps where each Map is required to have the following parameters:

- ``sampleColumnName`` is the column name containing the sample ID/name information. It should exist in the ``obs`` column attribute of the h5ad.
- ``filterColumnName`` is the column name that will be used to filter out cells. It should exist in the ``obs`` column attribute of the h5ad.

If ``external`` used, the following additional params are required:
If ``external`` used, the following additional parameters are required:

- ``filters`` is a List of Maps where each Map is required to have the following parameters:

- ``cellMetaDataFilePath`` is a file path pointing to a single .tsv file (with header) with at least 3 columns: a column containing all the cell IDs, another containing the sample ID/name information, and a column to use for the filtering.
- ``indexColumnName`` is the column name from ``cellMetaDataFilePath`` containing the cell IDs information. This column **must** have unique values.
- `optional` ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sur that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- `optional` ``sampleColumnName`` is the column name from ``cellMetaDataFilePath`` containing the sample ID/name information. Make sure that the values from this column match the samples IDs inferred from the data files. To know how those are inferred, please read the `Input Data Formats`_ section.
- `optional` ``filterColumnName`` is the column name from ``cellMetaDataFilePath`` which be used to filter out cells.


Expand Down Expand Up @@ -348,8 +348,8 @@ If you want to apply custom parameters for some specific samples and have a "gen
}
}
Using this config, the param ``params.sc.scanpy.cellFilterMinNGenes`` will be applied with a threshold value of ``600`` to ``1k_pbmc_v2_chemistry``. The rest of the samples will use the value ``800`` to filter the cells having less than that number of genes.
This strategy can be applied to any other paramameter of the config.
Using this config, the parameter ``params.sc.scanpy.cellFilterMinNGenes`` will be applied with a threshold value of ``600`` to ``1k_pbmc_v2_chemistry``. The rest of the samples will use the value ``800`` to filter the cells having less than that number of genes.
This strategy can be applied to any other parameter of the config.


Parameter exploration
Expand Down Expand Up @@ -437,4 +437,4 @@ The following command, will create a Nextflow config which the pipeline will und
-profile min,[data-profile],scanpy_data_transformation,scanpy_normalization,[...],singularity > nextflow.config
- ``[data-profile]``: Can be one of the different possible data profiles e.g.: ``h5ad``
- ``[...]``: Can be other profiles like ``bbknn``, ``harmony``, ``pcacv``, ...
- ``[...]``: Can be other profiles like ``bbknn``, ``harmony``, ``pcacv``, ...
2 changes: 1 addition & 1 deletion docs/getting-started.rst
Expand Up @@ -126,6 +126,6 @@ The pipelines will generate 3 types of results in the output directory (`params.

- See the example output report from the 1k PBMC data `here <http://htmlpreview.github.io/?https://github.com/vib-singlecell-nf/vsn-pipelines/blob/master/notebooks/10x_PBMC.merged_report.html>`_

- ``pipeline_reports``: nextflow dag, execution, timeline, and trace reports
- ``pipeline_reports``: Nextflow dag, execution, timeline, and trace reports

If you would like to use the pipelines on a custom dataset, please see the `pipelines <./pipelines.html>`_ section below.
10 changes: 5 additions & 5 deletions docs/pipelines.rst
Expand Up @@ -8,7 +8,7 @@ This pipeline can be configured and run on custom data with a few steps.
The recommended method is to first run ``nextflow config ...`` to generate a complete config file (with the default parameters) in your working directory.
The tool-specific parameters, as well as Docker/Singularity profiles, are included when specifying the appropriate profiles to ``nextflow config``.

1. First, update to the latest pipeline version (this will update the nextflow cache of the repository, typically located in ``~/.nextflow/assets/vib-singlecell-nf/``)::
1. First, update to the latest pipeline version (this will update the Nextflow cache of the repository, typically located in ``~/.nextflow/assets/vib-singlecell-nf/``)::

nextflow pull vib-singlecell-nf/vsn-pipelines

Expand Down Expand Up @@ -502,14 +502,14 @@ The output is a loom file with the results embedded.
Utility Pipelines
*****************

Contrary to the aformentioned pipelines, these are not end-to-end. They are used to perfom small incremental processing steps.
Contrary to the aformentioned pipelines, these are not end-to-end. They are used to perform small incremental processing steps.

**cell_annotate**
-----------------

Runs the ``cell_annotate`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section.
about this cell-based annotation feature please visit `Cell-based metadata annotation`_ section.

.. _`Cell-based metadata annotation`: https://vsn-pipelines.readthedocs.io/en/latest/features.html#cell-based-metadata-annotation

Expand Down Expand Up @@ -561,7 +561,7 @@ Now we can run it with the following command:

Runs the ``cell_annotate_filter`` workflow which will perform a cell-based annotation of the data using a set of provided .tsv metadata files following by a cell-based filtering.
We show a use case here below with 10x Genomics data were it will annotate different samples using the ``obo`` method. For more information
about this cell-based annotation feautre please visit `Cell-based metadata annotation`_ section and `Cell-based metadata filtering`_ section.
about this cell-based annotation feature please visit `Cell-based metadata annotation`_ section and `Cell-based metadata filtering`_ section.

.. _`Cell-based metadata filtering`: https://vsn-pipelines.readthedocs.io/en/latest/features.html#cell-based-metadata-filtering

Expand Down Expand Up @@ -752,7 +752,7 @@ In the generated .config file, make sure the ``file_paths`` parameter is set wit

- The ``suffix`` parameter is used to infer the sample name from the file paths (it is removed from the input file path to derive a sample name).
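
To make the role of ``suffix`` concrete, the derivation of a sample name can be sketched in shell. This is an approximation under stated assumptions: the pipeline's exact rule is not shown here, so the sketch assumes that the directory part is dropped and the suffix is removed from the end of the file name. The helper ``sample_name_from_path`` and the file name used are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: approximates how a sample name might be inferred from a file path.
# sample_name_from_path is a hypothetical helper, not part of vsn-pipelines.
sample_name_from_path() {
    local path="$1" suffix="$2"
    local base="${path##*/}"      # drop leading directories
    echo "${base%"$suffix"}"      # strip the (literal) suffix from the end
}

sample_name_from_path "data/sample1.filtered.h5ad" ".filtered.h5ad"   # prints sample1
```

Under this assumption, files whose names end differently need different ``suffix`` values to yield clean sample names.
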

In case there are multiple .h5ad files that need to be processed with different suffixes, the multi-labelled strategy should be used to define the h5ad param::
In case there are multiple .h5ad files that need to be processed with different suffixes, the multi-labelled strategy should be used to define the h5ad parameter::

[...]
data {
2 changes: 1 addition & 1 deletion nextflow.config
Expand Up @@ -3,7 +3,7 @@ manifest {
name = 'vib-singlecell-nf/vsn-pipelines'
description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
version = '0.23.0'
version = '0.24.0'
mainScript = 'main.nf'
defaultBranch = 'master'
nextflowVersion = '!20.04.1' // with ! prefix, stop execution if current version does not match required version.