Merge pull request #95 from vib-singlecell-nf/develop

Develop Former-commit-id: 9236824
vib-singlecell-nf · Jan 17, 2020 · 29ab89d
2 parents 1cb3f27 + d2e9462
commit 29ab89d
Showing 30 changed files with 414 additions and 30 deletions.
diff --git a/.github/workflows/bbknn.yml b/.github/workflows/bbknn.yml
@@ -19,7 +19,7 @@ jobs:
         submodules: true
     - name: Install Nextflow
       run: |
-        export NXF_VER='19.10.0'
+        export NXF_VER='19.12.0-edge'
         wget -qO- get.nextflow.io | bash
         sudo mv nextflow /usr/local/bin/
     - name: Get sample data

diff --git a/.github/workflows/scenic.yml b/.github/workflows/scenic.yml
@@ -19,7 +19,7 @@ jobs:
         submodules: true
     - name: Install Nextflow
       run: |
-        export NXF_VER='19.10.0'
+        export NXF_VER='19.12.0-edge'
         wget -qO- get.nextflow.io | bash
         sudo mv nextflow /usr/local/bin/
     - name: Run scenic test

diff --git a/.github/workflows/single_sample.yml b/.github/workflows/single_sample.yml
@@ -19,7 +19,7 @@ jobs:
         submodules: true
     - name: Install Nextflow
       run: |
-        export NXF_VER='19.10.0'
+        export NXF_VER='19.12.0-edge'
         wget -qO- get.nextflow.io | bash
         sudo mv nextflow /usr/local/bin/
     - name: Get sample data

diff --git a/.github/workflows/single_sample_scenic.yml b/.github/workflows/single_sample_scenic.yml
@@ -19,7 +19,7 @@ jobs:
         submodules: true
     - name: Install Nextflow
       run: |
-        export NXF_VER='19.10.0'
+        export NXF_VER='19.12.0-edge'
         wget -qO- get.nextflow.io | bash
         sudo mv nextflow /usr/local/bin/
     - name: Get sample data

diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@
 *.pyc
 *.html
 *egg*
+.vscode
 .nextflow
 .nextflow*
 data

diff --git a/.gitmodules b/.gitmodules
@@ -31,3 +31,6 @@
 [submodule "src/flybaser"]
 	path = src/flybaser
 	url = https://github.com/vib-singlecell-nf/flybaser.git
+[submodule "src/pcacv"]
+	path = src/pcacv
+	url = https://github.com/vib-singlecell-nf/pcacv.git
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # vib-singlecell-nf
 
-[![Nextflow](https://img.shields.io/badge/nextflow-19.10.0-brightgreen.svg)](https://www.nextflow.io/)
+[![Nextflow](https://img.shields.io/badge/nextflow-19.12.0-brightgreen.svg)](https://www.nextflow.io/)
 
 A repository of pipelines for single-cell data in Nextflow DSL2.
 
@@ -203,13 +203,69 @@ cellranger_outs_dir_path = "/home/data/cellranger/Sample*/outs/"
 ```
 will recursively find all 10x samples in that directory.
 
-# Repository structure
+# Advanced
 
-## Root
+## Select the optimal number of principal components
+
+When generating the config using `nextflow config` (see above), add the `pcacv` profile.
+
+Remarks:
+- Make sure the `nComps` config parameter (under `dim_reduction` > `pca`) is not set.
+- If `nPcs` is not set in the t-SNE or UMAP config entries, all the PCs from the PCA are used in the computation.
+
+Currently, only the Scanpy-related pipelines implement this feature.
+
+## Cell-based metadata annotation
+
+If you have (pre-computed) cell-based metadata and you'd like to add them as annotations, please read [cell-based metadata annotation](https://github.com/vib-singlecell-nf/vib-singlecell-nf/tree/develop/src/utils#cell-based-metadata-annotation).
+
+## Sample-based metadata annotation
+
+If you have sample-based metadata and you'd like to annotate the cells with these annotations, please read [sample-based metadata annotation](https://github.com/vib-singlecell-nf/vib-singlecell-nf/tree/develop/src/utils#sample-based-metadata-annotation).
+
+## Multi-sample parameters
+
+You can define custom parameter values for individual samples by assigning a map keyed by sample ID (a hashmap in Groovy, comparable to a dictionary in Python) instead of a single value.
+Repeat the following structure for each parameter that should use the multi-sample feature:
+
+```
+params {
+    sc {
+        scanpy {
+         container = 'aertslab/sctx-scanpy:0.5.0'
+         filter {
+            report_ipynb = '/src/scanpy/bin/reports/sc_filter_qc_report.ipynb'
+            // Here we enable the multi-sample feature for the cellFilterMinNGenes parameter
+            cellFilterMinNGenes = [
+                '1k_pbmc_v2_chemistry': 600,
+                '1k_pbmc_v3_chemistry': 800
+            ]
+            // cellFilterMaxNGenes will be set to 4000 for all the samples
+            cellFilterMaxNGenes = 4000
+            // Here we again enable the multi-sample feature for the cellFilterMaxPercentMito parameter
+            cellFilterMaxPercentMito = [
+                '1k_pbmc_v2_chemistry': 0.15,
+                '1k_pbmc_v3_chemistry': 0.05
+            ]
+            // geneFilterMinNCells will be set to 3 for all the samples
+            geneFilterMinNCells = 3
+            iff = '10x_mtx'
+            off = 'h5ad'
+            outdir = 'out'
+         }
+        }
+    }
+}
+```
+
+# Development
+
+## Repository structure
+
+### Root
 The repository root contains a `main.nf` and associated `nextflow.config`.
 The root `main.nf` imports and calls sub-workflows defined in the modules.
 
-## Modules
+### Modules
 A "module" consists of a folder labeled with the tool name (Scanpy, SCENIC, utils, etc.), with subfolders for
 * `bin/` (scripts passed into the container)
 * `processes/` (where Nextflow processes are defined)
@@ -254,7 +310,7 @@ src/
         └── utils.nf
 ```
 
-## Workflows
+### Workflows
 
 Workflows (chains of nf processes) are defined in the module root folder (e.g. [src/Scanpy/bec_bbknn.nf](https://github.com/vib-singlecell-nf/vib-singlecell-nf/blob/module_refactor/src/scanpy/bec_bbknn.nf))
 Workflows import multiple processes and define the workflow by name:
@@ -274,7 +330,7 @@ workflow CELLRANGER {
 
 ```
 
-### Workflow imports
+#### Workflow imports
 Entire **sub-workflows** can also be imported in other workflows with one command (inheriting all of the process imports from the workflow definition):
 ```groovy
 include CELLRANGER from '../cellranger/main.nf' params(params)
@@ -296,7 +352,7 @@ workflow {
 ```
 
 
-## Parameters structure
+### Parameters structure
 Parameters are stored in a separate config file per workflow, plus the main `nextflow.config`. 
 These parameters are merged when starting the run using e.g.:
 ```groovy
@@ -362,8 +418,6 @@ params {
 
 ```
 
-# Development
-
 ## Module testing
 
 Modules and processes can be tested independently; you can find an example in `src/utils/main.test.nf`.
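
Note on the "Multi-sample parameters" section added to README.md in this diff: a parameter is either a single value shared by all samples or a map keyed by sample ID. The Python sketch below only illustrates that lookup rule. `resolve_param` is a hypothetical helper, not a function from this repository, and raising an error for a sample that is missing from the map is an assumption rather than confirmed pipeline behavior.

```python
def resolve_param(value, sample_id):
    """Return the setting that applies to sample_id.

    A dict is treated as a per-sample map; any other value applies to all samples.
    """
    if isinstance(value, dict):
        if sample_id not in value:
            # Assumption: a sample missing from the map is treated as an error.
            raise KeyError(f"No value defined for sample '{sample_id}'")
        return value[sample_id]
    return value


# Values copied from the README example in this diff.
filter_params = {
    "cellFilterMinNGenes": {"1k_pbmc_v2_chemistry": 600, "1k_pbmc_v3_chemistry": 800},
    "cellFilterMaxNGenes": 4000,
}

for sample in ("1k_pbmc_v2_chemistry", "1k_pbmc_v3_chemistry"):
    resolved = {name: resolve_param(v, sample) for name, v in filter_params.items()}
    print(sample, resolved)
```

With this rule, `cellFilterMaxNGenes` resolves to 4000 for both samples, while `cellFilterMinNGenes` resolves to 600 or 800 depending on the sample.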

diff --git a/nextflow.config b/nextflow.config
@@ -6,7 +6,7 @@ manifest {
     version = '0.6.1'
     mainScript = 'main.nf'
     defaultBranch = 'master'
-    nextflowVersion = '!19.10.0' // with ! prefix, stop execution if current version does not match required version.
+    nextflowVersion = '!19.12.0-edge' // with ! prefix, stop execution if current version does not match required version.
 }
 
 params {
@@ -116,6 +116,11 @@ profiles {
         includeConfig 'src/sratoolkit/sratoolkit.config'
     }
 
+    // feature profiles
+    pcacv {
+        includeConfig 'src/pcacv/pcacv.config'
+    }
+
     // utility profiles
     utils_sample_annotate {
         includeConfig 'src/utils/conf/sample_annotate.config'

diff --git a/src/channels/singleend.nf b/src/channels/singleend.nf
@@ -9,7 +9,7 @@ def extractSample(path) {
 
 workflow getChannel {
 
-    get:
+    take:
         glob
 
     main:

diff --git a/src/channels/sra.nf b/src/channels/sra.nf
@@ -2,7 +2,7 @@ nextflow.preview.dsl=2
 
 workflow getChannel {
 
-    get:
+    take:
         // Expects sra Map [[id: "id1", samples: ["glob1", ...]], ...]
         sra
 

diff --git a/src/channels/tenx.nf b/src/channels/tenx.nf
@@ -8,7 +8,7 @@ def extractSample(path) {
 
 workflow getChannel {
 
-    get:
+    take:
         glob
 
     main:

diff --git a/src/edirect b/src/edirect
diff --git a/src/pcacv b/src/pcacv
diff --git a/src/scanpy b/src/scanpy
diff --git a/src/scenic b/src/scenic
diff --git a/src/sratoolkit b/src/sratoolkit
diff --git a/src/utils/README.md b/src/utils/README.md
@@ -1,6 +1,6 @@
-# Utils Module
+# Utils module
 
-## Cell-based Metadata Annotation
+## Cell-based metadata annotation
 
 The profile `utils_cell_annotate` should be added when generating the main config using `nextflow config`. This will add the following entry in the config:
 
@@ -24,3 +24,29 @@ Then, the following parameters should be updated to use the module feature:
 - `indexColumnName` is the column name from `cellMetaDataFilePath` containing the cell IDs information.
 - `sampleColumnName` is the column name from `cellMetaDataFilePath` containing the sample ID/name information.
 - `annotationColumnNames` is an array of columns names from `cellMetaDataFilePath` containing different annotation metadata to add.
+
+## Sample-based metadata annotation
+The profile `utils_sample_annotate` should be added when generating the main config using `nextflow config`. This will add the following entry in the config:
+
+```
+params {
+    sc {
+        sample_annotate {
+            iff = '10x_cellranger_mex'
+            off = 'h5ad' 
+            type = 'sample' 
+            metaDataFilePath = 'data/10x/1k_pbmc/metadata.tsv'
+        }
+    }
+}
+```
+Then, the following parameters should be updated to use the module feature:
+
+- `metaDataFilePath` is a TSV file (with header) with at least 2 columns, where the first column needs to match the sample IDs. Any other columns will be added as annotations in the final loom, i.e., all the cells of a sample are annotated with that sample's annotations.
+
+| id  | chemistry | ... |
+| ------------- | ------------- | ------------- |
+| 1k_pbmc_v2_chemistry  | v2  | ... |
+| 1k_pbmc_v3_chemistry  | v3  | ... |
+
+Annotating the samples using this system allows any user to query all the annotations using the SCope portal. This is especially relevant when samples need to be compared across specific annotations (check the compare tab in SCope).
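
Note on the sample-based annotation described in the src/utils/README.md diff above: every cell inherits the annotations of the sample it belongs to. The Python sketch below illustrates that idea with the standard library only. The function names, the inline TSV and the cell barcodes are hypothetical and do not reproduce the module's actual implementation.

```python
import csv
import io

# Hypothetical metadata TSV mirroring the table in the README diff.
METADATA_TSV = "id\tchemistry\n1k_pbmc_v2_chemistry\tv2\n1k_pbmc_v3_chemistry\tv3\n"


def load_sample_metadata(handle):
    """Map each sample ID (first column) to a dict of its remaining columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def annotate_cells(cell_to_sample, sample_metadata):
    """Give every cell a copy of the annotations of its sample."""
    return {
        cell: dict(sample_metadata[sample])
        for cell, sample in cell_to_sample.items()
    }


sample_metadata = load_sample_metadata(io.StringIO(METADATA_TSV))
# Hypothetical cell barcodes and their sample of origin.
cells = {"AAAC-1": "1k_pbmc_v2_chemistry", "AAAG-1": "1k_pbmc_v3_chemistry"}
annotated = annotate_cells(cells, sample_metadata)
print(annotated)
```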
diff --git a/src/utils/bin/sc_h5ad_extract_metadata.py b/src/utils/bin/sc_h5ad_extract_metadata.py
@@ -0,0 +1,68 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import argparse
+import pandas as pd
+import scanpy as sc
+import numpy as np
+
+parser = argparse.ArgumentParser(description='Extract metadata columns from an h5ad file and write them as TSV.')
+
+parser.add_argument(
+    "input",
+    type=argparse.FileType('r'),
+    help='The path to the input h5ad file.'
+)
+
+parser.add_argument(
+    "output",
+    type=argparse.FileType('w'),
+    help='The path to the output TSV file that will contain the extracted metadata columns.'
+)
+
+parser.add_argument(
+    '-a', '--axis',
+    type=str,
+    dest="axis",
+    help='The axis defining the metadata from which the given column names are extracted (currently only "feature" is supported).'
+)
+
+parser.add_argument(
+    '-c', '--column-name',
+    type=str,
+    action="append",
+    dest="column_names",
+    help="Name of a metadata column to extract. Repeat the option to extract several columns."
+)
+
+args = parser.parse_args()
+
+FILE_PATH_IN = args.input.name
+
+# I/O
+# Expects h5ad file
+try:
+    adata = sc.read_h5ad(filename=FILE_PATH_IN)
+except IOError:
+    raise Exception("Can only handle .h5ad files.")
+
+#
+# Extract the given column_names from the feature/observation-based metadata.
+#
+
+if args.axis == 'feature':
+    metadata = adata.var[args.column_names]
+elif args.axis == 'observation':
+    raise Exception("Extracting the observation-based metadata is currently not implemented.")
+else:
+    raise Exception(f"Cannot extract from the {args.axis}-based metadata.")
+
+# I/O
+metadata.to_csv(
+    path_or_buf=args.output,
+    sep='\t',
+    header=True,
+    columns=args.column_names,
+    index=False
+)
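
Note on `sc_h5ad_extract_metadata.py` above: for the `feature` axis, the script selects the requested columns from `adata.var` and writes them as tab-separated text with a header and without the index. The sketch below mimics that output shape with the standard library only. A list of dicts stands in for `adata.var`, and `write_columns_as_tsv` and the sample rows are hypothetical, so it does not exercise scanpy or pandas.

```python
import csv
import io


def write_columns_as_tsv(rows, column_names, handle):
    """Write only column_names from rows as TSV: header first, no index column."""
    missing = [c for c in column_names if rows and c not in rows[0]]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    writer = csv.DictWriter(
        handle,
        fieldnames=column_names,
        delimiter="\t",
        extrasaction="ignore",  # silently drop columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical feature metadata standing in for adata.var.
feature_rows = [
    {"gene": "CD3E", "highly_variable": True, "n_cells": 120},
    {"gene": "MS4A1", "highly_variable": False, "n_cells": 45},
]
buffer = io.StringIO()
write_columns_as_tsv(feature_rows, ["highly_variable", "n_cells"], buffer)
print(buffer.getvalue())
```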
+11 −4		bin/dim_reduction/sc_dim_reduction.py
+2 −3		conf/base.config
+10 −3		processes/dim_reduction.nf
+1 −1		workflows/bec_bbknn.nf
+4 −4		workflows/bec_mnn_correct.nf
+1 −1		workflows/cluster_identification.nf
+2 −2		workflows/create_report.nf
+7 −4		workflows/dim_reduction.nf
+31 −0		workflows/dim_reduction_pca.nf
+1 −1		workflows/hvg_selection.nf
+1 −1		workflows/normalize_transform.nf
+5 −1		workflows/qc_filter.nf
+2 −2		main.nf
+5 −5		main.test.nf
+1 −1		workflows/aggregateMultiRuns.nf