This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Commit

Merge pull request #95 from vib-singlecell-nf/develop
Develop

Former-commit-id: 9236824
dweemx authored Jan 17, 2020
2 parents 1cb3f27 + d2e9462 commit 29ab89d
Showing 30 changed files with 414 additions and 30 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/bbknn.yml
@@ -19,7 +19,7 @@ jobs:
submodules: true
- name: Install Nextflow
run: |
export NXF_VER='19.10.0'
export NXF_VER='19.12.0-edge'
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Get sample data
2 changes: 1 addition & 1 deletion .github/workflows/scenic.yml
@@ -19,7 +19,7 @@ jobs:
submodules: true
- name: Install Nextflow
run: |
export NXF_VER='19.10.0'
export NXF_VER='19.12.0-edge'
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Run scenic test
2 changes: 1 addition & 1 deletion .github/workflows/single_sample.yml
@@ -19,7 +19,7 @@ jobs:
submodules: true
- name: Install Nextflow
run: |
export NXF_VER='19.10.0'
export NXF_VER='19.12.0-edge'
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Get sample data
2 changes: 1 addition & 1 deletion .github/workflows/single_sample_scenic.yml
@@ -19,7 +19,7 @@ jobs:
submodules: true
- name: Install Nextflow
run: |
export NXF_VER='19.10.0'
export NXF_VER='19.12.0-edge'
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Get sample data
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@
*.pyc
*.html
*egg*
.vscode
.nextflow
.nextflow*
data
3 changes: 3 additions & 0 deletions .gitmodules
@@ -31,3 +31,6 @@
[submodule "src/flybaser"]
path = src/flybaser
url = https://github.com/vib-singlecell-nf/flybaser.git
[submodule "src/pcacv"]
path = src/pcacv
url = https://github.com/vib-singlecell-nf/pcacv.git
72 changes: 63 additions & 9 deletions README.md
@@ -1,6 +1,6 @@
# vib-singlecell-nf

[![Nextflow](https://img.shields.io/badge/nextflow-19.10.0-brightgreen.svg)](https://www.nextflow.io/)
[![Nextflow](https://img.shields.io/badge/nextflow-19.12.0-brightgreen.svg)](https://www.nextflow.io/)

A repository of pipelines for single-cell data in Nextflow DSL2.

@@ -203,13 +203,69 @@ cellranger_outs_dir_path = "/home/data/cellranger/Sample*/outs/"
```
will recursively find all 10x samples in that directory.

# Repository structure
# Advanced

## Root
## Select the optimal number of principal components

When generating the config using `nextflow config` (see above), add the `pcacv` profile.

Remarks:
- Make sure the `nComps` config parameter (under `dim_reduction` > `pca`) is not set.
- If `nPcs` is not set for the t-SNE or UMAP config entries, all the PCs from the PCA will be used in the computation.

Currently, only the Scanpy-related pipelines have this feature implemented.

## Cell-based metadata annotation

If you have (pre-computed) cell-based metadata and you'd like to add them as annotations, please read [cell-based metadata annotation](https://github.com/vib-singlecell-nf/vib-singlecell-nf/tree/develop/src/utils#cell-based-metadata-annotation).

## Sample-based metadata annotation

If you have sample-based metadata and you'd like to annotate the cells with these annotations, please read [sample-based metadata annotation](https://github.com/vib-singlecell-nf/vib-singlecell-nf/tree/develop/src/utils#sample-based-metadata-annotation).

## Multi-sample parameters

It's possible to define custom parameters for the different samples. It's as easy as defining a hashmap in Groovy or a dictionary-like structure in Python.
Repeat the following structure for each parameter for which you want to enable the multi-sample feature:

```
params {
sc {
scanpy {
container = 'aertslab/sctx-scanpy:0.5.0'
filter {
report_ipynb = '/src/scanpy/bin/reports/sc_filter_qc_report.ipynb'
// Here we enable the multi-sample feature for the cellFilterMinNGenes parameter
cellFilterMinNGenes = [
'1k_pbmc_v2_chemistry': 600,
'1k_pbmc_v3_chemistry': 800
]
// cellFilterMaxNGenes will be set to 4000 for all the samples
cellFilterMaxNGenes = 4000
// Here we again enable the multi-sample feature for the cellFilterMaxPercentMito parameter
cellFilterMaxPercentMito = [
'1k_pbmc_v2_chemistry': 0.15,
'1k_pbmc_v3_chemistry': 0.05
]
// geneFilterMinNCells will be set to 3 for all the samples
geneFilterMinNCells = 3
iff = '10x_mtx'
off = 'h5ad'
outdir = 'out'
}
}
}
```
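
The lookup rule described above can be sketched in Python. This is only an illustration of the idea, not the pipeline's actual implementation: the helper name `resolve_param` is hypothetical, and it assumes a multi-sample parameter is a mapping keyed by sample ID while any other value applies to every sample.

```python
from typing import Any, Mapping


def resolve_param(value: Any, sample_id: str) -> Any:
    """Return the value of a parameter for one sample.

    A mapping is treated as a multi-sample parameter keyed by sample ID;
    anything else is a single value shared by all samples.
    """
    if isinstance(value, Mapping):
        if sample_id not in value:
            raise KeyError(f"No value defined for sample '{sample_id}'")
        return value[sample_id]
    return value


# Values copied from the config example above.
filter_params = {
    "cellFilterMinNGenes": {
        "1k_pbmc_v2_chemistry": 600,
        "1k_pbmc_v3_chemistry": 800,
    },
    "cellFilterMaxNGenes": 4000,
}

for sample in ("1k_pbmc_v2_chemistry", "1k_pbmc_v3_chemistry"):
    resolved = {name: resolve_param(v, sample) for name, v in filter_params.items()}
    print(sample, resolved)
```

Raising an error for a sample that is missing from the mapping is a design choice of this sketch; it makes a forgotten sample visible instead of silently falling back to a default.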

# Development

## Repository structure

### Root
The repository root contains a `main.nf` and associated `nextflow.config`.
The root `main.nf` imports and calls sub-workflows defined in the modules.

## Modules
### Modules
A "module" consists of a folder labeled with the tool name (Scanpy, SCENIC, utils, etc.), with subfolders for
* `bin/` (scripts passed into the container)
* `processes/` (where Nextflow processes are defined)
@@ -254,7 +310,7 @@ src/
└── utils.nf
```

## Workflows
### Workflows

Workflows (chains of Nextflow processes) are defined in the module root folder (e.g. [src/Scanpy/bec_bbknn.nf](https://github.com/vib-singlecell-nf/vib-singlecell-nf/blob/module_refactor/src/scanpy/bec_bbknn.nf)).
Workflows import multiple processes and define the workflow by name:
@@ -274,7 +330,7 @@ workflow CELLRANGER {
```

### Workflow imports
#### Workflow imports
Entire **sub-workflows** can also be imported in other workflows with one command (inheriting all of the process imports from the workflow definition):
```groovy
include CELLRANGER from '../cellranger/main.nf' params(params)
@@ -296,7 +352,7 @@ workflow {
```


## Parameters structure
### Parameters structure
Parameters are stored in a separate config file per workflow, plus the main `nextflow.config`.
These parameters are merged when starting the run using e.g.:
```groovy
@@ -362,8 +418,6 @@ params {
```

# Development

## Module testing

Modules and processes can be tested independently; you can find an example in `src/utils/main.test.nf`.
7 changes: 6 additions & 1 deletion nextflow.config
@@ -6,7 +6,7 @@ manifest {
version = '0.6.1'
mainScript = 'main.nf'
defaultBranch = 'master'
nextflowVersion = '!19.10.0' // with ! prefix, stop execution if current version does not match required version.
nextflowVersion = '!19.12.0-edge' // with ! prefix, stop execution if current version does not match required version.
}

params {
@@ -116,6 +116,11 @@ profiles {
includeConfig 'src/sratoolkit/sratoolkit.config'
}

// feature profiles
pcacv {
includeConfig 'src/pcacv/pcacv.config'
}

// utility profiles
utils_sample_annotate {
includeConfig 'src/utils/conf/sample_annotate.config'
2 changes: 1 addition & 1 deletion src/channels/singleend.nf
@@ -9,7 +9,7 @@ def extractSample(path) {

workflow getChannel {

get:
take:
glob

main:
2 changes: 1 addition & 1 deletion src/channels/sra.nf
@@ -2,7 +2,7 @@ nextflow.preview.dsl=2

workflow getChannel {

get:
take:
// Expects sra Map [[id: "id1", samples: ["glob1", ...]], ...]
sra

2 changes: 1 addition & 1 deletion src/channels/tenx.nf
@@ -8,7 +8,7 @@ def extractSample(path) {

workflow getChannel {

get:
take:
glob

main:
2 changes: 1 addition & 1 deletion src/edirect
1 change: 1 addition & 0 deletions src/pcacv
Submodule pcacv added at eb92c4
2 changes: 1 addition & 1 deletion src/scenic
2 changes: 1 addition & 1 deletion src/sratoolkit
Submodule sratoolkit updated 1 files
+26 −0 Dockerfile
30 changes: 28 additions & 2 deletions src/utils/README.md
@@ -1,6 +1,6 @@
# Utils Module
# Utils module

## Cell-based Metadata Annotation
## Cell-based metadata annotation

The profile `utils_cell_annotate` should be added when generating the main config using `nextflow config`. This will add the following entry in the config:

@@ -24,3 +24,29 @@ Then, the following parameters should be updated to use the module feature:
- `indexColumnName` is the column name from `cellMetaDataFilePath` containing the cell IDs information.
- `sampleColumnName` is the column name from `cellMetaDataFilePath` containing the sample ID/name information.
- `annotationColumnNames` is an array of column names from `cellMetaDataFilePath` containing the different annotation metadata to add.
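
How these three parameters could fit together is sketched below in Python. It is a rough illustration rather than the module's code: the helper `load_cell_annotations`, the example column names and barcodes are all hypothetical, and the metadata file is assumed to be tab-separated with a header row.

```python
import csv
import io


def load_cell_annotations(handle, index_column, sample_column, annotation_columns):
    """Map (sample ID, cell ID) to the selected annotation columns."""
    annotations = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row[sample_column], row[index_column])
        annotations[key] = {name: row[name] for name in annotation_columns}
    return annotations


# Hypothetical metadata table; a real one would be read from cellMetaDataFilePath.
example = io.StringIO(
    "barcode\tsample\tcell_type\tbatch\n"
    "AAAC-1\t1k_pbmc_v2_chemistry\tT cell\tb1\n"
    "AAAG-1\t1k_pbmc_v3_chemistry\tB cell\tb2\n"
)

annotations = load_cell_annotations(
    example,
    index_column="barcode",           # plays the role of indexColumnName
    sample_column="sample",           # plays the role of sampleColumnName
    annotation_columns=["cell_type"]  # plays the role of annotationColumnNames
)
print(annotations[("1k_pbmc_v2_chemistry", "AAAC-1")])
```

Keying on both the sample and the cell ID reflects why a sample column is needed at all: the same barcode can occur in more than one sample.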

## Sample-based metadata annotation
The profile `utils_sample_annotate` should be added when generating the main config using `nextflow config`. This will add the following entry in the config:

```
params {
sc {
sample_annotate {
iff = '10x_cellranger_mex'
off = 'h5ad'
type = 'sample'
metaDataFilePath = 'data/10x/1k_pbmc/metadata.tsv'
}
}
}
```
Then, the following parameters should be updated to use the module feature:

- `metaDataFilePath` is a TSV file (with header) with at least 2 columns, where the first column needs to match the sample IDs. Any other columns will be added as annotations in the final loom, i.e. every cell is annotated with the annotations given for its sample.

| id | chemistry | ... |
| ------------- | ------------- | ------------- |
| 1k_pbmc_v2_chemistry | v2 | ... |
| 1k_pbmc_v3_chemistry | v3 | ... |

Annotating the samples with this system allows any user to query all the annotations using the SCope portal. This is especially relevant when samples need to be compared across specific annotations (see the compare tab in SCope).
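
The propagation from samples to cells can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the helpers `load_sample_annotations` and `annotate_cells` and the cell barcodes are hypothetical, while the metadata rows mirror the table above.

```python
import csv
import io


def load_sample_annotations(handle):
    """Map each sample ID (first column) to its remaining columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    return {row[0]: dict(zip(header[1:], row[1:])) for row in reader if row}


def annotate_cells(cell_to_sample, sample_annotations):
    """Give every cell the annotations of the sample it belongs to."""
    return {cell: sample_annotations[sample] for cell, sample in cell_to_sample.items()}


# Same layout as the example table: first column is the sample ID.
metadata = io.StringIO(
    "id\tchemistry\n"
    "1k_pbmc_v2_chemistry\tv2\n"
    "1k_pbmc_v3_chemistry\tv3\n"
)
sample_annotations = load_sample_annotations(metadata)

# Hypothetical cell barcodes, for illustration only.
cells = {"AAAC-1": "1k_pbmc_v2_chemistry", "AAAG-1": "1k_pbmc_v3_chemistry"}
annotated = annotate_cells(cells, sample_annotations)
print(annotated["AAAG-1"])
```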
68 changes: 68 additions & 0 deletions src/utils/bin/sc_h5ad_extract_metadata.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python3

import os
import sys
import argparse
import pandas as pd
import scanpy as sc
import numpy as np

parser = argparse.ArgumentParser(description='Extract metadata columns from an h5ad file into a TSV file.')

parser.add_argument(
    "input",
    type=argparse.FileType('r'),
    help='The path to the input h5ad file.'
)

parser.add_argument(
    "output",
    type=argparse.FileType('w'),
    help='The path to the output TSV file that will contain the extracted metadata columns.'
)

parser.add_argument(
    '-a', '--axis',
    type=str,
    dest="axis",
    help="The axis defining the metadata which the given column_names will be extracted from: 'feature' or 'observation'."
)

parser.add_argument(
    '-c', '--column-name',
    type=str,
    action="append",
    dest="column_names",
    help='Name of a metadata column to extract. Repeat the option to extract several columns.'
)

args = parser.parse_args()

FILE_PATH_IN = args.input.name

# I/O
# Expects h5ad file
try:
    adata = sc.read_h5ad(filename=FILE_PATH_IN)
except IOError:
    raise Exception("Can only handle .h5ad files.")

#
# Extract the given column_names from the feature/observation-based metadata.
#

if args.axis == 'feature':
    # Feature-based metadata lives in adata.var
    metadata = adata.var[args.column_names]
elif args.axis == 'observation':
    raise Exception("Extracting the observation-based metadata is currently not implemented.")
else:
    raise Exception(f"Cannot extract from the {args.axis}-based metadata.")

# I/O
# Write the selected columns as tab-separated values, without the index
metadata.to_csv(
    path_or_buf=args.output,
    sep='\t',
    header=True,
    columns=args.column_names,
    index=False
)
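
The `-c/--column-name` option in this script uses `action="append"`, so each occurrence on the command line adds one entry to `column_names`. The standalone sketch below reproduces only those two options to show that behavior; the column names `gene_ids` and `highly_variable` are hypothetical examples, not names taken from the pipeline.

```python
import argparse

# Minimal parser mirroring the two optional arguments of the script above.
parser = argparse.ArgumentParser()
parser.add_argument("-a", "--axis", type=str, dest="axis")
parser.add_argument("-c", "--column-name", type=str, action="append", dest="column_names")

# Repeating -c collects the values into a list, in command-line order.
args = parser.parse_args(["-a", "feature", "-c", "gene_ids", "-c", "highly_variable"])
print(args.axis, args.column_names)

# An append option that is never given (and is not marked required) stays None.
no_columns = parser.parse_args(["-a", "feature"])
print(no_columns.column_names)
```

Because the value is `None` rather than an empty list when no `-c` is passed, a caller would likely want to supply at least one column name whenever the `feature` axis is selected.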