This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Merge pull request #263 from vib-singlecell-nf/develop
Develop for v0.23.0

Former-commit-id: b5167f5
cflerin authored Dec 3, 2020
2 parents 8b2845e + c1f0117 commit 0a585c2
Showing 25 changed files with 553 additions and 11 deletions.
21 changes: 21 additions & 0 deletions .gitmodules
@@ -55,6 +55,27 @@
path = src/celda
url = https://github.com/vib-singlecell-nf/celda.git
branch = develop
[submodule "src/sinto"]
path = src/sinto
url = https://github.com/vib-singlecell-nf/sinto.git
[submodule "src/bwamaptools"]
path = src/bwamaptools
url = https://github.com/vib-singlecell-nf/bwamaptools.git
[submodule "src/trimgalore"]
path = src/trimgalore
url = https://github.com/vib-singlecell-nf/trimgalore.git
[submodule "src/archr"]
path = src/archr
url = https://github.com/vib-singlecell-nf/archr.git
[submodule "src/bap"]
path = src/bap
url = https://github.com/vib-singlecell-nf/bap.git
[submodule "src/singlecelltoolkit"]
path = src/singlecelltoolkit
url = https://github.com/vib-singlecell-nf/singlecelltoolkit.git
[submodule "src/pycistopic"]
path = src/pycistopic
url = https://github.com/vib-singlecell-nf/pycistopic.git
[submodule "src/soupx"]
path = src/soupx
url = https://github.com/vib-singlecell-nf/soupx.git
12 changes: 10 additions & 2 deletions README.rst
@@ -11,6 +11,10 @@ A repository of pipelines for single-cell data analysis in Nextflow DSL2.
This main repo contains multiple workflows for analyzing single cell transcriptomics data, and depends on a number of tools, which are organized into submodules within the VIB-Singlecell-NF_ organization.
Currently available workflows are listed below.

If VSN-Pipelines is useful for your research, consider citing:

- VSN-Pipelines All Versions (latest): `10.5281/zenodo.3703108 <https://doi.org/10.5281/zenodo.3703108>`_.

Raw Data Processing Workflows
-----------------------------

@@ -104,13 +108,17 @@ Sample Aggregation Workflows
- |mnncorrect|


---
In addition, the pySCENIC_ implementation of the SCENIC_ workflow is integrated here and can be run in conjunction with any of the above workflows.
The output of each of the main workflows is a loom_-format file, which is ready for import into the interactive single-cell web visualization tool SCope_.
In addition, data is also output in h5ad format, and reports are generated for the major pipeline steps.

scATAC-seq workflows
--------------------

Single cell ATAC-seq processing steps are now included in VSN Pipelines.
Currently, a preprocessing workflow is available, which takes fastq inputs, applies barcode correction, read trimming, and bwa mapping, and outputs bam and fragments files for further downstream analysis.
See `here <https://vsn-pipelines.readthedocs.io/en/latest/scatac-seq.html>`_ for complete documentation.


.. |VSN-Pipelines| image:: https://img.shields.io/github/v/release/vib-singlecell-nf/vsn-pipelines
13 changes: 13 additions & 0 deletions conf/atac/preprocess.config
@@ -0,0 +1,13 @@
params {
data {
atac_preprocess {
metadata = 'metadata.tsv'
}
}
}

includeConfig './../../src/singlecelltoolkit/singlecelltoolkit.config'
includeConfig './../../src/trimgalore/trimgalore.config'
includeConfig './../../src/bwamaptools/bwamaptools.config'
includeConfig './../../src/sinto/sinto.config'
includeConfig './../../src/bap/bap.config'
3 changes: 3 additions & 0 deletions conf/atac/qc_filtering.config
@@ -0,0 +1,3 @@
includeConfig './../../src/archr/archr.config'
includeConfig './../../src/pycistopic/pycistopic.config'

1 change: 1 addition & 0 deletions docs/index.rst
@@ -13,6 +13,7 @@
features
case-studies
development
scatac-seq

.. include:: ../README.rst

202 changes: 202 additions & 0 deletions docs/scatac-seq.rst
@@ -0,0 +1,202 @@
scATAC-seq Pipelines
====================

----

scATAC-seq preprocessing
************************

This pipeline takes fastq files from paired-end single-cell ATAC-seq and applies preprocessing steps to align the reads to a reference genome, producing a bam file and a scATAC-seq fragments file.
The full steps are:

- Barcode correction:

    * For 'standard' and 'multiome' samples (e.g. 10x Genomics) correction is performed against a whitelist by `this script <https://github.com/aertslab/single_cell_toolkit/blob/master/correct_barcode_in_fastq.sh>`_.
    * For 'biorad' samples, barcode correction is performed by `BAP <https://github.com/caleblareau/bap>`_.

- Debarcoding: Add the barcode sequence to the beginning of the fastq sequence identifier
- Read/adapter trimming
- Mapping to a reference genome:

    * ``bwa mem`` is used with default parameters.
    * Duplicates are marked with ``samtools markdup``.
    * Droplet barcodes are included in the BAM file with the ``CR`` tag (by default). No barcode correction is performed.

- A fragments file is created using `Sinto <https://github.com/timoast/sinto>`_.

Input
-----

The input to this pipeline is a (tab-delimited) metadata table with the sample ID, sequencing technology, and locations of the fastq files:

.. list-table:: Metadata Table
    :widths: 10 10 10 10 10
    :header-rows: 1

    * - sample_name
      - technology
      - fastq_PE1_path
      - fastq_barcode_path
      - fastq_PE2_path
    * - sample_1
      - standard
      - sample_1_R1.fastq.gz
      - sample_1_R2.fastq.gz
      - sample_1_R3.fastq.gz
    * - sample_2
      - multiome
      - sample_2_R1.fastq.gz
      - sample_2_R2.fastq.gz
      - sample_2_R3.fastq.gz
    * - sample_3
      - biorad
      - sample_3_R1.fastq.gz
      -
      - sample_3_R3.fastq.gz

The columns represent:

- ``sample_name``: Sample name for labeling the sample in the pipeline and output files. This can be any arbitrary string.
- ``technology``: This describes the barcode correction and processing methods to use for the fastq files. Current options are ``standard``, ``multiome``, or ``biorad``. See below for additional details.
- ``fastq_PE1_path``: The full path to the fastq file for the first read in a pair.
- ``fastq_barcode_path``: The full path to the fastq file containing the barcodes. This column can be blank/empty depending on the technology setting.
- ``fastq_PE2_path``: The full path to the fastq file for the second read in a pair.
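
As a rough illustration of the table layout described above, the following Python sketch parses a tab-delimited metadata file and checks each row. It is not part of the pipeline, which does its own parsing in Nextflow. The helper name ``read_metadata`` is hypothetical, and the rule that only ``biorad`` rows may leave ``fastq_barcode_path`` empty is an assumption inferred from the example table.

```python
import csv
import io

# Column names and technology values are taken from the documentation above.
COLUMNS = ["sample_name", "technology", "fastq_PE1_path",
           "fastq_barcode_path", "fastq_PE2_path"]
TECHNOLOGIES = {"standard", "multiome", "biorad"}


def read_metadata(handle):
    """Parse a tab-delimited metadata table and return a list of row dicts.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        if row["technology"] not in TECHNOLOGIES:
            raise ValueError(
                f"{row['sample_name']}: unknown technology {row['technology']!r}")
        # Assumption: only 'biorad' samples may omit the barcode fastq.
        if row["technology"] != "biorad" and not row["fastq_barcode_path"]:
            raise ValueError(f"{row['sample_name']}: missing fastq_barcode_path")
        rows.append(row)
    return rows


example = (
    "sample_name\ttechnology\tfastq_PE1_path\tfastq_barcode_path\tfastq_PE2_path\n"
    "sample_1\tstandard\tsample_1_R1.fastq.gz\tsample_1_R2.fastq.gz\tsample_1_R3.fastq.gz\n"
    "sample_3\tbiorad\tsample_3_R1.fastq.gz\t\tsample_3_R3.fastq.gz\n"
)
rows = read_metadata(io.StringIO(example))
print([r["sample_name"] for r in rows])
```

Running a check like this before launching the workflow can catch a misspelled technology or a missing column early, before any mapping jobs are submitted.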

Technology
----------

This controls how both barcode correction and debarcoding are applied to the input fastq files.
Available options are:

``standard``
____________

The ``standard`` setting assumes a typical 10x Genomics style format with two read pair fastqs and a barcode fastq:

.. code:: none

    $ zcat sample_1_R1.fastq.gz | head -n 4
    @A00311:74:HMLK5DMXX:1:1101:2013:1000 1:N:0:ACTCAGAC
    NTTGTCTCAGCACCCCCCGACATGGATTCAGGCTGTCTCTTATACACATC
    +
    #FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    $ zcat sample_1_R2.fastq.gz | head -n 4
    @A00311:74:HMLK5DMXX:1:1101:2013:1000 2:N:0:ACTCAGAC
    CTGTTCGCAAAGCATA
    +
    F:FFFFFFFFFFFFFF
    $ zcat sample_1_R3.fastq.gz | head -n 4
    @A00311:74:HMLK5DMXX:1:1101:2013:1000 3:N:0:ACTCAGAC
    CCTGAATCCATGTCGGGGGGTGCTGAGACAAGCTGTCTCTTATACACAT
    +
    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

The debarcoding step here uses a
`helper script <https://github.com/aertslab/single_cell_toolkit/blob/master/debarcode_10x_scatac_fastqs.sh>`_
which transforms this input into two paired fastq files with the barcode integrated into the read name:

.. code:: none

    $ zcat sample_1_dex_R1.fastq.gz | head -n 4
    @CTGTTCGCAAAGCATA:A00311:74:HMLK5DMXX:1:1101:2013:1000 1:N:0:ACTCAGAC
    NTTGTCTCAGCACCCCCCGACATGGATTCAGGCTGTCTCTTATACACATC
    +
    #FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    $ zcat sample_1_dex_R2.fastq.gz | head -n 4
    @CTGTTCGCAAAGCATA:A00311:74:HMLK5DMXX:1:1101:2013:1000 3:N:0:ACTCAGAC
    CCTGAATCCATGTCGGGGGGTGCTGAGACAAGCTGTCTCTTATACACAT
    +
    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

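
The read-name rewrite shown above can be sketched in a few lines of Python. This is only an illustration of the transformation, not the helper script itself (which is a shell script). The function name ``debarcode_record`` is hypothetical.

```python
def debarcode_record(read_header, barcode_seq, delimiter=":"):
    """Prefix a fastq read name with its barcode sequence.

    Hypothetical illustration of the read-name rewrite; the real
    pipeline uses a shell helper script for this step.
    """
    if not read_header.startswith("@"):
        raise ValueError("fastq header lines must start with '@'")
    # Keep the leading '@', then barcode, delimiter, and the original name.
    return "@" + barcode_seq + delimiter + read_header[1:]


r1_header = "@A00311:74:HMLK5DMXX:1:1101:2013:1000 1:N:0:ACTCAGAC"
barcode = "CTGTTCGCAAAGCATA"  # sequence line of the matching barcode (R2) record
print(debarcode_record(r1_header, barcode))
```

Only the header line changes. The sequence and quality lines of the read pair are carried over unchanged, as in the example output above.
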
``multiome``
____________

The ``multiome`` setting works the same as ``standard`` with the exception of the whitelist used for barcode correction.
The whitelists are supplied in the params file (``params.tools.singlecelltoolkit.barcode_correction.whitelist``).


``biorad``
__________

The ``biorad`` setting processes BioRad data using `BAP <https://github.com/caleblareau/bap/wiki/Working-with-BioRad-data>`_.
This takes input data:

.. code:: none

    $ zcat sample_2_R1.fastq.gz | head -n 4
    @NB551608:167:HNYFJBGXC:1:11101:11281:1033 1:N:0:TAAGGCGA
    GCGTANACGTATGCATGACGGAAGTTAGTCACTGAGTCAGCAATCGTCGGCAGCGTCAGATGAGTNTAAGAGACAGGGTCAGGATGCGAGATTGACGGCTGCAATAACTAATAGGAAC
    +
    AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEE<EEEE6EA#6E<66AAEEEEEAEEEEEEEEEEEEAEEAEEEEEEEEE<EEEEEEEEEEE/E
    $ zcat sample_2_R2.fastq.gz | head -n 4
    @NB551608:167:HNYFJBGXC:1:11101:11281:1033 2:N:0:TAAGGCGA
    NNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    ##A####################################

And produces paired fastq files with the barcode integrated into the read name (with a ``_`` delimiter):

.. code:: none

    $ zcat sample_2_dex_R1.fastq.gz | head -n 4
    @GCGTAGAGGAAGTTTCAGCAA_NB551608:167:HNYFJBGXC:1:11101:11281:1033 1:N:0:TAAGGCGA
    GGTCAGGATGCGAGATTGACGGCTGCAATAACTAATAGGAAC
    +
    EEAEEEEEEEEEEEEAEEAEEEEEEEEE<EEEEEEEEEEE/E
    $ zcat sample_2_dex_R2.fastq.gz | head -n 4
    @GCGTAGAGGAAGTTTCAGCAA_NB551608:167:HNYFJBGXC:1:11101:11281:1033 2:N:0:TAAGGCGA
    NNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    +
    ##A####################################

Running the workflow
--------------------

To generate a config file, use the ``atac_preprocess`` profile along with ``docker`` or ``singularity``.
Note that the full path to ``vib-singlecell-nf/vsn-pipelines/main_atac.nf`` must be used:

.. code:: bash

    nextflow config \
        vib-singlecell-nf/vsn-pipelines/main_atac.nf \
        -profile atac_preprocess,singularity \
        > atac_preprocess.config

The ATAC-specific parameters are described here.
The important parameters to change are:

- ``params.data.atac_preprocess.metadata``: the path to the metadata file.
- ``params.tools.bwamaptools.bwa_fasta``: the path to the bwa reference fasta file. This should be already indexed with ``bwa index``, and the index files located in the same directory as the fasta file.
- ``params.tools.singlecelltoolkit.barcode_correction.whitelist``: Whitelists for barcode correction are supplied here. The whitelists are matched to samples based on the parameter key here ('standard', 'multiome') and the technology field listed for each sample in the metadata file.
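
The matching between whitelist keys and the ``technology`` column described above amounts to a dictionary lookup. The Python sketch below illustrates the idea only; the paths are placeholders, ``whitelist_for`` is a hypothetical name, and returning ``None`` for technologies without a key (such as ``biorad``, which is corrected by BAP) is an assumption rather than documented pipeline behaviour.

```python
# Hypothetical whitelist paths, keyed the same way as the entries under
# params.tools.singlecelltoolkit.barcode_correction.whitelist.
WHITELISTS = {
    "standard": "/path/to/standard_whitelist.txt",
    "multiome": "/path/to/multiome_whitelist.txt",
}


def whitelist_for(technology, whitelists=WHITELISTS):
    """Return the whitelist path for a sample's technology, or None.

    Assumption: technologies without a key (e.g. 'biorad') have no
    whitelist because their barcodes are corrected by a different tool.
    """
    return whitelists.get(technology)


for sample, technology in [("sample_1", "standard"), ("sample_3", "biorad")]:
    print(sample, technology, whitelist_for(technology))
```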

Optional parameters to change:

- Within ``params.tools.bwamaptools.add_barcode_as_tag``:

    - ``tag``: Controls the naming of the barcode tag added to the bam (``CR`` by default).
    - ``delimiter_to_split_qname``: Controls which delimiter is used to split the bam read name field to get the barcode. By default it uses the regex ``'[:|_]'``, a character class that splits on ``:``, ``|``, or ``_``.

- Within ``params.tools.sinto.fragments``:

    - Exactly one of ``barcodetag`` or ``barcode_regex`` must be set to tell Sinto where to find the barcodes in the bam file. The default is to use ``barcodetag`` of ``CR``.
    - ``mapq``: Controls quality filtering when generating the fragments file. Reads with a mapping quality (MAPQ) lower than this number are discarded (default 30).
    - ``temp_dir``: Controls where temp files are stored during fragments processing. For large BAM files, the system default temp location may become full. An alternate temp path can be specified here. Be sure to also include this temp path in the global volume mounts for Docker/Singularity in the config file.
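
The default ``delimiter_to_split_qname`` pattern described above can be tried in isolation. The Python sketch below applies the same regex to the debarcoded read names shown earlier. The function name ``barcode_from_qname`` is hypothetical, and taking the first field after the split is an assumption based on those read names, not a description of the tool's internals.

```python
import re

# Default pattern quoted in the documentation above.
DELIMITER = r"[:|_]"


def barcode_from_qname(qname, delimiter=DELIMITER):
    """Return the text before the first delimiter match in a read name.

    Assumption: the barcode is the first field after splitting, which
    mirrors the debarcoded read names shown earlier.
    """
    return re.split(delimiter, qname, maxsplit=1)[0]


# 10x-style name (':' delimiter) and BioRad-style name ('_' delimiter).
print(barcode_from_qname("CTGTTCGCAAAGCATA:A00311:74:HMLK5DMXX:1:1101:2013:1000"))
print(barcode_from_qname("GCGTAGAGGAAGTTTCAGCAA_NB551608:167:HNYFJBGXC:1:11101:11281:1033"))
```

Inside a character class the ``|`` is a literal character rather than an alternation operator, so a single default pattern covers both the ``:``-delimited and ``_``-delimited read names.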


After configuring, the workflow can be run with:

.. code:: bash

    nextflow -C atac_preprocess.config run \
        vib-singlecell-nf/vsn-pipelines/main_atac.nf \
        -entry atac_preprocess -resume

----
38 changes: 38 additions & 0 deletions main_atac.nf
@@ -5,7 +5,9 @@ nextflow.preview.dsl=2
include {
INIT;
} from './src/utils/workflows/utils' params(params)

INIT(params)

include {
SC__FILE_CONVERTER;
} from './src/utils/processes/utils' params(params)
@@ -45,3 +47,39 @@ workflow cistopic {

}


workflow atac_preprocess {

// generic ATAC-seq preprocessing pipeline: adapter trimming, mapping, fragments file generation
include {
ATAC_PREPROCESS_WITH_METADATA;
} from './workflows/atac/preprocess.nf' params(params)

ATAC_PREPROCESS_WITH_METADATA(file(params.data.atac_preprocess.metadata))

}

workflow atac_qc_filtering {

include {
ATAC_QC_PREFILTER;
} from './workflows/atac/qc_filtering.nf' params(params)

getDataChannel | ATAC_QC_PREFILTER

}

workflow atac_preprocess_freemuxlet {

// generic ATAC-seq preprocessing pipeline: adapter trimming, mapping, fragments file generation
include {
ATAC_PREPROCESS_WITH_METADATA;
} from './workflows/atac/preprocess.nf' params(params)
include {
freemuxlet as FREEMUXLET;
} from './workflows/popscle' params(params)

ATAC_PREPROCESS_WITH_METADATA(file(params.data.atac_preprocess.metadata))
FREEMUXLET(ATAC_PREPROCESS_WITH_METADATA.out.bam)
}

21 changes: 17 additions & 4 deletions nextflow.config
@@ -3,15 +3,12 @@ manifest {
name = 'vib-singlecell-nf/vsn-pipelines'
description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
version = '0.22.0'
version = '0.23.0'
mainScript = 'main.nf'
defaultBranch = 'master'
nextflowVersion = '!20.04.1' // with ! prefix, stop execution if current version does not match required version.
}

params {
}

// load these configs first:
includeConfig 'conf/global.config'
includeConfig 'conf/compute_resources.config'
@@ -281,6 +278,12 @@ profiles {
seurat_rds {
includeConfig 'src/channels/conf/seurat_rds.config'
}
fragments {
includeConfig 'src/channels/conf/fragments.config'
}
bam {
includeConfig 'src/channels/conf/bam.config'
}

// metadata profiles:

@@ -428,6 +431,16 @@ profiles {
cistopic {
includeConfig 'src/cistopic/cistopic.config'
}
atac_preprocess {
includeConfig 'conf/atac/preprocess.config'
}
atac_qc_filtering {
includeConfig 'conf/atac/qc_filtering.config'
}
atac_preprocess_freemuxlet {
includeConfig 'conf/atac/preprocess.config'
includeConfig 'src/popscle/popscle.config'
}


/*
1 change: 1 addition & 0 deletions src/archr
Submodule archr added at b88c42
1 change: 1 addition & 0 deletions src/bap
Submodule bap added at 0d53da
1 change: 1 addition & 0 deletions src/bwamaptools
Submodule bwamaptools added at 4bd81d
2 changes: 1 addition & 1 deletion src/celda
Submodule celda updated 0 files