Doc #4

Merged
merged 14 commits into from
Jan 25, 2024
34 changes: 9 additions & 25 deletions .github/CONTRIBUTING.md
@@ -1,17 +1,17 @@
# phac-nml/iridanextexample: Contributing Guidelines

Hi there!
Many thanks for taking an interest in improving phac-nml/iridanextexample.
Many thanks for taking an interest in improving phac-nml/fetchdatairidanext.

We try to manage the required tasks for phac-nml/iridanextexample using GitHub issues, you probably came to this page when creating one.
We try to manage the required tasks for phac-nml/fetchdatairidanext using GitHub issues; you probably came to this page when creating one.
Please use the pre-filled template to save time.

## Contribution workflow

If you'd like to write some code for phac-nml/iridanextexample, the standard workflow is as follows:
If you'd like to write some code for phac-nml/fetchdatairidanext, the standard workflow is as follows:

1. Check that there isn't already an issue about your idea in the [phac-nml/iridanextexample issues](https://github.com/phac-nml/iridanextexample/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/iridanextexample repository](https://github.com/phac-nml/iridanextexample) to your GitHub account
1. Check that there isn't already an issue about your idea in the [phac-nml/fetchdatairidanext issues](https://github.com/phac-nml/fetchdatairidanext/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/fetchdatairidanext repository](https://github.com/phac-nml/fetchdatairidanext) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
@@ -27,7 +27,7 @@ There are typically two types of tests that run:

### Lint tests

`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
`phac-nml` has a [set of guidelines](https://github.com/phac-nml/pipeline-standards) which all pipelines must adhere to. These are a subset of the [nf-core set of guidelines](https://nf-co.re/developers/guidelines).
To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.

If any failures or warnings are encountered, please follow the listed URL for more documentation.
@@ -49,11 +49,11 @@ These tests are run both with the latest available version of `Nextflow` and als

## Getting help

For further information/help, please consult the [phac-nml/iridanextexample documentation](https://github.com/phac-nml/iridanextexample/).
For further information/help, please consult the [phac-nml/fetchdatairidanext documentation](https://github.com/phac-nml/fetchdatairidanext/).

## Pipeline contribution conventions

To make the phac-nml/iridanextexample code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
To make the phac-nml/fetchdatairidanext code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.

### Adding a new step

@@ -67,8 +67,7 @@ If you wish to contribute a new step, please use the following coding standards:
6. Add sanity checks and validation for all relevant parameters.
7. Perform local tests to validate that the new code works as expected.
8. If applicable, add a new test command in `.github/workflows/ci.yml`.
9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module.
10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`.
9. Add a description of the output files to `docs/output.md`.

### Default values

@@ -96,18 +95,3 @@ If you are using a new feature from core Nextflow, you may bump the minimum requ
### Images and figures

For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).

## GitHub Codespaces

This repo includes a devcontainer configuration which will create a GitHub Codespaces for Nextflow development! This is an online developer environment that runs in your browser, complete with VSCode and a terminal.

To get started:

- Open the repo in [Codespaces](https://github.com/phac-nml/iridanextexample/codespaces)
- Tools installed
- nf-core
- Nextflow

Devcontainer specs:

- [DevContainer config](.devcontainer/devcontainer.json)
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -1,4 +1,4 @@
contact_links:
- name: "GitHub"
url: https://github.com/phac-nml/iridanextexample
url: https://github.com/phac-nml/fetchdatairidanext
about: The GitHub page for development.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.yml
@@ -1,5 +1,5 @@
name: Feature request
description: Suggest an idea for the phac-nml/iridanextexample pipeline
description: Suggest an idea for the phac-nml/fetchdatairidanext pipeline
labels: enhancement
body:
- type: textarea
8 changes: 4 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,21 +1,21 @@
<!--
# phac-nml/iridanextexample pull request
# phac-nml/fetchdatairidanext pull request

Many thanks for contributing to phac-nml/iridanextexample!
Many thanks for contributing to phac-nml/fetchdatairidanext!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release.

Learn more about contributing: [CONTRIBUTING.md](https://github.com/phac-nml/iridanextexample/tree/main/.github/CONTRIBUTING.md)
Learn more about contributing: [CONTRIBUTING.md](https://github.com/phac-nml/fetchdatairidanext/tree/main/.github/CONTRIBUTING.md)
-->

## PR checklist

- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/iridanextexample/tree/main/.github/CONTRIBUTING.md)
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/fetchdatairidanext/tree/main/.github/CONTRIBUTING.md)
- [ ] Make sure your code lints (`nf-core lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
4 changes: 2 additions & 2 deletions .github/workflows/branch.yml
@@ -11,9 +11,9 @@ jobs:
steps:
# PRs to the phac-nml repo main branch are only ok if coming from the phac-nml repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'phac-nml/iridanextexample'
if: github.repository == 'phac-nml/fetchdatairidanext'
run: |
{ [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/iridanextexample ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
{ [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/fetchdatairidanext ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]

# If the above check failed, post a comment on the PR explaining the failure
# NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -19,7 +19,7 @@ jobs:
test:
name: Run pipeline with test data
# Only run on push if this is the phac-nml dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/iridanextexample') }}"
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/fetchdatairidanext') }}"
runs-on: ubuntu-latest
strategy:
matrix:
114 changes: 57 additions & 57 deletions README.md
@@ -2,101 +2,98 @@

# fetchdatairidanext pipeline

This pipeline can be used to fetch data from NCBI for integration into IRIDA Next.
This pipeline can be used to fetch data from NCBI for integration into [IRIDA Next][irida-next].

# Input

The input to the pipeline is a standard sample sheet (passed as `--input samplesheet.csv`) that looks like:

| sample | ncbi_accession |
| ------- | -------------- |
| SampleA | ERR1109373 |
| SampleB | SRR13191702 |
| sample | insdc_accession |
| ------- | --------------- |
| SampleA | ERR1109373 |
| SampleB | SRR13191702 |

That is, there are two columns:

- **sample**: The sample identifier downloaded read data should be associated with.
- **insdc_accession**: The accession from the [International Nucleotide Sequence Database Collaboration (INSDC)][insdc] for the data to download (currently only sequence runs are supported, i.e., accessions starting with `SRR`, `ERR`, or `DRR`).

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). An example of this file is provided at [assets/samplesheet.csv](assets/samplesheet.csv).
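
As a rough illustration of what the schema enforces, the following Python sketch checks a samplesheet before it is passed to the pipeline. It is not part of the pipeline: the function name `find_invalid_rows` is hypothetical, the accession pattern is copied from `assets/schema_input.json`, and the sample-name check is only an assumption based on that schema's error message ("cannot contain spaces").

```python
import csv
import io
import re

# Same pattern as the `insdc_accession` field in assets/schema_input.json.
ACCESSION_PATTERN = re.compile(r"^(SRR|ERR|DRR)\S+$")


def find_invalid_rows(samplesheet_text):
    """Return (line_number, reason) tuples for rows that would likely be rejected.

    Hypothetical helper for illustration; the pipeline itself validates the
    samplesheet against the JSON schema.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        sample = (row.get("sample") or "").strip()
        accession = (row.get("insdc_accession") or "").strip()
        # Assumption: sample names must be present and free of spaces.
        if not sample or " " in sample:
            problems.append((line_number, "sample missing or contains spaces"))
        if not ACCESSION_PATTERN.match(accession):
            problems.append((line_number, f"invalid accession: {accession!r}"))
    return problems


example = (
    "sample,insdc_accession\n"
    "SampleA,ERR1109373\n"
    "SampleB,SRR13191702\n"
    "SampleC,GCA_000001\n"  # not a run accession, so it should be flagged
)
print(find_invalid_rows(example))
```

Only the last row is reported, because its accession does not start with `SRR`, `ERR`, or `DRR`.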

# Parameters

The main parameters are `--input` as defined above and `--output` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers and `-r [branch]` to specify which GitHub branch you would like to run.
The main parameters are `--input` as defined above and `--outdir` for specifying the output results directory. You may wish to provide `-profile singularity` to use Singularity containers (or `-profile docker` for Docker) and `-r [branch]` to specify which GitHub branch you would like to run.

Other parameters (defaults from nf-core) are defined in [nextflow_schema.json](nextflow_schema.json).

# Running

To run the pipeline, please do:
## Test data

To run the pipeline with test data, please do:

```bash
nextflow run phac-nml/fetchdatairidanext -profile test,docker --outdir results
```

The downloaded data will appear in `results/`. A JSON file for integrating data with IRIDA Next will be written to `results/iridanext.output.json.gz` (see the [Output](#output) section for details).

## Other data

To run the pipeline with other data (a custom samplesheet), please do:

```bash
nextflow run phac-nml/fetchdatairidanext -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results
nextflow run phac-nml/fetchdatairidanext -profile docker --input assets/samplesheet.csv --outdir results
```

Where the `samplesheet.csv` is structured as specified in the [Input](#input) section.

# Output

A JSON file for loading metadata into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `irida.output.json.gz` (ex: `[outdir]/irida.output.json.gz`).
## Read data

An example of the what the contents of the IRIDA Next JSON file looks like for this particular pipeline is as follows:
The sequence reads will appear in the `results/sratools/reads` directory (assuming `--outdir results` is specified). For example:

```
results/sratools/reads/
├── ERR1109373.fastq.gz
├── ERR1109373_1.fastq.gz
├── ERR1109373_2.fastq.gz
├── SRR13191702_1.fastq.gz
└── SRR13191702_2.fastq.gz
```
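
The listing above suggests a naming convention: `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz` for paired mates, and `<accession>.fastq.gz` for reads without a mate. Assuming that convention holds (it is inferred from the example, not from a stated specification), a small Python sketch for grouping downloaded files by accession could look like this. The helper name `group_reads` and the `"single"` label are hypothetical.

```python
import re
from collections import defaultdict

# Assumed file naming: ACCESSION.fastq.gz, ACCESSION_1.fastq.gz, ACCESSION_2.fastq.gz
READ_FILE = re.compile(r"^(?P<accession>[A-Z]+\d+)(?:_(?P<mate>[12]))?\.fastq\.gz$")


def group_reads(filenames):
    """Group read file names by accession, keyed by mate ("1", "2") or "single"."""
    groups = defaultdict(dict)
    for name in filenames:
        match = READ_FILE.match(name)
        if match is None:
            continue  # ignore anything that does not follow the assumed naming
        mate = match.group("mate") or "single"
        groups[match.group("accession")][mate] = name
    return dict(groups)


listing = [
    "ERR1109373.fastq.gz",
    "ERR1109373_1.fastq.gz",
    "ERR1109373_2.fastq.gz",
    "SRR13191702_1.fastq.gz",
    "SRR13191702_2.fastq.gz",
]
for accession, files in group_reads(listing).items():
    print(accession, files)
```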

## IRIDA Next integration file

A JSON file for loading the data into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `iridanext.output.json.gz` (e.g., `[outdir]/iridanext.output.json.gz`).

```json
{
"files": {
"global": [
{
"path": "summary/summary.txt.gz"
}
],
"samples": {
"SAMPLE1": [
{
"path": "assembly/SAMPLE1.assembly.fa.gz"
}
],
"SAMPLE2": [
{
"path": "assembly/SAMPLE2.assembly.fa.gz"
}
],
"SAMPLE3": [
{
"path": "assembly/SAMPLE3.assembly.fa.gz"
}
]
}
},
"metadata": {
"samples": {
"SAMPLE1": {
"reads.1": "sample1_R1.fastq.gz",
"reads.2": "sample1_R2.fastq.gz"
},
"SAMPLE2": {
"reads.1": "sample2_R1.fastq.gz",
"reads.2": "sample2_R2.fastq.gz"
},
"SAMPLE3": {
"reads.1": "sample1_R1.fastq.gz",
"reads.2": "null"
}
}
"files": {
"global": [],
"samples": {
"SampleB": [
{ "path": "sratools/reads/SRR13191702_1.fastq.gz" },
{ "path": "sratools/reads/SRR13191702_2.fastq.gz" }
]
}
}
}
```

Within the `files` section of this JSON file, all of the output paths are relative to the `outdir`. Therefore, `"path": "assembly/SAMPLE1.assembly.fa.gz"` refers to a file located within `outdir/assembly/SAMPLE1.assembly.fa.gz`.
Within the `files` section of this JSON file, all of the output paths are relative to the directory passed as `--outdir` (here, `results`). Therefore, `"path": "sratools/reads/SRR13191702_1.fastq.gz"` refers to a file located within `results/sratools/reads/SRR13191702_1.fastq.gz`.

There is also a pipeline execution summary output file provided (specified in the above JSON as `"global": [{"path":"summary/summary.txt.gz"}]`). However, there is no formatting specification for this file.
An additional example of this file can be found at [tests/data/test1_iridanext.output.json](tests/data/test1_iridanext.output.json).
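
To make the relative-path rule concrete, here is a minimal Python sketch that reads the integration file and resolves each sample's paths against the output directory. It assumes the file is named `iridanext.output.json.gz` (as in the Running section) and has the `files.samples` layout shown above. The function name `list_sample_files` is hypothetical, and the demonstration writes a throwaway file rather than using real pipeline output.

```python
import gzip
import json
import tempfile
from pathlib import Path


def list_sample_files(outdir):
    """Map each sample name to its output files, resolved against outdir."""
    outdir = Path(outdir)
    # Assumed file name, taken from the Running section of this README.
    with gzip.open(outdir / "iridanext.output.json.gz", "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    samples = data.get("files", {}).get("samples", {})
    return {
        sample: [outdir / entry["path"] for entry in entries]
        for sample, entries in samples.items()
    }


# Demonstration with a temporary directory standing in for `--outdir results`.
demo = {
    "files": {
        "global": [],
        "samples": {
            "SampleB": [
                {"path": "sratools/reads/SRR13191702_1.fastq.gz"},
                {"path": "sratools/reads/SRR13191702_2.fastq.gz"},
            ]
        },
    }
}
with tempfile.TemporaryDirectory() as tmp:
    with gzip.open(Path(tmp) / "iridanext.output.json.gz", "wt", encoding="utf-8") as handle:
        json.dump(demo, handle)
    resolved = list_sample_files(tmp)
    relative = {
        sample: [path.relative_to(tmp).as_posix() for path in paths]
        for sample, paths in resolved.items()
    }
print(relative)
```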

## Test profile
# Acknowledgements

To run with the test profile, please do:
This pipeline makes use of the following subworkflow from nf-core: [fastq_download_prefetch_fasterqdump_sratools](https://nf-co.re/subworkflows/fastq_download_prefetch_fasterqdump_sratools). Custom modifications to this workflow (and underlying modules) are found in the [subworkflows/local](subworkflows/local) and [modules/local](modules/local) directories.

```bash
nextflow run phac-nml/iridanextexample -profile docker,test -r main -latest --outdir results
```
Other works this pipeline makes use of are found in the [CITATIONS.md](CITATIONS.md) file.

# Legal

Copyright 2023 Government of Canada
Copyright 2024 Government of Canada

Licensed under the MIT License (the "License"); you may not use
this work except in compliance with the License. You may obtain a copy of the
@@ -108,3 +105,6 @@ Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

[irida-next]: https://github.com/phac-nml/irida-next
[insdc]: https://www.insdc.org/
11 changes: 5 additions & 6 deletions assets/schema_input.json
@@ -14,14 +14,13 @@
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
},
"run_accession": {
"insdc_accession": {
"type": "string",
"pattern": "^(SRR|ERR)\\S+$",
"meta": ["run_accession"],
"unique": true,
"errorMessage": "Must provide a valid run accession"
"pattern": "^(SRR|ERR|DRR)\\S+$",
"meta": ["insdc_accession"],
"errorMessage": "Must provide a valid accession"
}
},
"required": ["sample", "run_accession"]
"required": ["sample", "insdc_accession"]
}
}
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,6 +1,6 @@
# phac-nml/iridanextexample: Documentation
# phac-nml/fetchdatairidanext: Documentation

The phac-nml/iridanextexample documentation is split into the following pages:
The phac-nml/fetchdatairidanext documentation is split into the following pages:

- [Usage](usage.md)
- An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
Binary file removed docs/images/mqc_fastqc_adapter.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_counts.png
Binary file not shown.
Binary file removed docs/images/mqc_fastqc_quality.png
Binary file not shown.