phac-nml · apetkau · Jan 25, 2024
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -1,17 +1,17 @@
 # phac-nml/iridanextexample: Contributing Guidelines
 
 Hi there!
-Many thanks for taking an interest in improving phac-nml/iridanextexample.
+Many thanks for taking an interest in improving phac-nml/fetchdatairidanext.
 
-We try to manage the required tasks for phac-nml/iridanextexample using GitHub issues, you probably came to this page when creating one.
+We try to manage the required tasks for phac-nml/fetchdatairidanext using GitHub issues, you probably came to this page when creating one.
 Please use the pre-filled template to save time.
 
 ## Contribution workflow
 
-If you'd like to write some code for phac-nml/iridanextexample, the standard workflow is as follows:
+If you'd like to write some code for phac-nml/fetchdatairidanext, the standard workflow is as follows:
 
-1. Check that there isn't already an issue about your idea in the [phac-nml/iridanextexample issues](https://github.com/phac-nml/iridanextexample/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
-2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/iridanextexample repository](https://github.com/phac-nml/iridanextexample) to your GitHub account
+1. Check that there isn't already an issue about your idea in the [phac-nml/fetchdatairidanext issues](https://github.com/phac-nml/fetchdatairidanext/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
+2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/fetchdatairidanext repository](https://github.com/phac-nml/fetchdatairidanext) to your GitHub account
 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
 4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
@@ -27,7 +27,7 @@ There are typically two types of tests that run:
 
 ### Lint tests
 
-`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
+`phac-nml` has a [set of guidelines](https://github.com/phac-nml/pipeline-standards) which all pipelines must adhere to. These are a subset of the [nf-core set of guidelines](https://nf-co.re/developers/guidelines).
 To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
 
 If any failures or warnings are encountered, please follow the listed URL for more documentation.
@@ -49,11 +49,11 @@ These tests are run both with the latest available version of `Nextflow` and als
 
 ## Getting help
 
-For further information/help, please consult the [phac-nml/iridanextexample documentation](https://github.com/phac-nml/iridanextexample/).
+For further information/help, please consult the [phac-nml/fetchdatairidanext documentation](https://github.com/phac-nml/fetchdatairidanext/).
 
 ## Pipeline contribution conventions
 
-To make the phac-nml/iridanextexample code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
+To make the phac-nml/fetchdatairidanext code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
 
 ### Adding a new step
 
@@ -67,8 +67,7 @@ If you wish to contribute a new step, please use the following coding standards:
 6. Add sanity checks and validation for all relevant parameters.
 7. Perform local tests to validate that the new code works as expected.
 8. If applicable, add a new test command in `.github/workflow/ci.yml`.
-9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module.
-10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`.
+9. Add a description of the output files to `docs/output.md`.
 
 ### Default values
 
@@ -96,18 +95,3 @@ If you are using a new feature from core Nextflow, you may bump the minimum requ
 ### Images and figures
 
 For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).
-
-## GitHub Codespaces
-
-This repo includes a devcontainer configuration which will create a GitHub Codespaces for Nextflow development! This is an online developer environment that runs in your browser, complete with VSCode and a terminal.
-
-To get started:
-
-- Open the repo in [Codespaces](https://github.com/phac-nml/iridanextexample/codespaces)
-- Tools installed
-  - nf-core
-  - Nextflow
-
-Devcontainer specs:
-
-- [DevContainer config](.devcontainer/devcontainer.json)
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
@@ -1,4 +1,4 @@
 contact_links:
   - name: "GitHub"
-    url: https://github.com/phac-nml/iridanextexample
+    url: https://github.com/phac-nml/fetchdatairidanext
     about: The GitHub page for development.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -1,5 +1,5 @@
 name: Feature request
-description: Suggest an idea for the phac-nml/iridanextexample pipeline
+description: Suggest an idea for the phac-nml/fetchdatairidanext pipeline
 labels: enhancement
 body:
   - type: textarea

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,21 +1,21 @@
 <!--
-# phac-nml/iridanextexample pull request
+# phac-nml/fetchdatairidanext pull request
 
-Many thanks for contributing to phac-nml/iridanextexample!
+Many thanks for contributing to phac-nml/fetchdatairidanext!
 
 Please fill in the appropriate checklist below (delete whatever is not relevant).
 These are the most common things requested on pull requests (PRs).
 
 Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release.
 
-Learn more about contributing: [CONTRIBUTING.md](https://github.com/phac-nml/iridanextexample/tree/main/.github/CONTRIBUTING.md)
+Learn more about contributing: [CONTRIBUTING.md](https://github.com/phac-nml/fetchdatairidanext/tree/main/.github/CONTRIBUTING.md)
 -->
 
 ## PR checklist
 
 - [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
-- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/iridanextexample/tree/main/.github/CONTRIBUTING.md)
+- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/fetchdatairidanext/tree/main/.github/CONTRIBUTING.md)
 - [ ] Make sure your code lints (`nf-core lint`).
 - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.

diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
@@ -11,9 +11,9 @@ jobs:
     steps:
       # PRs to the phac-nml repo main branch are only ok if coming from the phac-nml repo `dev` or any `patch` branches
       - name: Check PRs
-        if: github.repository == 'phac-nml/iridanextexample'
+        if: github.repository == 'phac-nml/fetchdatairidanext'
         run: |
-          { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/iridanextexample ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
+          { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/fetchdatairidanext ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
 
       # If the above check failed, post a comment on the PR explaining the failure
       # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
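
Review note: the shell condition in the `Check PRs` step above can be hard to read at a glance. The following is a minimal Python sketch of the same boolean logic, not part of the workflow itself. The function name `pr_to_main_allowed` and its parameters are hypothetical, introduced only for illustration.

```python
# Sketch of the condition in .github/workflows/branch.yml:
#   { [[ head_repo == upstream ]] && [[ head_ref == "dev" ]]; } || [[ head_ref == "patch" ]]
# The function and argument names are illustrative assumptions.

def pr_to_main_allowed(
    head_repo: str,
    head_ref: str,
    upstream: str = "phac-nml/fetchdatairidanext",
) -> bool:
    """Return True when a PR to main would pass the branch check."""
    from_upstream_dev = head_repo == upstream and head_ref == "dev"
    # The shell test compares against the literal string "patch",
    # so only a branch named exactly "patch" satisfies this clause.
    is_patch_branch = head_ref == "patch"
    return from_upstream_dev or is_patch_branch
```

Read this way, the check accepts `dev` only from the upstream repository, while a branch named exactly `patch` passes regardless of which repository it comes from. A name such as `patch-1` would not pass, even though the comment in the workflow mentions "any `patch` branches". Whether that is intended is not clear from the diff.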

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -19,7 +19,7 @@ jobs:
   test:
     name: Run pipeline with test data
     # Only run on push if this is the phac-nml dev branch (merged PRs)
-    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/iridanextexample') }}"
+    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/fetchdatairidanext') }}"
     runs-on: ubuntu-latest
     strategy:
       matrix:

diff --git a/README.md b/README.md
@@ -2,101 +2,98 @@
 
 # fetchdatairidanext pipeline
 
-This pipeline can be used to fetch data from NCBI for integration into IRIDA Next.
+This pipeline can be used to fetch data from NCBI for integration into [IRIDA Next][irida-next].
 
 # Input
 
 The input to the pipeline is a standard sample sheet (passed as `--input samplesheet.csv`) that looks like:
 
-| sample  | ncbi_accession |
-| ------- | -------------- |
-| SampleA | ERR1109373     |
-| SampleB | SRR13191702    |
+| sample  | insdc_accession |
+| ------- | --------------- |
+| SampleA | ERR1109373      |
+| SampleB | SRR13191702     |
+
+That is, there are two columns:
+
+- **sample**: The sample identifier downloaded read data should be associated with.
+- **insdc_accession**: The accession from the [International Sequence Data Collaboration (INSDC)][insdc] for the data to download (currently only sequence runs supported, e.g., starting with `SRR`, `ERR`, or `DRR`).
 
 The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). An example of this file is provided at [assets/samplesheet.csv](assets/samplesheet.csv).
 
 # Parameters
 
-The main parameters are `--input` as defined above and `--output` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers and `-r [branch]` to specify which GitHub branch you would like to run.
+The main parameters are `--input` as defined above and `--output` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers (or `-profile docker` for docker) and `-r [branch]` to specify which GitHub branch you would like to run.
 
 Other parameters (defaults from nf-core) are defined in [nextflow_schema.json](nextflow_schema.json).
 
 # Running
 
-To run the pipeline, please do:
+## Test data
+
+To run the pipeline with test data, please do:
+
+```bash
+nextflow run phac-nml/fetchdatairidanext -profile test,docker --outdir results
+```
+
+The downloaded data will appear in `results/`. A JSON file for integrating data with IRIDA Next will be written to `results/iridanext.output.json.gz` (see the [Output](#output) section for details).
+
+## Other data
+
+To run the pipeline with other data (a custom samplesheet), please do:
 
 ```bash
-nextflow run phac-nml/fetchdatairidanext -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results
+nextflow run phac-nml/fetchdatairidanext -profile docker --input assets/samplesheet.csv --outdir results
 ```
 
 Where the `samplesheet.csv` is structured as specified in the [Input](#input) section.
 
 # Output
 
-A JSON file for loading metadata into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `irida.output.json.gz` (ex: `[outdir]/irida.output.json.gz`).
+## Read data
 
-An example of the what the contents of the IRIDA Next JSON file looks like for this particular pipeline is as follows:
+The sequence reads will appear in the `results/sratools/reads` directory (assuming `--outdir results` is specified). For example:
 
 ```
+results/sratools/reads/
+├── ERR1109373.fastq.gz
+├── ERR1109373_1.fastq.gz
+├── ERR1109373_2.fastq.gz
+├── SRR13191702_1.fastq.gz
+└── SRR13191702_2.fastq.gz
+```
+
+## IRIDA Next integration file
+
+A JSON file for loading the data into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `irida.output.json.gz` (ex: `[outdir]/irida.output.json.gz`).
+
+```json
 {
-    "files": {
-        "global": [
-            {
-                "path": "summary/summary.txt.gz"
-            }
-        ],
-        "samples": {
-            "SAMPLE1": [
-                {
-                    "path": "assembly/SAMPLE1.assembly.fa.gz"
-                }
-            ],
-            "SAMPLE2": [
-                {
-                    "path": "assembly/SAMPLE2.assembly.fa.gz"
-                }
-            ],
-            "SAMPLE3": [
-                {
-                    "path": "assembly/SAMPLE3.assembly.fa.gz"
-                }
-            ]
-        }
-    },
-    "metadata": {
-        "samples": {
-            "SAMPLE1": {
-                "reads.1": "sample1_R1.fastq.gz",
-                "reads.2": "sample1_R2.fastq.gz"
-            },
-            "SAMPLE2": {
-                "reads.1": "sample2_R1.fastq.gz",
-                "reads.2": "sample2_R2.fastq.gz"
-            },
-            "SAMPLE3": {
-                "reads.1": "sample1_R1.fastq.gz",
-                "reads.2": "null"
-            }
-        }
+  "files": {
+    "global": [],
+    "samples": {
+      "SampleA": [
+        { "path": "sratools/reads/SRR13191702_1.fastq.gz" },
+        { "path": "sratools/reads/SRR13191702_2.fastq.gz" }
+      ]
     }
+  }
 }
 ```
 
-Within the `files` section of this JSON file, all of the output paths are relative to the `outdir`. Therefore, `"path": "assembly/SAMPLE1.assembly.fa.gz"` refers to a file located within `outdir/assembly/SAMPLE1.assembly.fa.gz`.
+Within the `files` section of this JSON file, all of the output paths are relative to the `--outdir results`. Therefore, `"path": "sratools/reads/SRR13191702_1.fastq.gz"` refers to a file located within `results/sratools/reads/SRR13191702_1.fastq.gz`.
 
-There is also a pipeline execution summary output file provided (specified in the above JSON as `"global": [{"path":"summary/summary.txt.gz"}]`). However, there is no formatting specification for this file.
+An additional example of this file can be found at [tests/data/test1_iridanext.output.json](tests/data/test1_iridanext.output.json).
 
-## Test profile
+# Acknowledgements
 
-To run with the test profile, please do:
+This pipeline makes use of the following subworkflow from nf-core: [fastq_download_prefetch_fasterqdump_sratools](https://nf-co.re/subworkflows/fastq_download_prefetch_fasterqdump_sratools). Custom modifications to this workflow (and underlying modules) are found in the [subworkflows/local](subworkflows/local) and [modules/local](modules/local) directories.
 
-```bash
-nextflow run phac-nml/iridanextexample -profile docker,test -r main -latest --outdir results
-```
+Other works this pipeline makes use of are found in the [CITATIONS.md](CITATIONS.md) file.
 
 # Legal
 
-Copyright 2023 Government of Canada
+Copyright 2024 Government of Canada
 
 Licensed under the MIT License (the "License"); you may not use
 this work except in compliance with the License. You may obtain a copy of the
@@ -108,3 +105,6 @@ Unless required by applicable law or agreed to in writing, software distributed
 under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
 CONDITIONS OF ANY KIND, either express or implied. See the License for the
 specific language governing permissions and limitations under the License.
+
+[irida-next]: https://github.com/phac-nml/irida-next
+[insdc]: https://www.insdc.org/
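
Review note: the updated README says that paths in the `files` section of the integration JSON are relative to `--outdir`. The sketch below shows one way a consumer might resolve them, using only the Python standard library. The function name `sample_files` is hypothetical. The default file name is also an assumption: the README text in this diff mentions both `iridanext.output.json.gz` and `irida.output.json.gz`, so the name is left as a parameter.

```python
import gzip
import json
from pathlib import Path


def sample_files(outdir, json_name="iridanext.output.json.gz"):
    """Map each sample name to the paths of its files, resolved against outdir.

    json_name is an assumption; pass the actual file name written by the pipeline.
    """
    outdir = Path(outdir)
    # The integration file is gzip-compressed JSON, so open it in text mode.
    with gzip.open(outdir / json_name, "rt", encoding="utf-8") as handle:
        doc = json.load(handle)

    samples = doc.get("files", {}).get("samples", {})
    # Each entry's "path" is relative to the output directory.
    return {
        sample: [outdir / entry["path"] for entry in entries]
        for sample, entries in samples.items()
    }
```

For the example JSON in the README, `sample_files("results")` would map `SampleA` to the two paths under `results/sratools/reads/`.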
diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -14,14 +14,13 @@
                 "unique": true,
                 "errorMessage": "Sample name must be provided and cannot contain spaces"
             },
-            "run_accession": {
+            "insdc_accession": {
                 "type": "string",
-                "pattern": "^(SRR|ERR)\\S+$",
-                "meta": ["run_accession"],
-                "unique": true,
-                "errorMessage": "Must provide a valid run accession"
+                "pattern": "^(SRR|ERR|DRR)\\S+$",
+                "meta": ["insdc_accession"],
+                "errorMessage": "Must provide a valid accession"
             }
         },
-        "required": ["sample", "run_accession"]
+        "required": ["sample", "insdc_accession"]
     }
 }
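
Review note: the new `pattern` for `insdc_accession` adds the `DRR` prefix. The sketch below applies the same regular expression in Python to show what it accepts. The helper name `is_valid_accession` is hypothetical, and the pipeline itself validates through the JSON schema rather than through this code.

```python
import re

# Same pattern as in assets/schema_input.json (with JSON escaping removed).
ACCESSION_PATTERN = re.compile(r"^(SRR|ERR|DRR)\S+$")


def is_valid_accession(value: str) -> bool:
    """Return True if value matches the schema's accession pattern."""
    # fullmatch avoids Python's "$" also matching before a trailing newline.
    return ACCESSION_PATTERN.fullmatch(value) is not None


for candidate in ["ERR1109373", "SRR13191702", "DRR000001", "SRR", "GCA_000001", "SRR 123"]:
    print(candidate, is_valid_accession(candidate))
```

The pattern only checks the prefix and the absence of whitespace, so a value such as `SRRabc` would also pass. A stricter check (for example, requiring digits after the prefix) would be a separate change.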
diff --git a/docs/README.md b/docs/README.md
@@ -1,6 +1,6 @@
-# phac-nml/iridanextexample: Documentation
+# phac-nml/fetchdatairidanext: Documentation
 
-The phac-nml/iridanextexample documentation is split into the following pages:
+The phac-nml/fetchdatairidanext documentation is split into the following pages:
 
 - [Usage](usage.md)
   - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.

diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png
diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png
diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png