diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index aecbd6e..e9013a8 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,17 +1,17 @@ # phac-nml/iridanextexample: Contributing Guidelines Hi there! -Many thanks for taking an interest in improving phac-nml/iridanextexample. +Many thanks for taking an interest in improving phac-nml/fetchdatairidanext. -We try to manage the required tasks for phac-nml/iridanextexample using GitHub issues, you probably came to this page when creating one. +We try to manage the required tasks for phac-nml/fetchdatairidanext using GitHub issues, you probably came to this page when creating one. Please use the pre-filled template to save time. ## Contribution workflow -If you'd like to write some code for phac-nml/iridanextexample, the standard workflow is as follows: +If you'd like to write some code for phac-nml/fetchdatairidanext, the standard workflow is as follows: -1. Check that there isn't already an issue about your idea in the [phac-nml/iridanextexample issues](https://github.com/phac-nml/iridanextexample/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this -2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/iridanextexample repository](https://github.com/phac-nml/iridanextexample) to your GitHub account +1. Check that there isn't already an issue about your idea in the [phac-nml/fetchdatairidanext issues](https://github.com/phac-nml/fetchdatairidanext/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this +2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [phac-nml/fetchdatairidanext repository](https://github.com/phac-nml/fetchdatairidanext) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) 4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged @@ -27,7 +27,7 @@ There are typically two types of tests that run: ### Lint tests -`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. +`phac-nml` has a [set of guidelines](https://github.com/phac-nml/pipeline-standards) which all pipelines must adhere to. These are a subset of the [nf-core set of guidelines](https://nf-co.re/developers/guidelines). To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -49,11 +49,11 @@ These tests are run both with the latest available version of `Nextflow` and als ## Getting help -For further information/help, please consult the [phac-nml/iridanextexample documentation](https://github.com/phac-nml/iridanextexample/). +For further information/help, please consult the [phac-nml/fetchdatairidanext documentation](https://github.com/phac-nml/fetchdatairidanext/). 
## Pipeline contribution conventions -To make the phac-nml/iridanextexample code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. +To make the phac-nml/fetchdatairidanext code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. ### Adding a new step @@ -67,8 +67,7 @@ If you wish to contribute a new step, please use the following coding standards: 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. -9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module. -10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`. +9. Add a description of the output files to `docs/output.md`. ### Default values @@ -96,18 +95,3 @@ If you are using a new feature from core Nextflow, you may bump the minimum requ ### Images and figures For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). - -## GitHub Codespaces - -This repo includes a devcontainer configuration which will create a GitHub Codespaces for Nextflow development! This is an online developer environment that runs in your browser, complete with VSCode and a terminal. - -To get started: - -- Open the repo in [Codespaces](https://github.com/phac-nml/iridanextexample/codespaces) -- Tools installed - - nf-core - - Nextflow - -Devcontainer specs: - -- [DevContainer config](.devcontainer/devcontainer.json) diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index f0c034e..2a8df94 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -1,4 +1,4 @@ contact_links: - name: "GitHub" - url: https://github.com/phac-nml/iridanextexample + url: https://github.com/phac-nml/fetchdatairidanext about: The GitHub page for development. diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml index e67ccaa..0204616 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.yml +++ b/.github/ISSUE_TEMPLATE/feature_request.yml @@ -1,5 +1,5 @@ name: Feature request -description: Suggest an idea for the phac-nml/iridanextexample pipeline +description: Suggest an idea for the phac-nml/fetchdatairidanext pipeline labels: enhancement body: - type: textarea diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 63aed5f..146c253 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1,21 +1,21 @@ ## PR checklist - [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! 
-- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/iridanextexample/tree/main/.github/CONTRIBUTING.md) +- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/phac-nml/fetchdatairidanext/tree/main/.github/CONTRIBUTING.md) - [ ] Make sure your code lints (`nf-core lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`). - [ ] Usage Documentation in `docs/usage.md` is updated. diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index d4ad0e4..d72ca17 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -11,9 +11,9 @@ jobs: steps: # PRs to the phac-nml repo main branch are only ok if coming from the phac-nml repo `dev` or any `patch` branches - name: Check PRs - if: github.repository == 'phac-nml/iridanextexample' + if: github.repository == 'phac-nml/fetchdatairidanext' run: | - { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/iridanextexample ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/fetchdatairidanext ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] # If the above check failed, post a comment on the PR explaining the failure # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 03e1016..745d17e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -19,7 +19,7 @@ jobs: test: name: Run pipeline with test data # Only run on push if this is the phac-nml dev branch (merged PRs) - if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/iridanextexample') }}" + if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/fetchdatairidanext') }}" runs-on: ubuntu-latest strategy: matrix: diff --git a/README.md b/README.md index 176c4de..6abed32 100644 --- a/README.md +++ b/README.md @@ -2,101 +2,98 @@ # fetchdatairidanext pipeline -This pipeline can be used to fetch data from NCBI for integration into IRIDA Next. +This pipeline can be used to fetch data from NCBI for integration into [IRIDA Next][irida-next]. # Input The input to the pipeline is a standard sample sheet (passed as `--input samplesheet.csv`) that looks like: -| sample | ncbi_accession | -| ------- | -------------- | -| SampleA | ERR1109373 | -| SampleB | SRR13191702 | +| sample | insdc_accession | +| ------- | --------------- | +| SampleA | ERR1109373 | +| SampleB | SRR13191702 | + +That is, there are two columns: + +- **sample**: The sample identifier that downloaded read data should be associated with. +- **insdc_accession**: The accession from the [International Nucleotide Sequence Database Collaboration (INSDC)][insdc] for the data to download (currently only sequence runs are supported, e.g., accessions starting with `SRR`, `ERR`, or `DRR`). The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). An example of this file is provided at [assets/samplesheet.csv](assets/samplesheet.csv). # Parameters -The main parameters are `--input` as defined above and `--output` for specifying the output results directory. 
You may wish to provide `-profile singularity` to specify the use of singularity containers and `-r [branch]` to specify which GitHub branch you would like to run. +The main parameters are `--input` as defined above and `--outdir` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers (or `-profile docker` for docker) and `-r [branch]` to specify which GitHub branch you would like to run. Other parameters (defaults from nf-core) are defined in [nextflow_schema.json](nextflow_schema.json). # Running -To run the pipeline, please do: +## Test data + +To run the pipeline with test data, please do: + +```bash +nextflow run phac-nml/fetchdatairidanext -profile test,docker --outdir results +``` + +The downloaded data will appear in `results/`. A JSON file for integrating data with IRIDA Next will be written to `results/iridanext.output.json.gz` (see the [Output](#output) section for details). + +## Other data + +To run the pipeline with other data (a custom samplesheet), please do: ```bash -nextflow run phac-nml/fetchdatairidanext -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results +nextflow run phac-nml/fetchdatairidanext -profile docker --input assets/samplesheet.csv --outdir results ``` Where the `samplesheet.csv` is structured as specified in the [Input](#input) section. # Output -A JSON file for loading metadata into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `irida.output.json.gz` (ex: `[outdir]/irida.output.json.gz`). +## Read data -An example of the what the contents of the IRIDA Next JSON file looks like for this particular pipeline is as follows: +The sequence reads will appear in the `results/sratools/reads` directory (assuming `--outdir results` is specified). For example: ``` +results/sratools/reads/ +├── ERR1109373.fastq.gz +├── ERR1109373_1.fastq.gz +├── ERR1109373_2.fastq.gz +├── SRR13191702_1.fastq.gz +└── SRR13191702_2.fastq.gz +``` + +## IRIDA Next integration file + +A JSON file for loading the data into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our [Pipeline Standards for the IRIDA Next JSON](https://github.com/phac-nml/pipeline-standards#32-irida-next-json). This JSON file is written directly within the `--outdir` provided to the pipeline with the name `iridanext.output.json.gz` (ex: `[outdir]/iridanext.output.json.gz`). 
+ +```json { - "files": { - "global": [ - { - "path": "summary/summary.txt.gz" - } - ], - "samples": { - "SAMPLE1": [ - { - "path": "assembly/SAMPLE1.assembly.fa.gz" - } - ], - "SAMPLE2": [ - { - "path": "assembly/SAMPLE2.assembly.fa.gz" - } - ], - "SAMPLE3": [ - { - "path": "assembly/SAMPLE3.assembly.fa.gz" - } - ] - } - }, - "metadata": { - "samples": { - "SAMPLE1": { - "reads.1": "sample1_R1.fastq.gz", - "reads.2": "sample1_R2.fastq.gz" - }, - "SAMPLE2": { - "reads.1": "sample2_R1.fastq.gz", - "reads.2": "sample2_R2.fastq.gz" - }, - "SAMPLE3": { - "reads.1": "sample1_R1.fastq.gz", - "reads.2": "null" - } - } + "files": { + "global": [], + "samples": { + "SampleA": [ + { "path": "sratools/reads/SRR13191702_1.fastq.gz" }, + { "path": "sratools/reads/SRR13191702_2.fastq.gz" } + ] } + } } ``` -Within the `files` section of this JSON file, all of the output paths are relative to the `outdir`. Therefore, `"path": "assembly/SAMPLE1.assembly.fa.gz"` refers to a file located within `outdir/assembly/SAMPLE1.assembly.fa.gz`. +Within the `files` section of this JSON file, all of the output paths are relative to the `--outdir results`. Therefore, `"path": "sratools/reads/SRR13191702_1.fastq.gz"` refers to a file located within `results/sratools/reads/SRR13191702_1.fastq.gz`. -There is also a pipeline execution summary output file provided (specified in the above JSON as `"global": [{"path":"summary/summary.txt.gz"}]`). However, there is no formatting specification for this file. +An additional example of this file can be found at [tests/data/test1_iridanext.output.json](tests/data/test1_iridanext.output.json). -## Test profile +# Acknowledgements -To run with the test profile, please do: +This pipeline makes use of the following subworkflow from nf-core: [fastq_download_prefetch_fasterqdump_sratools](https://nf-co.re/subworkflows/fastq_download_prefetch_fasterqdump_sratools). Custom modifications to this workflow (and underlying modules) are found in the [subworkflows/local](subworkflows/local) and [modules/local](modules/local) directories. -```bash -nextflow run phac-nml/iridanextexample -profile docker,test -r main -latest --outdir results -``` +Other works this pipeline makes use of are found in the [CITATIONS.md](CITATIONS.md) file. # Legal -Copyright 2023 Government of Canada +Copyright 2024 Government of Canada Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the @@ -108,3 +105,6 @@ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
+ +[irida-next]: https://github.com/phac-nml/irida-next +[insdc]: https://www.insdc.org/ diff --git a/assets/schema_input.json b/assets/schema_input.json index c41a389..edcf572 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -14,14 +14,13 @@ "unique": true, "errorMessage": "Sample name must be provided and cannot contain spaces" }, - "run_accession": { + "insdc_accession": { "type": "string", - "pattern": "^(SRR|ERR)\\S+$", - "meta": ["run_accession"], - "unique": true, - "errorMessage": "Must provide a valid run accession" + "pattern": "^(SRR|ERR|DRR)\\S+$", + "meta": ["insdc_accession"], + "errorMessage": "Must provide a valid accession" } }, - "required": ["sample", "run_accession"] + "required": ["sample", "insdc_accession"] } } diff --git a/docs/README.md b/docs/README.md index db77372..67788e8 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,6 +1,6 @@ -# phac-nml/iridanextexample: Documentation +# phac-nml/fetchdatairidanext: Documentation -The phac-nml/iridanextexample documentation is split into the following pages: +The phac-nml/fetchdatairidanext documentation is split into the following pages: - [Usage](usage.md) - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png deleted file mode 100755 index 361d0e4..0000000 Binary files a/docs/images/mqc_fastqc_adapter.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png deleted file mode 100755 index cb39ebb..0000000 Binary files a/docs/images/mqc_fastqc_counts.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png deleted file mode 100755 index a4b89bf..0000000 Binary files a/docs/images/mqc_fastqc_quality.png and /dev/null differ diff --git a/docs/output.md b/docs/output.md index 69a1069..8365331 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,4 +1,4 @@ -# phac-nml/iridanextexample: Output +# phac-nml/fetchdatairidanext: Output ## Introduction @@ -6,11 +6,10 @@ This document describes the output produced by the pipeline. The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. -- assembly: very small mock assembly files for each sample -- generate: intermediate files used in generating the IRIDA Next JSON output -- pipeline_info: information about the pipeline's execution -- simplify: simplified intermediate files used in generating the IRIDA Next JSON output -- summary: summary report about the pipeline's execution and results +- `sratools`: Data from the SRA tools step (downloading sequence reads). + - `sratools/reads`: The fastq files of downloaded reads. +- `pipeline_info`: information about the pipeline's execution +- `custom`: information on detected/generated NCBI settings used for accessing certain databases (see ). The IRIDA Next-compliant JSON output file will be named `iridanext.output.json.gz` and will be written to the top-level of the results directory. This file is compressed using GZIP and conforms to the [IRIDA Next JSON output specifications](https://github.com/phac-nml/pipeline-standards#42-irida-next-json). 
@@ -18,50 +17,30 @@ The IRIDA Next-compliant JSON output file will be named `iridanext.output.json.g The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -- [Assembly stub](#assembly-stub) - Performs a stub assembly by generating a mock assembly -- [Generate sample JSON](#generate-sample-json) - Generates a JSON file for each sample -- [Generate summary](#generate-summary) - Generates a summary text file describing the samples and assemblies -- [Simplify IRIDA JSON](#simplify-irida-json) - Simplifies the sample JSONs by limiting nesting depth -- [IRIDA Next Output](#irida-next-output) - Generates a JSON output file that is compliant with IRIDA Next +- [Reads download](#prefetch-fasterq) - Downloads data from INSDC databases (using NCBI's SRA Tools). - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +- [IRIDA Next Output](#irida-next-output) - Generates a JSON output file that is compliant with IRIDA Next -### Assembly stub - -
-Output files - -- `assembly/` - - Mock assembly files: `ID.assembly.fa.gz` - -
- -### Generate sample JSON - -
-Output files - -- `generate/` - - JSON files: `ID.json.gz` - -
- -### Generate summary +### Reads download
Output files -- `summary/` - - Text summary describing samples and assemblies: `summary.txt.gz` +- `sratools/` + - Sequence data in SRA format: `INSDC_ACCESSION/INSDC_ACCESSION.sra` + - Reads in fastq format: `reads/INSDC_ACCESSION.fastq.gz`
-### Simplify IRIDA JSON +### Pipeline information
Output files -- `simplify/` - - Simplified JSON files: `ID.simple.json.gz` +- `pipeline_info/` + - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline. + - Parameters used by the pipeline run: `params.json`.
@@ -75,17 +54,4 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d -### Pipeline information - -
-Output files - -- `pipeline_info/` - - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline. - - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`. - - Parameters used by the pipeline run: `params.json`. - -
- [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/run-wes-example.json b/docs/run-wes-example.json index 7807de1..519cf60 100644 --- a/docs/run-wes-example.json +++ b/docs/run-wes-example.json @@ -1,18 +1,19 @@ { "workflow_params": { - "--input": "[SAMPLESHEET]", - "-r": "main" + "--input": "az://samplesheet.csv", + "-r": "1.0.0" }, - "workflow_type": "DSL2", - "workflow_type_version": "22.10.7", + "workflow_type": "NFL", + "workflow_type_version": "DSL2", + "workflow_engine": "nextflow", + "workflow_engine_version": "23.10.0", "tags": { "createdBy": "TestUser", "group": "TestUserGroup" }, "workflow_engine_parameters": { - "engine": "nextflow", "execute_loc": "azure" }, - "workflow_url": "https://github.com/phac-nml/iridanextexample", + "workflow_url": "https://github.com/phac-nml/fetchdatairidanext", "workflow_attachment": "" } diff --git a/docs/usage.md b/docs/usage.md index 5563e59..a261f28 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,12 +1,12 @@ -# phac-nml/iridanextexample: Usage +# phac-nml/fetchdatairidanext: Usage ## Introduction -This pipeline is an example that illustrates running a nf-core-compliant pipeline on IRIDA Next. +This pipeline is used to download read data from INSDC databases. ## Samplesheet input -You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with two columns, and a header row as shown in the examples below. ```bash --input '[path to samplesheet file]' @@ -14,22 +14,20 @@ You will need to create a samplesheet with information about the samples you wou ### Full samplesheet -The input samplesheet must contain three columns: `ID`, `fastq_1`, `fastq_2`. The IDs within a samplesheet should be unique. All other columns will be ignored. +The input samplesheet must contain two columns: `sample`, `insdc_accession`. The sample entries within a samplesheet should be unique. All other columns will be ignored. -A final samplesheet file consisting of both single- and paired-end data may look something like the one below. +An example samplesheet is shown below: ```console -sample,fastq_1,fastq_2 -SAMPLE1,sample1_R1.fastq.gz,sample1_R2.fastq.gz -SAMPLE2,sample2_R1.fastq.gz,sample2_R2.fastq.gz -SAMPLE3,sample1_R1.fastq.gz, +sample,insdc_accession +SAMPLE1,ERR1109373 +SAMPLE2,SRR13191702 ``` -| Column | Description | -| --------- | -------------------------------------------------------------------------------------------------------------------------- | -| `sample` | Custom sample name. Samples should be unique within a samplesheet. | -| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | -| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". 
| +| Column | Description | +| ----------------- | ------------------------------------------------------------------------------------------------------------ | +| `sample` | A sample name which will be associated with downloaded reads. Samples should be unique within a samplesheet. | +| `insdc_accession` | The accession (run accession) from one of the INSDC databases (NCBI, ENA, or DDBJ). | An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. @@ -38,10 +36,10 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p The typical command for running the pipeline is as follows: ```bash -nextflow run main.nf --input ./samplesheet.csv --outdir ./results -profile singularity +nextflow run phac-nml/fetchdatairidanext -profile test,docker --outdir results ``` -This will launch the pipeline with the `singularity` configuration profile. See below for more information about profiles. +This will launch the pipeline with the `docker` configuration profile (use `singularity` for singularity profile). See below for more information about profiles. Note that the pipeline will create the following files in your working directory: @@ -62,7 +60,7 @@ Do not use `-c ` to specify parameters as this will result in errors. Cust The above pipeline run specified with a params file in yaml format: ```bash -nextflow run phac-nml/iridanextexample -profile docker -params-file params.yaml +nextflow run phac-nml/fetchdatairidanext -profile docker -params-file params.yaml ``` with `params.yaml` containing: @@ -79,7 +77,7 @@ You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-c It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. -First, go to the [phac-nml/iridanextexample page](https://github.com/phac-nml/iridanextexample) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. +First, go to the [phac-nml/fetchdatairidanext page](https://github.com/phac-nml/fetchdatairidanext) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag. This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. diff --git a/workflows/fetchdatairidanext.nf b/workflows/fetchdatairidanext.nf index d3fae5d..74b5fcc 100644 --- a/workflows/fetchdatairidanext.nf +++ b/workflows/fetchdatairidanext.nf @@ -56,7 +56,7 @@ workflow FETCHDATAIRIDANEXT { // Create a new channel of metadata from a sample sheet // NB: `input` corresponds to `params.input` and associated sample sheet schema input = Channel.fromSamplesheet("input") - meta_accessions = input.map {meta -> tuple(["id": meta.id.first()], meta.run_accession.first())} + meta_accessions = input.map {meta -> tuple(["id": meta.id.first()], meta.insdc_accession.first())} FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS ( ch_sra_ids = meta_accessions,