Merge pull request #88 from AustralianBioCommons/main
 Update hic-scaffolding.md
supernord authored Jun 26, 2024
2 parents 56cbcb5 + 023361e commit e88dd15
Showing 8 changed files with 252 additions and 31 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -1,14 +1,14 @@
## About

This site presents How-to Guides and other associated documentation that supports the reuse of bioinformatics tools, workflows and data on Australian compute systems and infrastructure.
This site presents a collection of step-by-step guides that support the reuse of Galaxy workflows created in collaboration with the Bioplatforms Australia Threatened Species Initiative (TSI). There are guides for genome assembly and quality control, RAD-seq analysis with Stacks, and more!
Other guides are available at the BioCommons [How-to Hub](https://australianbiocommons.github.io/how-to-hub), the central location for all guides and associated documents that have been prepared by community members who gather around BioCommons activities.

## Article template

See [`guide_template.md`](./about/guide_template.md)

## Acknowledgements

This work is supported by the [Australian BioCommons](https://www.biocommons.org.au/) via funding from [Bioplatforms Australia](https://bioplatforms.com/), the Australian Research Data Commons (https://doi.org/10.47486/PL105) and the Queensland Government RICF programme. Bioplatforms Australia and the Australian Research Data Commons are funded by the National Collaborative Research Infrastructure Strategy (NCRIS).
This work is supported by the [Australian BioCommons](https://www.biocommons.org.au/) via funding from [Bioplatforms Australia](https://bioplatforms.com/) and the Queensland Government RICF programme. Bioplatforms Australia is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

These guides were first developed as part of the [Australian BioCommons BYOD Expansion Project](https://www.biocommons.org.au/byo-data-platform-expansion), which is funded through NCRIS investments from Bioplatforms Australia and the Australian Research Data Commons (http://doi.org/10.47486/PL105) that were matched by co-investments from AARNet, Melbourne Bioinformatics, NCI, Pawsey, QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University.

This repository makes use of the ELIXIR toolkit theme: [![theme badge](https://img.shields.io/badge/ELIXIR%20toolkit%20theme-jekyll-blue?color=0d6efd)](https://github.com/ELIXIR-Belgium/elixir-toolkit-theme)

25 changes: 13 additions & 12 deletions genome_annotation/Fgenesh.md
@@ -6,7 +6,7 @@ description: How-to Guide for genome annotation with FgenesH++.
affiliations: [University of Sydney, Australian BioCommons, Bioplatforms Australia, Galaxy Australia, Threatened Species Initiative]
---

[Galaxy Australia](https://usegalaxy.org.au/) is capable of conducting genome annotation using the FgenesH++ annotation tool.
[Galaxy Australia](https://usegalaxy.org.au/) is capable of conducting genome annotation using the FgenesH++ annotation tool. Users need to apply for access to this tool: see the [service notes here](https://www.biocommons.org.au/fgenesh-plus-plus) and apply for access [here](https://site.usegalaxy.org.au/request/access/fgenesh).

This How-to-Guide will describe the steps required to annotate your genome on the Galaxy Australia platform (see **Fig 1**), developed in consultations between the Bioplatforms Australia [Threatened Species Initiative](https://threatenedspeciesinitiative.com/), [Galaxy Australia](https://usegalaxy.org.au/), and the [Australian BioCommons](https://www.biocommons.org.au/).

@@ -16,22 +16,23 @@ If you need help, the Galaxy community is both approachable and helpful. [Ask th
## Quick start guide

1. [Login to Galaxy Australia](#register-and-login)
2. Create a new history
3. Upload your `assembled reference genome`, `repeat masked reference genome`, `.cdna`, `.pro` and `.dat` files from the [transcriptome workflow](Transcriptome)
4. Load and execute workflows, using required options
2. [Apply for access to FGenesH++](https://site.usegalaxy.org.au/request/access/fgenesh).
3. Create a new history
4. Upload your `assembled reference genome`, `repeat masked reference genome`, `.cdna`, `.pro` and `.dat` files from the [transcriptome workflow](Transcriptome)
5. Load and execute workflows, using required options
- [Open `FgenesH++ genome annotation` workflow](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=881)
5. Review workflow report and perform additional QC as needed
6. Re-run workflows, or individual tools, as needed
6. Review workflow report and perform additional QC as needed
7. Re-run workflows, or individual tools, as needed


## How to cite the workflow

> Silver, L. (2024). Fgenesh annotation -TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.881.1
> Silver, L. (2024). Fgenesh annotation -TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.881.4

## The overall workflow

{% include image.html file="/genome_annotation/Fig1.png" caption="Fig 1. The approach described in this How-to-Guide, including Quick Start guide steps 1) registration, 2) upload of input files, 3) FgenesH++ genome annotation. Required workflow steps are blue, and optional steps are red." max-width="10" %}
{% include image.html file="/genome_annotation/Fig1-updated.png" caption="Fig 1. The approach described in this How-to-Guide, including Quick Start guide steps 1) registration, 2) upload of input files, 3) FgenesH++ genome annotation. Required workflow steps are blue, and optional steps are red." max-width="10" %}

Further to this, a summary of the different elements of this assembly approach are detailed below:

@@ -63,7 +64,7 @@ Further to this, a summary of the different elements of this assembly approach a

{:start="3"}

3. Upload your assembled reference genome and masked reference genome (Link to repeat masking workflow), as well as the `.cdna`, `.pro` and `.dat` output from your [transcriptome assembly](Transcriptome)
3. Upload your assembled reference genome and masked reference genome (Link to repeat masking workflow), as well as the `.cdna`, `.pro` and `.dat` output from your [transcriptome assembly](Transcriptome). Note: it is recommended by Softberry that the genome is hard-masked rather than soft-masked.
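
The difference between the two masking styles can be illustrated with a short Python sketch. It assumes the common FASTA convention in which soft-masked repeats are written in lowercase and hard-masked repeats are replaced with `N`. The function names are hypothetical and this is not part of the Galaxy workflow itself: the Repeat masking workflow already produces both hard-masked and soft-masked genomes, so this is for illustration only.

```python
def hard_mask(sequence: str) -> str:
    """Replace every soft-masked (lowercase) base with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(lines):
    """Yield FASTA lines with sequence lines hard-masked.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield hard_mask(line)


if __name__ == "__main__":
    example = [">scaffold_1 example\n", "ACGTacgtACGT\n"]
    for out_line in hard_mask_fasta(example):
        print(out_line)
```

Uppercase bases and existing `N` characters are left untouched, so only the repeat regions flagged by soft-masking are changed.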


### Run the annotation workflow
@@ -79,10 +80,10 @@ Further to this, a summary of the different elements of this assembly approach a
{:start="4"}

4. The workflow invocation window will open.
5. Select your reference genome fasta file (Step 1 in Fig 5),
6. Select your repeat masked reference genome fasta file (Step 2 in Fig 5).
5. Select your reference assembled genome fasta file (Fig 5).
6. Select your repeat masked reference genome fasta file (Fig 5).

{% include image.html file="/genome_annotation/Fig5.png" caption="Fig 5." max-width="10" %}
{% include image.html file="/genome_annotation/Fig5-updated.png" caption="Fig 5." max-width="10" %}

{:start="7"}

23 changes: 12 additions & 11 deletions genome_annotation/Transcriptome.md
@@ -31,7 +31,7 @@ If you need help, the Galaxy community is both approachable and helpful. [Ask th

## How to cite the workflows

> Silver, L., & Syme, A. (2024). Repeat masking - TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.875.2
> Silver, L., & Syme, A. (2024). Repeat masking - TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.875.3
> Silver, L., & Syme, A. (2024). QC and trimming of RNAseq reads - TSI. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.876.1
@@ -53,10 +53,10 @@ Further to this, a summary of the different elements of this alignment approach
| Process name | Workflow name | Description | Inputs | Outputs |
| ---------------- | ----------------------------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| UPLOAD FILES | Not applicable | See the [different upload options](#upload-data-files). | reference genome, Fastq mRNA | Uploaded data! |
| Repeat Masking | [Repeat masking - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=875) | Repeat masking of reference genome | Reference genome | FASTA file, Statistic file |
| Repeat Masking | [Repeat masking - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=875) | Repeat masking of reference genome | Reference genome | FASTA files of hard-masked and soft-masked genomes, Statistic file |
| RNA seq QC and trimming| [QC and trimming of RNAseq reads -TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=876) | Trimming of fastq files, including a fastqc step | Raw mRNA sequencing files | FASTQC report, Paired read FASTQ file |
| Align reads to find transcripts | [Find transcripts - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=877) | Alignment of trimmed FASTQ reads to masked reference genome | Repeat masked reference genome, paired trimmed FASTQ reads | BAM file, GTF file alignment metrics|
| Combine Transcripts | [Combine Transcripts - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=878) | Merges individual tissue transcripts to a global transcriptome and predicts coding sequences |GTF file, closely related species coding and non-coding sequences | GTF for global transcriptome, FASTA sequences of coding transcripts |
| Align reads to find transcripts | [Find transcripts - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=877) | Alignment of trimmed FASTQ reads to masked reference genome | (soft) repeat masked reference genome, paired trimmed FASTQ reads | BAM file, GTF file alignment metrics|
| Combine Transcripts | [Combine Transcripts - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=878) | Merges individual tissue transcripts to a global transcriptome and predicts coding sequences |GTF file, soft-masked genome, closely related species coding and non-coding sequences | GTF for global transcriptome, FASTA sequences of coding transcripts |
| Extract Longest Transcripts | [Extract Transcripts-TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=879) | Transdecoder predictions and filtering of transcripts | FASTA sequence of coding transcripts | pep.fasta, cds.fasta and gff3 file of longest isoform transcripts |
| Convert Outputs | [Convert formats - TSI](https://usegalaxy.org.au/workflows/trs_import?trs_server=workflowhub.eu&run_form=true&trs_id=880) | Converts outputs of transdecoder to required inputs for FGenesH++ annotation | transdecoder-peptides.fasta, global_nucleotides.fasta |.cdna, .dat and .pro files |
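
All of the workflow links in the table share one URL pattern, differing only in the `trs_id` value. As a minimal sketch, the links could be regenerated in Python as below. The function and dictionary names are hypothetical; the base URL, query parameters and IDs are taken from the table, not from any documented Galaxy API guarantee.

```python
from urllib.parse import urlencode

# Base import endpoint as it appears in the table links.
GALAXY_AU_TRS_IMPORT = "https://usegalaxy.org.au/workflows/trs_import"

# Workflow names and WorkflowHub IDs copied from the table.
WORKFLOW_IDS = {
    "Repeat masking - TSI": 875,
    "QC and trimming of RNAseq reads -TSI": 876,
    "Find transcripts - TSI": 877,
    "Combine Transcripts - TSI": 878,
    "Extract Transcripts-TSI": 879,
    "Convert formats - TSI": 880,
}


def trs_import_url(trs_id: int) -> str:
    """Build the Galaxy Australia import link for a WorkflowHub workflow ID."""
    query = urlencode(
        {"trs_server": "workflowhub.eu", "run_form": "true", "trs_id": trs_id}
    )
    return f"{GALAXY_AU_TRS_IMPORT}?{query}"


if __name__ == "__main__":
    for name, trs_id in WORKFLOW_IDS.items():
        print(f"{name}: {trs_import_url(trs_id)}")
```

Keeping the IDs in one mapping makes it easier to spot a mismatched link when a workflow is added or renumbered.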

@@ -116,7 +116,7 @@ Further to this, a summary of the different elements of this alignment approach
- Retrieve the workflows for `Align reads to find transcripts`
- Import into your Galaxy Australia workflows
2. Once you have reached the workflow screen, select the ```play``` button for Align reads to find transcripts (Fig 8)
3. Select the paired forward and paired reverse trimmed reads and masked reference genome as input (Fig 9). Ensure you select files tagged with `#fastq_out_r1_paired` and `#fastq_out_r2_paired`
3. Select the paired forward and paired reverse trimmed reads and soft-masked reference genome as input (Fig 9). Ensure you select files tagged with `#fastq_out_r1_paired` and `#fastq_out_r2_paired`
4. Check the mapping summary file for each tissue to make sure there are high mapping rates to the genome
5. Make a dataset collection containing gtf files for all tissue transcriptomes

@@ -137,14 +137,17 @@ Further to this, a summary of the different elements of this alignment approach
{:start="3"}

3. Search for your species on [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) to find the most closely related species which has an NCBI RefSeq annotation (Fig 11)

{% include image.html file="/transcriptome/Fig11.png" caption="Fig 11." max-width="10" %}

4. Go to the NCBI FTP server and locate the entry for this species (e.g. the Corroboree frog RefSeq entry is GCF_028390025.1 and its FTP entry is https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/028/390/025/)
5. Download the `_cds_from_genomic.fna.gz` and `pseudo_without_product.fna.gz` files to your local computer and upload into Galaxy (Fig 12)

{% include image.html file="/transcriptome/Fig12.png" caption="Fig 12." max-width="10" %}
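
The FTP directory in the Corroboree frog example follows from the accession: the prefix (`GCF`) and the nine digits split into groups of three. A minimal Python sketch of that mapping is shown below. The function name is hypothetical, and the pattern is inferred from the single example given in the step above, so treat it as an assumption and confirm the resulting directory in a browser before downloading.

```python
import re

NCBI_GENOMES_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_ftp_dir(accession: str) -> str:
    """Map an assembly accession such as 'GCF_028390025.1' to its FTP directory.

    Assumes the layout <prefix>/<3 digits>/<3 digits>/<3 digits>/ seen in the
    example above.
    """
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.\d+", accession)
    if match is None:
        raise ValueError(f"Unrecognised assembly accession: {accession!r}")
    prefix, digits = match.groups()
    # Split the nine digits into three groups of three.
    parts = [digits[i:i + 3] for i in range(0, 9, 3)]
    return "/".join([NCBI_GENOMES_ROOT, prefix, *parts]) + "/"


if __name__ == "__main__":
    print(assembly_ftp_dir("GCF_028390025.1"))
```

The files named in the download step sit in an assembly-specific subdirectory beneath this path, which the sketch does not attempt to guess.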

{:start="6"}

6. Select the gtf collection, masked reference genome, coding sequences and pseudo coding sequences in the combine transcripts workflow
6. Select the gtf collection, soft-masked reference genome, coding sequences and pseudo coding sequences in the combine transcripts workflow
7. In Step 7 of the workflow, ensure the masked genome is selected; in Step 10, type "1" in the `List of Fields` box (Fig 13; Fig 14; Fig 15)

{% include image.html file="/transcriptome/Fig13.png" caption="Fig 13." max-width="10" %}
@@ -186,13 +189,11 @@ Further to this, a summary of the different elements of this alignment approach

{:start="3"}

3. Select the transdecoder peptide fasta file and the text transformed fasta output file from the Combine Transcripts workflow (Fig 19; Fig20)

{% include image.html file="/transcriptome/Fig19.png" caption="Fig 19." max-width="10" %}
3. Select the transdecoder peptide fasta file and the text transformed fasta output file from the Combine Transcripts workflow (Fig 19)

{% include image.html file="/transcriptome/Fig20.png" caption="Fig 20." max-width="10" %}
{% include image.html file="/transcriptome/Fig19_updated.png" caption="Fig 19." max-width="10" %}

{:start="4"}

4. The output files tagged with `#dat`, `#pro`, and `#cdna`, along with the masked and unmasked reference genome are used as input files for [FGenesH++ genome annotation](Fgenesh)
4. The output files tagged with `#dat`, `#pro`, and `#cdna`, along with the hard-masked and unmasked reference genome are used as input files for [FGenesH++ genome annotation](Fgenesh).
