documentation anvi-script-find-misassemblies
FlorianTrigodet committed Aug 14, 2024
1 parent 4a35ea9 commit 95eb020
Showing 6 changed files with 745 additions and 533 deletions.
2 changes: 1 addition & 1 deletion help/main/artifacts/bam-file/index.md
@@ -29,7 +29,7 @@ A BAM-type anvi'o artifact. This artifact is typically provided **by the user**
## Required or used by


<p style="text-align: left" markdown="1"><span class="artifact-r">[anvi-get-short-reads-from-bam](../../programs/anvi-get-short-reads-from-bam)</span> <span class="artifact-r">[anvi-get-short-reads-mapping-to-a-gene](../../programs/anvi-get-short-reads-mapping-to-a-gene)</span> <span class="artifact-r">[anvi-get-tlen-dist-from-bam](../../programs/anvi-get-tlen-dist-from-bam)</span> <span class="artifact-r">[anvi-profile](../../programs/anvi-profile)</span> <span class="artifact-r">[anvi-profile-blitz](../../programs/anvi-profile-blitz)</span> <span class="artifact-r">[anvi-report-linkmers](../../programs/anvi-report-linkmers)</span> <span class="artifact-r">[anvi-script-get-coverage-from-bam](../../programs/anvi-script-get-coverage-from-bam)</span> <span class="artifact-r">[anvi-script-reformat-bam](../../programs/anvi-script-reformat-bam)</span></p>
<p style="text-align: left" markdown="1"><span class="artifact-r">[anvi-get-short-reads-from-bam](../../programs/anvi-get-short-reads-from-bam)</span> <span class="artifact-r">[anvi-get-short-reads-mapping-to-a-gene](../../programs/anvi-get-short-reads-mapping-to-a-gene)</span> <span class="artifact-r">[anvi-get-tlen-dist-from-bam](../../programs/anvi-get-tlen-dist-from-bam)</span> <span class="artifact-r">[anvi-profile](../../programs/anvi-profile)</span> <span class="artifact-r">[anvi-profile-blitz](../../programs/anvi-profile-blitz)</span> <span class="artifact-r">[anvi-report-linkmers](../../programs/anvi-report-linkmers)</span> <span class="artifact-r">[anvi-script-find-misassemblies](../../programs/anvi-script-find-misassemblies)</span> <span class="artifact-r">[anvi-script-get-coverage-from-bam](../../programs/anvi-script-get-coverage-from-bam)</span> <span class="artifact-r">[anvi-script-reformat-bam](../../programs/anvi-script-reformat-bam)</span></p>


## Description
34 changes: 32 additions & 2 deletions help/main/index.md
@@ -17,7 +17,7 @@ If you need an introduction to the terminology used in 'omics research or in anv
<a href="/network/" target="_blank"><img src="/images/anvio-network.png" width="100%" /></a>

{:.notice}
The help contents were last updated on **02 Aug 24 22:54:33** for anvi'o version **8-dev (marie)**.
The help contents were last updated on **14 Aug 24 13:42:12** for anvi'o version **8-dev (marie)**.


{% include _project-anvio-version.html %}
@@ -132,7 +132,7 @@ Listed below **a total of 134 artifacts**.

Anvi'o programs perform atomic tasks that can be weaved together to implement complete 'omics workflows. Please note that there may be programs that are not listed on this page. You can type 'anvi-' in your terminal, and press the TAB key twice to see the full list of programs available to you on your system, and type `anvi-program-name --help` to read the full list of command line options.

Listed below **a total of 151 programs**.
Listed below **a total of 152 programs**.


<div style="width:100%;">
@@ -4029,6 +4029,36 @@ Listed below **a total of 151 programs**.
</table>
</div>

<div style="width:100%;">
<table class="programs-table">
<tbody>
<tr style="border:none;">
<td class="program-td">
<span class="artifact-emoji">🔥</span> <span markdown="1">**[anvi-script-find-misassemblies](programs/anvi-script-find-misassemblies)**</span>. <span markdown="1">This script reports errors in long-read assemblies using read-recruitment information. The input file should be a BAM file of long reads mapped to an assembly made from these reads.</span>
</td>
</tr>
<tr>
<td class="artifact-r-td">
<span class="artifact-emoji">🧀</span>
<span class="artifact-r" markdown="1">[bam-file](artifacts/bam-file) <img src="images/icons/BAM.png" class="artifact-icon-mini" /> </span>
</td>
</tr>
<tr>
<td class="artifact-p-td">
<span class="artifact-emoji">🍕</span>
<span class="artifact-p" markdown="1">n/a</span>
</td>
</tr>
<tr style="border:none;">
<td class="program-td">
<span class="artifact-emoji">🧠</span> <div class="anvio-person-mini"><div class="anvio-person-photo-mini"><a href="/people/floriantrigodet" target="_blank"><img class="anvio-person-photo-img-mini" title="Florian Trigodet" src="images/authors/FlorianTrigodet.jpg" /></a></div></div>

</td>
</tr>
</tbody>
</table>
</div>

<div style="width:100%;">
<table class="programs-table">
<tbody>
115 changes: 115 additions & 0 deletions help/main/programs/anvi-script-find-misassemblies/index.md
@@ -0,0 +1,115 @@
---
layout: program
title: anvi-script-find-misassemblies
excerpt: An anvi'o program. This script reports errors in long-read assemblies using read-recruitment information.
categories: [anvio]
comments: false
redirect_from: /m/anvi-script-find-misassemblies
image:
  featurerelative: ../../../images/header.png
  display: true
---

This script reports errors in long-read assemblies using read-recruitment information. The input file should be a BAM file of long reads mapped to an assembly made from these reads.

🔙 **[To the main page](../../)** of anvi'o programs and artifacts.


{% include _toc.html %}
<div id="svg" class="subnetwork"></div>
{% capture network_path %}{{ "network.json" }}{% endcapture %}
{% capture network_height %}{{ 300 }}{% endcapture %}
{% include _project-anvio-graph.html %}


## Authors

<div class="anvio-person"><div class="anvio-person-info"><div class="anvio-person-photo"><img class="anvio-person-photo-img" src="../../images/authors/FlorianTrigodet.jpg" /></div><div class="anvio-person-info-box"><a href="/people/floriantrigodet" target="_blank"><span class="anvio-person-name">Florian Trigodet</span></a><div class="anvio-person-social-box"><a href="mailto:[email protected]" class="person-social" target="_blank"><i class="fa fa-fw fa-envelope-square"></i>Email</a><a href="http://twitter.com/FlorianTrigodet" class="person-social" target="_blank"><i class="fa fa-fw fa-twitter-square"></i>Twitter</a><a href="http://github.com/floriantrigodet" class="person-social" target="_blank"><i class="fa fa-fw fa-github"></i>Github</a></div></div></div></div>



## Can consume


<p style="text-align: left" markdown="1"><span class="artifact-r">[bam-file](../../artifacts/bam-file) <img src="../../images/icons/BAM.png" class="artifact-icon-mini" /></span></p>


## Can provide


This program does not seem to provide any artifacts. Such programs usually print out some information for you to see or alter some anvi'o artifacts without producing any immediate outputs.


## Usage


The aim of this script is to find potential errors in assemblies produced by long-read assemblers.

### Principle

This script searches for potential assembly errors in a <span class="artifact-n">[bam-file](/help/main/artifacts/bam-file)</span> generated from the mapping of long reads onto an assembly made using the same reads. The basic principle is that (1) every single nucleotide, and (2) consecutive pairs/triplets of nucleotides in an assembly should be covered by at least one of the long reads that was used to generate the assembly.

To find out if every nucleotide is covered by at least one read, one simply needs to find regions with 0x coverage, and that is one output of this script.
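As a rough illustration of the zero-coverage idea, here is a minimal Python sketch that scans a per-position coverage list and reports every stretch with 0x coverage. It is not the script's actual implementation: the function name `zero_coverage_ranges` and the half-open `(start, stop)` convention are assumptions made for this example.

```python
from itertools import groupby


def zero_coverage_ranges(coverage):
    """Return half-open (start, stop) ranges where coverage is 0.

    `coverage` is a list with one integer per nucleotide of a contig.
    Hypothetical helper for illustration only.
    """
    ranges = []
    pos = 0
    # groupby splits the coverage list into runs of zero / non-zero positions
    for is_zero, run in groupby(coverage, key=lambda c: c == 0):
        run_length = sum(1 for _ in run)
        if is_zero:
            ranges.append((pos, pos + run_length))
        pos += run_length
    return ranges


coverage = [0, 0, 3, 5, 0, 0, 0, 4]
print(zero_coverage_ranges(coverage))  # → [(0, 2), (4, 7)]
```

With a half-open convention, the size of each range is simply `stop - start`.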

To find where two or three consecutive nucleotides are not covered by at least one read, we can leverage the clipping information reported by long-read mapping software like [minimap2](https://github.com/lh3/minimap2). Clipping happens when the left- or right-most part of a read does not align to the reference. If 100% of the reads clip at the same nucleotide position, then not a single read covers both that nucleotide and the one next to it. AND THAT'S NOT GOOD.

Here is an example visualized in [IGV](https://igv.org/). All reads are clipped at the same position. The long reads provide no support for joining the left and right sides of the contig, which suggests a misassembly here.

![clipping_example](../../images/anvi-script-find-misassemblies.png)
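The clipping principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script's code: reads are represented as hypothetical `(ref_start, cigar)` tuples with 0-based start positions, a left clip is attributed to the first aligned position of a read, and a right clip to its last aligned position.

```python
from collections import Counter

# CIGAR operations that advance along the reference (per the SAM format)
CONSUMES_REF = {"M", "D", "N", "=", "X"}
# Soft and hard clipping operations
CLIP_OPS = {"S", "H"}


def clipping_ratios(reads):
    """Return {position: clipped reads / total reads} for positions with clipping.

    `reads` is an iterable of (ref_start, cigar), where cigar is a list of
    (operation, length) tuples. Hypothetical data model for illustration.
    """
    coverage = Counter()
    clipping = Counter()
    for ref_start, cigar in reads:
        aligned_length = sum(n for op, n in cigar if op in CONSUMES_REF)
        if aligned_length == 0:
            continue
        ref_end = ref_start + aligned_length  # exclusive end on the reference
        for pos in range(ref_start, ref_end):
            coverage[pos] += 1
        # Count each read at most once per position
        clipped_here = set()
        if cigar[0][0] in CLIP_OPS:
            clipped_here.add(ref_start)
        if cigar[-1][0] in CLIP_OPS:
            clipped_here.add(ref_end - 1)
        for pos in clipped_here:
            clipping[pos] += 1
    return {pos: clipping[pos] / coverage[pos] for pos in clipping}


reads = [
    (0, [("M", 10), ("S", 5)]),   # right-clipped, last aligned position is 9
    (2, [("M", 8), ("S", 3)]),    # right-clipped, last aligned position is 9
    (10, [("S", 4), ("M", 10)]),  # left-clipped, first aligned position is 10
    (10, [("S", 2), ("M", 6)]),   # left-clipped, first aligned position is 10
]
print(clipping_ratios(reads))  # → {9: 1.0, 10: 1.0}
```

In this toy example no read spans positions 9 and 10 together, so both positions end up with a ratio of 1.0, which is the signature of a potential misassembly.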


### Basic usage

The only input file to this script is a simple <span class="artifact-n">[bam-file](/help/main/artifacts/bam-file)</span>. But not just any BAM file: it has to be made by mapping long reads onto an assembly generated using the same reads. You also need to provide an output prefix.

<div class="codeblock" markdown="1">
anvi&#45;script&#45;find&#45;misassemblies &#45;b sample01.bam &#45;o result
</div>


### Outputs

The first output is a table reporting regions in your assemblies with zero coverage. This output includes the contig in which the region occurs, the contig's length, the region's start and stop positions, and the length of the region with no coverage.

|**contig**|**length**|**range**|**range_size**|
|:--|:--|:--|:--|
|contig_001|1665603|0-498|498|
|contig_001|1665603|100500-101000|500|
|contig_001|1665603|1665106-1665603|497|

The second output is a table reporting the positions with high relative abundance of clipped reads. It includes the contig in which the position occurs, the contig's length, the position in the contig with high levels of clipping, its relative position in the contig, the total coverage at that position, the coverage of reads clipping at that position, and the ratio between these two coverage values. The 'relative position' is the position divided by the contig's length (a value from 0 to 1), which tells you whether the position is close to the contig ends or somewhere in the middle.

|**contig**|**length**|**pos**|**relative_pos**|**cov**|**clipping**|**clipping_ratio**|
|:--|:--|:--|:--|:--|:--|:--|
|contig_001|1665603|498|0.0002989908159387321|48|48|1.0|
|contig_001|1665603|500999|0.30079136504917436|120|120|1.0|
|contig_001|1665603|501000|0.30079196543233894|79|79|1.0|
|contig_001|1665603|1665105|0.9997010091840612|45|45|1.0|
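To make the derived columns concrete, here is a small Python sketch that recomputes `relative_pos` and `clipping_ratio` from the other columns, using the values from the first row of the table above. The function `clipping_row` is a hypothetical helper written for this example and is not part of anvi'o.

```python
def clipping_row(contig, length, pos, cov, clipping):
    """Build one row of the clipping table from raw counts (illustrative only)."""
    return {
        "contig": contig,
        "length": length,
        "pos": pos,
        # position divided by the contig length, a value from 0 to 1
        "relative_pos": pos / length,
        "cov": cov,
        "clipping": clipping,
        # fraction of the reads covering this position that clip here
        "clipping_ratio": clipping / cov,
    }


row = clipping_row("contig_001", 1665603, 498, 48, 48)
print(row["relative_pos"], row["clipping_ratio"])
```

A `relative_pos` near 0 or 1 indicates a position close to a contig end, while a value such as 0.3 points to the interior of the contig.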


### Additional parameters

By default, the script will report clipping positions if the ratio of clipping reads to the total number of reads is over 80%. You can change that threshold with the flag `--clipping-ratio`:

<div class="codeblock" markdown="1">
anvi&#45;script&#45;find&#45;misassemblies &#45;b sample01.bam &#45;o result &#45;&#45;clipping&#45;ratio 0.6
</div>
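The effect of the threshold can be pictured with the following Python sketch. It assumes a strict "greater than" comparison, based on the wording "over 80%" above; the function name `passes_clipping_ratio` and the handling of zero coverage are assumptions for illustration, not the script's actual code.

```python
def passes_clipping_ratio(cov, clipping, min_ratio=0.8):
    """Return True if the fraction of clipped reads exceeds min_ratio.

    Hypothetical helper; the strict comparison is an assumption.
    """
    if cov == 0:
        # no reads at this position, so no clipping ratio can be computed
        return False
    return clipping / cov > min_ratio


print(passes_clipping_ratio(48, 48))                  # → True
print(passes_clipping_ratio(100, 70))                 # → False
print(passes_clipping_ratio(100, 70, min_ratio=0.6))  # → True
```

Lowering the threshold reports more positions, including those where a fraction of the reads still span the candidate breakpoint.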


Another default behavior of the script is to skip the first and last 100 bp of each contig (this only applies to the table reporting positions with high clipping). This is because contig ends often have high proportions of clipped reads that have nothing to do with misassemblies. You can change that parameter with the flag `--min-dist-to-end`:

<div class="codeblock" markdown="1">
anvi&#45;script&#45;find&#45;misassemblies &#45;b sample01.bam &#45;o result &#45;&#45;min&#45;dist&#45;to&#45;end 0
</div>
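A minimal Python sketch of this end-of-contig filter is shown below. The exact boundary handling in the script is not documented here, so the inclusive and exclusive bounds used in the hypothetical `far_from_ends` helper are assumptions.

```python
def far_from_ends(pos, contig_length, min_dist_to_end=100):
    """Return True if pos is at least min_dist_to_end away from both contig ends.

    Hypothetical helper; the boundary conventions are an assumption.
    """
    return min_dist_to_end <= pos < contig_length - min_dist_to_end


contig_length = 1665603
print(far_from_ends(498, contig_length))      # → True
print(far_from_ends(50, contig_length))       # → False
print(far_from_ends(1665580, contig_length))  # → False
print(far_from_ends(50, contig_length, min_dist_to_end=0))  # → True
```

Setting the distance to 0, as in the command above, keeps every position, including those at the very ends of a contig.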


{:.notice}
Edit [this file](https://github.com/merenlab/anvio/tree/master/anvio/docs/programs/anvi-script-find-misassemblies.md) to update this information.


## Additional Resources



{:.notice}
Are you aware of resources that may help users better understand the utility of this program? Please feel free to edit [this file](https://github.com/merenlab/anvio/tree/master/bin/anvi-script-find-misassemblies) on GitHub. If you are not sure how to do that, find the `__resources__` tag in [this file](https://github.com/merenlab/anvio/blob/master/bin/anvi-interactive) to see an example.
30 changes: 30 additions & 0 deletions help/main/programs/anvi-script-find-misassemblies/network.json
@@ -0,0 +1,30 @@
{
"graph": [],
"nodes": [
{
"size": 1,
"score": 1,
"color": "#AA0000",
"id": "bam-file",
"name": "bam-file",
"provided_by_anvio": false,
"type": "BAM"
},
{
"size": 1,
"score": 0.1,
"color": "#AAAA00",
"id": "anvi-script-find-misassemblies",
"name": "anvi-script-find-misassemblies",
"type": "PROGRAM"
}
],
"links": [
{
"target": 1,
"source": 0
}
],
"directed": false,
"multigraph": false
}