Merge pull request #4 from sagc-bioinformatics/V0.1.3
V0.1.3
ziadbkh authored Jan 11, 2024
2 parents a56be16 + 2ef9713 commit 452bb51
Showing 26 changed files with 548 additions and 134 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "mgikit"
version = "0.1.2"
version = "0.1.3"
edition = "2021"
authors = ["Ziad Al Bkhetan <[email protected]>"]
repository = "https://github.com/sagc-bioinformatics/mgikit"
15 changes: 15 additions & 0 deletions README.md
@@ -38,10 +38,25 @@ This command is to merge demultiplexing and quality reports from multiple lanes

<hr/>

## Installation

You can use the static binary under `bins` directly. If you would rather build it from the source code:

You need Rust and Cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html).

```bash
git clone https://github.com/sagc-bioinformatics/mgikit.git
cd mgikit
cargo build --release
```



## User Guide

Please check out the [documentation](https://sagc-bioinformatics.github.io/mgikit/).


## Commercial Use

Please contact us if you want to use the software for commercial purposes.
Binary file added bins/mgikit-V0.1.3.zip
Binary file not shown.
Binary file removed bins/mgikit.zip
Binary file not shown.
13 changes: 13 additions & 0 deletions docs/index.md
@@ -41,6 +41,19 @@ This command is to merge demultiplexing and quality reports from multiple lanes

<hr/>

## Installation

You can use the static binary under `bins` directly. If you would rather build it from the source code:

You need Rust and Cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html).


```bash
git clone https://github.com/sagc-bioinformatics/mgikit.git
cd mgikit
cargo build --release
```

## User Guide Table of Contents

{% include section-navigation-tiles.html type="guides" %}
123 changes: 122 additions & 1 deletion docs/pages/demultiplex.md
@@ -129,7 +129,7 @@ the number of allowed mismatches is high.

+ **`--report-level`**: The level of reporting. [default: 2]

+ **`--compression-level`**: The level of compression (between 0 and 12). 0 is fast but no compression, 9 is slow but high compression. [default: 1]
+ **`--compression-level`**: The level of compression (between 0 and 12). 0 is fast but no compression, 12 is slow but high compression. [default: 1]

+ **`--force`**: this flag forces the run and overwrites the existing output directory if it exists.

@@ -356,6 +356,127 @@ multiqc mgikit-examples/test/

```

### Performance evaluation

Run time (in minutes) measured on different datasets.
DS01 and DS04 use 10 bp dual indexes, DS02 and DS03 use 8 bp dual indexes, and DS05 uses an 8 bp single index.
For single-end runs, only the R2 file of the dataset is used for demultiplexing.

<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-g7sd{border-color:inherit;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-uzvj{border-color:inherit;font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
<tr>
<th class="tg-g7sd" rowspan="2">Dataset</th>
<th class="tg-uzvj" rowspan="2">Reads</th>
<th class="tg-uzvj" rowspan="2">Samples</th>
<th class="tg-uzvj" colspan="2">Length (bp)</th>
<th class="tg-uzvj" colspan="2">Size (GB)</th>
<th class="tg-uzvj" rowspan="2">Paired-end (min)</th>
<th class="tg-uzvj" rowspan="2">Single-end (min)</th>
</tr>
<tr>
<th class="tg-7btt">R1</th>
<th class="tg-7btt">R2</th>
<th class="tg-7btt">R1</th>
<th class="tg-7btt">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-fymr">DS01</td>
<td class="tg-c3ow">298303014</td>
<td class="tg-c3ow">102</td>
<td class="tg-c3ow">300</td>
<td class="tg-c3ow">320</td>
<td class="tg-c3ow">76</td>
<td class="tg-c3ow">85</td>
<td class="tg-c3ow">71.5</td>
<td class="tg-c3ow">37.2</td>
</tr>
<tr>
<td class="tg-fymr">DS02</td>
<td class="tg-c3ow">494667136</td>
<td class="tg-c3ow">39</td>
<td class="tg-c3ow">148</td>
<td class="tg-c3ow">172</td>
<td class="tg-c3ow">65</td>
<td class="tg-c3ow">75</td>
<td class="tg-c3ow">61.5</td>
<td class="tg-c3ow">31.8</td>
</tr>
<tr>
<td class="tg-fymr">DS03</td>
<td class="tg-c3ow">506600595</td>
<td class="tg-c3ow">29</td>
<td class="tg-c3ow">100</td>
<td class="tg-c3ow">124</td>
<td class="tg-c3ow">46</td>
<td class="tg-c3ow">55</td>
<td class="tg-c3ow">43.5</td>
<td class="tg-c3ow">30</td>
</tr>
<tr>
<td class="tg-fymr">DS04</td>
<td class="tg-c3ow">274567350</td>
<td class="tg-c3ow">5</td>
<td class="tg-c3ow">28</td>
<td class="tg-c3ow">70</td>
<td class="tg-c3ow">8.5</td>
<td class="tg-c3ow">19</td>
<td class="tg-c3ow">13</td>
<td class="tg-c3ow">11.9</td>
</tr>
<tr>
<td class="tg-fymr">DS05</td>
<td class="tg-c3ow">500612381</td>
<td class="tg-c3ow">64</td>
<td class="tg-c3ow">50</td>
<td class="tg-c3ow">8</td>
<td class="tg-c3ow">22</td>
<td class="tg-c3ow">5.5</td>
<td class="tg-c3ow">12</td>
<td class="tg-c3ow">-</td>
</tr>
</tbody>
</table>

### Memory utilisation

The default parameters of the tool are optimised to achieve high performance. Most of the memory is allocated for output buffering, which reduces the number of write operations to disk.

The expected memory usage is influenced by four main factors:

1. Number of samples in the sample sheet.
2. Writing buffer size (`--writing-buffer-size` parameter, default is `67108864`).
3. Compression buffer size (`--compression-buffer-size` parameter, default is `131072`).
4. Single end or paired end input data.

The expected allocated memory is

+ **Single-end input**: `number of samples * (writing buffer size + 2 * compression buffer size)`.

+ **Paired-end input**: `2 * number of samples * (writing buffer size + 2 * compression buffer size)`.

When using the default parameters:

+ **Single-end input**: `number of samples * 64.25 MB`.

+ **Paired-end input**: `2 * number of samples * 64.25 MB`.

Reducing the writing buffer size lowers the required memory but also increases the run time.
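
The formulas above can be checked with a short calculation. The following is a minimal sketch, not code from mgikit: `estimate_memory_bytes` and `bytes_to_mib` are hypothetical helper names, and it assumes both buffer sizes are expressed in bytes, which is consistent with the 64.25 MB per-sample figure (64 MiB + 2 × 128 KiB).

```rust
/// Default `--writing-buffer-size` quoted in this section (assumed to be bytes).
const DEFAULT_WRITING_BUFFER: u64 = 67_108_864;
/// Default `--compression-buffer-size` quoted in this section (assumed to be bytes).
const DEFAULT_COMPRESSION_BUFFER: u64 = 131_072;

/// Estimate the memory allocated for output buffering, in bytes.
/// Hypothetical helper for illustration only; it mirrors the formulas
/// given in this section and is not part of mgikit.
fn estimate_memory_bytes(
    samples: u64,
    writing_buffer: u64,
    compression_buffer: u64,
    paired_end: bool,
) -> u64 {
    // Per sample: one writing buffer plus two compression buffers.
    let per_sample = writing_buffer + 2 * compression_buffer;
    // Paired-end input doubles the estimate.
    let multiplier: u64 = if paired_end { 2 } else { 1 };
    multiplier * samples * per_sample
}

/// Convert bytes to MiB (1 MiB = 1024 * 1024 bytes).
fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

Under these assumptions, `estimate_memory_bytes(10, DEFAULT_WRITING_BUFFER, DEFAULT_COMPRESSION_BUFFER, true)` returns 1,347,420,160 bytes, which `bytes_to_mib` converts to 1285 MB for a 10-sample paired-end run. This is only an estimate of the buffering memory described above; actual usage may differ.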


### Execution examples

You can use the datasets at `testing_data` to perform these tests.