Commit
Showing 20 changed files with 5,510 additions and 5,219 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
/target | ||
/.idea | ||
/testing_data/output/ds*/* | ||
Cargo.lock | ||
/target | ||
/.idea | ||
/testing_data/output/ds*/* | ||
Cargo.lock |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,34 @@ | ||
[package] | ||
name = "mgikit" | ||
version = "0.1.6" | ||
edition = "2021" | ||
authors = ["Ziad Al Bkhetan <[email protected]>"] | ||
repository = "https://github.com/sagc-bioinformatics/mgikit" | ||
readme = "README.md" | ||
keywords = ["fastq", "MGI", "demultiplexing"] | ||
|
||
[dependencies] | ||
flate2 = { version = "1.0.28", features = ["zlib-ng-compat"], default-features = false } | ||
libz-sys = { version = "1.1.8", default-features = false, features = ["libc"] } | ||
mimalloc = { version = "0.1.34", default-features = false } | ||
combinations = "*" | ||
itertools = "*" | ||
chrono = "0.4" | ||
termion = "2.0.1" | ||
clap = "4.3.21" | ||
memchr = "2.6.4" | ||
libdeflater = "1.12.0" | ||
niffler = { version = "2.5.0", default-features = false, features = ["gz"]} | ||
walkdir = "2.4.0" | ||
glob = "0.3.1" | ||
log = "0.4" | ||
env_logger = "0.10.1" | ||
sysinfo = "0.24.0" | ||
|
||
[dev-dependencies] | ||
md5 = "0.7.0" | ||
|
||
[profile.release] | ||
lto = "fat" | ||
codegen-units = 1 | ||
panic = "abort" | ||
[package] | ||
name = "mgikit" | ||
version = "0.1.7" | ||
edition = "2021" | ||
authors = ["Ziad Al Bkhetan <[email protected]>"] | ||
repository = "https://github.com/sagc-bioinformatics/mgikit" | ||
readme = "README.md" | ||
keywords = ["fastq", "MGI", "demultiplexing"] | ||
|
||
[dependencies] | ||
flate2 = { version = "1.0.28", features = ["zlib-ng-compat"], default-features = false } | ||
libz-sys = { version = "1.1.8", default-features = false, features = ["libc"] } | ||
mimalloc = { version = "0.1.34", default-features = false } | ||
combinations = "*" | ||
itertools = "*" | ||
chrono = "0.4" | ||
termion = "2.0.1" | ||
clap = "4.3.21" | ||
memchr = "2.6.4" | ||
libdeflater = "1.12.0" | ||
niffler = { version = "2.5.0", default-features = false, features = ["gz"]} | ||
walkdir = "2.4.0" | ||
glob = "0.3.1" | ||
log = "0.4" | ||
env_logger = "0.10.1" | ||
sysinfo = "0.24.0" | ||
|
||
[dev-dependencies] | ||
md5 = "0.7.0" | ||
|
||
[profile.release] | ||
lto = "fat" | ||
codegen-units = 1 | ||
panic = "abort" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,69 +1,69 @@ | ||
--- | ||
|
||
![SAGC-Bioinformatics](docs/assets/SAGC-logo-hover.png) | ||
|
||
--- | ||
|
||
## MGIKIT | ||
mgikit is a collection of tools used to demultiplex fastq files and generate demultiplexing and quality reports. | ||
|
||
The toolkit includes the following commands: | ||
|
||
### demultiplex | ||
This command is used to demultiplex fastq files and assign the sequencing reads to their | ||
associated samples. The tool requires the following mandatory input files to perform the | ||
demultiplexing: | ||
1. Fastq files (single/paired-end). | ||
2. Sample sheet which contains sample indexes and their templates (will be explained in detail). | ||
|
||
In short, the tool reads the barcodes at the end of the R2 (reverse) reads for paired-end input, or at the end of the | ||
R1 (forward) reads for single-read input. Based on the barcode, it assigns each read to the relevant | ||
sample, allowing mismatches below a given threshold. The tool outputs fastq files for each sample | ||
as well as summary reports that can be visualised through the MultiQC tool and the mgikit plugin. | ||
|
||
<hr/> | ||
|
||
### template | ||
|
||
This command detects the location and form of the indexes within the read barcode. It goes through a small number of reads and counts the matches with the indexes in the sample sheet at each possible location in the read barcode, considering each index both as is and as its reverse complement. | ||
|
||
It reports matches for all possible combinations and uses the read template that had the maximum number of matches. This process happens for each sample individually and therefore, the best matching template for each sample will be reported. | ||
|
||
Using this comprehensive scan, the tool can detect the templates for mixed libraries. | ||
|
||
<hr/> | ||
|
||
### report | ||
|
||
This command merges demultiplexing and quality reports from multiple lanes into one comprehensive report for visualisation with MultiQC. | ||
|
||
<hr/> | ||
|
||
### reformat | ||
|
||
This command is to reformat fastq files generated by `splitBarcode` into Illumina format and generate quality reports. | ||
|
||
<hr/> | ||
|
||
## Installation | ||
|
||
You can use the static binary under bins directly. However, if you would like to build it from the source code: | ||
|
||
You need to have Rust and cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html) | ||
|
||
```bash | ||
git clone https://github.com/sagc-bioinformatics/mgikit.git | ||
cd mgikit | ||
cargo build --release | ||
``` | ||
|
||
|
||
|
||
## User Guide | ||
|
||
Please check out the [documentation](https://sagc-bioinformatics.github.io/mgikit/) | ||
|
||
|
||
## Commercial Use | ||
|
||
Please contact us if you want to use the software for commercial purposes. | ||
--- | ||
|
||
![SAGC-Bioinformatics](docs/assets/SAGC-logo-hover.png) | ||
|
||
--- | ||
|
||
## MGIKIT | ||
mgikit is a collection of tools used to demultiplex fastq files and generate demultiplexing and quality reports. | ||
|
||
The toolkit includes the following commands: | ||
|
||
### demultiplex | ||
This command is used to demultiplex fastq files and assign the sequencing reads to their | ||
associated samples. The tool requires the following mandatory input files to perform the | ||
demultiplexing: | ||
1. Fastq files (single/paired-end). | ||
2. Sample sheet which contains sample indexes and their templates (will be explained in detail). | ||
|
||
In short, the tool reads the barcodes at the end of the R2 (reverse) reads for paired-end input, or at the end of the | ||
R1 (forward) reads for single-read input. Based on the barcode, it assigns each read to the relevant | ||
sample, allowing mismatches below a given threshold. The tool outputs fastq files for each sample | ||
as well as summary reports that can be visualised through the MultiQC tool and the mgikit plugin. | ||
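The barcode-based assignment described above can be illustrated with a small sketch. This is not mgikit's actual implementation: the function names `hamming` and `assign_sample`, the "at most `max_mismatches`" cut-off, and the rule that a tie between two samples leaves the read unassigned are all assumptions made for the example.

```rust
/// Count mismatching positions between two equal-length byte strings.
/// Returns `None` when the lengths differ.
fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b.iter()).filter(|(x, y)| x != y).count())
}

/// Hypothetical helper: compare the tail of `read` with every sample index
/// and return the name of the closest sample, provided its distance is at
/// most `max_mismatches` and no other sample is equally close.
fn assign_sample<'a>(
    read: &[u8],
    samples: &[(&'a str, &[u8])],
    max_mismatches: usize,
) -> Option<&'a str> {
    let mut best: Option<(&'a str, usize)> = None;
    let mut tie = false;
    for &(name, index) in samples {
        if index.len() > read.len() {
            continue; // index cannot fit at the end of this read
        }
        // The barcode is assumed to sit at the end of the read.
        let tail = &read[read.len() - index.len()..];
        let d = match hamming(tail, index) {
            Some(d) => d,
            None => continue,
        };
        if d > max_mismatches {
            continue;
        }
        match best {
            Some((_, bd)) if d > bd => {}          // worse than current best
            Some((_, bd)) if d == bd => tie = true, // ambiguous assignment
            _ => {
                best = Some((name, d));
                tie = false;
            }
        }
    }
    if tie {
        None
    } else {
        best.map(|(name, _)| name)
    }
}
```

For instance, with the indexes `ACGTACGT` and `TTTTCCCC`, a read ending in `ACGTACGA` would be assigned to the first sample when one mismatch is allowed and left unassigned when none are.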
|
||
<hr/> | ||
|
||
### template | ||
|
||
This command detects the location and form of the indexes within the read barcode. It goes through a small number of reads and counts the matches with the indexes in the sample sheet at each possible location in the read barcode, considering each index both as is and as its reverse complement. | ||
|
||
It reports matches for all possible combinations and uses the read template that had the maximum number of matches. This process happens for each sample individually and therefore, the best matching template for each sample will be reported. | ||
|
||
Using this comprehensive scan, the tool can detect the templates for mixed libraries. | ||
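The scan described in this section can be sketched as follows for a single sample index. The names `reverse_complement` and `detect_template`, the exact-match criterion, and the "first best wins" tie rule are assumptions for illustration only, not the tool's real code.

```rust
/// Reverse-complement a DNA sequence; bytes other than A/C/G/T are kept as is.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            other => other,
        })
        .collect()
}

/// Hypothetical scan: for every offset in the barcode, count how many
/// barcodes contain `index` (as is) or its reverse complement exactly at
/// that offset. Returns `(offset, is_reverse_complement, match_count)` for
/// the placement with the most matches, or `None` if nothing matched.
fn detect_template(barcodes: &[&[u8]], index: &[u8]) -> Option<(usize, bool, usize)> {
    let rc = reverse_complement(index);
    let max_len = barcodes.iter().map(|b| b.len()).max().unwrap_or(0);
    if index.is_empty() || max_len < index.len() {
        return None;
    }
    let mut best: Option<(usize, bool, usize)> = None;
    for offset in 0..=(max_len - index.len()) {
        for (is_rc, candidate) in [(false, index), (true, rc.as_slice())] {
            let count = barcodes
                .iter()
                .filter(|b| b.get(offset..offset + candidate.len()) == Some(candidate))
                .count();
            // Keep the first placement with a strictly higher match count.
            if count > 0 && best.map_or(true, |(_, _, c)| count > c) {
                best = Some((offset, is_rc, count));
            }
        }
    }
    best
}
```

Running this once per sample mirrors the per-sample reporting described above, which is what would let different samples in a mixed library end up with different templates.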
|
||
<hr/> | ||
|
||
### report | ||
|
||
This command merges demultiplexing and quality reports from multiple lanes into one comprehensive report for visualisation with MultiQC. | ||
|
||
<hr/> | ||
|
||
### reformat | ||
|
||
This command is to reformat fastq files generated by `splitBarcode` into Illumina format and generate quality reports. | ||
|
||
<hr/> | ||
|
||
## Installation | ||
|
||
You can use the static binary under bins directly. However, if you would like to build it from the source code: | ||
|
||
You need to have Rust and cargo installed first; see the Rust [documentation](https://doc.rust-lang.org/cargo/getting-started/installation.html) | ||
|
||
```bash | ||
git clone https://github.com/sagc-bioinformatics/mgikit.git | ||
cd mgikit | ||
cargo build --release | ||
``` | ||
|
||
|
||
|
||
## User Guide | ||
|
||
Please check out the [documentation](https://sagc-bioinformatics.github.io/mgikit/) | ||
|
||
|
||
## Commercial Use | ||
|
||
Please contact us if you want to use the software for commercial purposes. |