This repository is attached to the paper "mi-Mic: a novel multi-layer statistical test for microbiota-disease associations".
miMic is a straightforward yet remarkably versatile and scalable approach for differential abundance analysis.
miMic consists of three main steps:
- Data preprocessing and translation to a cladogram of means.
- An a priori nested ANOVA (or nested GLM for continuous labels) to detect overall microbiome-label relations.
- A post hoc test along the cladogram trajectories.
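To make the idea of a "cladogram of means" concrete, here is a minimal sketch for a single sample. It is an illustration only, not miMic's implementation: the helper name `cladogram_of_means`, the semicolon-separated taxonomy strings, and the toy values are all assumptions, and the real preprocessing (done through MIPMLP) involves further steps that are not shown here.

```python
from collections import defaultdict


def cladogram_of_means(abundances, taxonomy):
    """Hypothetical helper: average ASV values at every taxonomic level.

    abundances: {asv_name: value} for one sample.
    taxonomy:   {asv_name: "k__...;p__...;..."} (assumed format).
    Returns {taxon_path: mean value of all ASVs below that taxon}.
    """
    groups = defaultdict(list)
    for asv, value in abundances.items():
        levels = taxonomy[asv].split(";")
        # Register the ASV under each of its ancestors, from root to leaf.
        for depth in range(1, len(levels) + 1):
            groups[";".join(levels[:depth])].append(value)
    return {taxon: sum(vals) / len(vals) for taxon, vals in groups.items()}


sample = {"asv1": 2.0, "asv2": 4.0, "asv3": 9.0}
taxonomy = {
    "asv1": "k__Bacteria;p__Firmicutes",
    "asv2": "k__Bacteria;p__Firmicutes",
    "asv3": "k__Bacteria;p__Bacteroidetes",
}
means = cladogram_of_means(sample, taxonomy)
```

Each internal node of the resulting tree holds the mean of the ASVs beneath it, which is the kind of structure the later tests walk along.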
miMic is available through the following platforms:
pip install mimic-da
See example_use.py for an example of how to use miMic.
The example contains the following steps:
- Import miMic and additional packages.
from mimic_da import apply_mimic
import pandas as pd
- Load the raw ASV table in the following format:
- The first column is named "ID"
- Each row represents a sample and each column represents an ASV.
- The last row contains the taxonomy information, named "taxonomy".
df = pd.read_csv("example_data/for_process.csv")
- Note: for_process.csv is a file that contains the raw ASV table in the required format; you can find an example file in the example_data folder.
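The format rules above can be checked before running anything else. The sketch below uses only the standard library; `check_asv_table` is a hypothetical helper (it is not part of mimic_da), and the toy table with its taxonomy strings is invented for illustration.

```python
import csv
import io


def check_asv_table(csv_text):
    """Hypothetical helper: return a list of format problems (empty if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    problems = []
    # Rule 1: the first column must be named "ID".
    if not rows or not rows[0] or rows[0][0] != "ID":
        problems.append('first column must be named "ID"')
    # Rule 2: the last row must hold the taxonomy and be named "taxonomy".
    if len(rows) < 2 or not rows[-1] or rows[-1][0] != "taxonomy":
        problems.append('last row must be named "taxonomy"')
    return problems


good = (
    "ID,asv_1,asv_2\n"
    "S1,10,0\n"
    "S2,3,7\n"
    "taxonomy,k__Bacteria;p__Firmicutes,k__Bacteria;p__Bacteroidetes\n"
)
bad = "sample,asv_1\nS1,10\n"

good_problems = check_asv_table(good)
bad_problems = check_asv_table(bad)
```

An empty list means the table satisfies both naming rules; it does not validate the abundance values or the taxonomy strings themselves.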
- Load a tag table as CSV, such that the tag column is named "Tag".
tag = pd.read_csv("example_data/tag.csv",index_col=0)
- Note: tag.csv is a file that contains the tag table in the required format; you can find an example tag file in the example_data folder.
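Since the tag table is loaded with its first column as the index, it is worth checking that the two tables describe the same samples. The sketch below assumes that the tag index and the "ID" column of the ASV table hold the same sample identifiers; `compare_sample_ids` is a hypothetical helper, not a mimic_da function.

```python
def compare_sample_ids(asv_ids, tag_ids):
    """Hypothetical helper: report sample IDs present in only one table."""
    # The "taxonomy" row of the ASV table is metadata, not a sample.
    asv_set = set(asv_ids) - {"taxonomy"}
    tag_set = set(tag_ids)
    return {
        "missing_tag": sorted(asv_set - tag_set),
        "missing_abundance": sorted(tag_set - asv_set),
    }


asv_ids = ["S1", "S2", "S3", "taxonomy"]  # toy values for illustration
tag_ids = ["S1", "S2", "S4"]
report = compare_sample_ids(asv_ids, tag_ids)
```

With the tables loaded as in the steps above, the inputs would be `df["ID"]` and `tag.index`.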
- Specify a folder to save the output of the miMic test.
folder = "example_data/2D_images"
- Note: 2D_images is a folder that will be created in your current working directory, and the output of the miMic test will be saved there.
- Apply MIPMLP.
- MIPMLP is run with its default parameters; see the note below for more details.
- taxonomy_group: one of ["sub PCA", "mean", "sum"]; "sub PCA" is the preferred method.
processed = apply_mimic(folder=folder, tag=tag, mode="preprocess", preprocess=True, rawData=df, taxnomy_group='sub PCA')
- Note: MIPMLP is a package used to preprocess the raw ASV table; see MIPMLP PyPI or the MIPMLP website for more explanations.
If you have your own processed data, set preprocess to False and pass your processed data as the processed parameter in the next step.
- Apply the miMic test.
miMic uses the following hyperparameters:
- eval: evaluation method, ["man", "corr", "cat"]. Default is "man".
- "man" for binary labels.
- "corr" for continuous labels.
- "cat" for categorical labels.
- sis: sister correction method, ["fdr_bh", "bonferroni", "no"]. Default is "fdr_bh".
- correct_first: whether to apply FDR correction to the starting taxonomy level according to the sis parameter, [True, False]. Default is True.
- mode: the run format, ["test", "plot"]. Default is "test".
- save: whether to save the corrs_df of the miMic test to disk, [True, False]. Default is True.
- tax: starting taxonomy level of the post hoc test, ["None", 1, 2, 3, "noAnova", "nosignificant"].
- In "test" mode the default value is "None".
- In "plot" mode, tax is set to the taxonomy selected in "test" mode [1, 2, 3, "noAnova"].
- "noAnova": the a priori nested ANOVA test is not significant.
- "nosignificant": the a priori nested ANOVA test is not significant and miMic did not find any significant taxa in the leaves. In this case, the post hoc test is not applied.
- colorful: Determines whether to apply colorful mode on the plots [True, False]. Default is True.
- threshold_p: the threshold for significant values. Default is 0.05.
- THRESHOLD_edge: the threshold for having an edge in "interaction" plot. Default is 0.5.
- processed: the processed data from the previous step. Default is None.
- apply_samba: whether to apply samba, [True, False]. Default is True.
- samba_output: if you already have samba outputs, miMic will read them from the folder you specified; otherwise, set samba_output to None and miMic will apply samba.
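The mapping from label type to the eval value can be written down explicitly. This is a minimal sketch: the dictionary restates the three options listed above, while the `eval_for` helper and its binary-label check are hypothetical additions for illustration and are not part of mimic_da.

```python
# Label type -> eval value, as listed in the hyperparameters above.
EVAL_BY_LABEL_TYPE = {
    "binary": "man",
    "continuous": "corr",
    "categorical": "cat",
}


def eval_for(label_type, labels):
    """Hypothetical helper: pick the eval value and sanity-check binary labels."""
    eval_name = EVAL_BY_LABEL_TYPE[label_type]
    # A binary test only makes sense with exactly two distinct label values.
    if eval_name == "man" and len(set(labels)) != 2:
        raise ValueError("binary labels must have exactly two distinct values")
    return eval_name


binary_eval = eval_for("binary", [0, 1, 1, 0])
continuous_eval = eval_for("continuous", [0.3, 1.2, 2.5])
```

The returned string can then be passed as the eval argument of apply_mimic.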
if processed is not None:
    taxonomy_selected, samba_output = apply_mimic(folder, tag, eval="man", threshold_p=0.05, processed=processed, apply_samba=True, save=False)
    if taxonomy_selected is not None:
        apply_mimic(folder, tag, mode="plot", tax=taxonomy_selected, eval="man", sis='fdr_bh', samba_output=samba_output, save=False, threshold_p=0.05, THRESHOLD_edge=0.5)
- Note: if apply_samba is set to True, miMic will apply samba-metric. If save is set to True, the output will be saved to the folder you specified. See SAMBA PyPI for more explanations.
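For readers unfamiliar with the two correction options of the sis parameter, the sketch below shows what Benjamini-Hochberg ("fdr_bh") and Bonferroni adjustments do to a small set of p-values. It is a generic, pure-Python illustration of the two procedures and makes no claim about how miMic implements them or which p-values it groups together; the toy p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values (each multiplied by m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p_values = [0.01, 0.04, 0.03]  # toy values for illustration
bh = benjamini_hochberg(p_values)
bonf = bonferroni(p_values)
```

With threshold_p=0.05, all three toy p-values remain significant under Benjamini-Hochberg, while only the smallest one survives Bonferroni, which shows why the latter is the more conservative choice.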
miMic will output the following:
- If save is set to True, samba outputs and the following CSV files will be saved to your specified folder:
- corrs_df: a dataframe containing the results of the miMic test (including Utest results).
- just_mimic: a dataframe containing the results of the miMic test without the Utest results.
- u_test_without_mimic: a dataframe containing the results of the Utest without the miMic results.
- miMic&Utest: a dataframe containing the joint results of miMic and Utest tests.
- If mode is set to "plot", plots will be saved in the folder named 'plots' in your current working directory. The following plots will be saved:
- tax_vs_rp_sp_anova_p: plots RP vs SP over the different taxonomy levels and colors the background of the plot up to the selected taxonomy, based on the miMic test.
- rsp_vs_beta: calculates the RSP score for different betas and creates the corresponding plot.
- corrs_within_family: a plot of the correlation between the significant ASVs within the family level; if colorful is set to True, each family will be colored.
- interaction: a plot of the interaction between the significant ASVs.
- correlations_tree: a correlation cladogram in which the size of each node is set according to -log(p-value), the color of each node represents the sign of the post hoc test, and the shape of the node (circle, square, sphere) is based on miMic, Utest, or both results, respectively; if colorful is set to True, the background of the node will be colored based on the family color.
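As a rough illustration of how a node in the correlation cladogram could be styled from a test result, the sketch below maps a p-value to a size and a statistic to a sign. The helper `node_style`, the base-10 logarithm, the scale factor, and the sign labels are all assumptions made for this example; the actual plotting code may use different choices.

```python
import math


def node_style(p_value, statistic, scale=100.0):
    """Hypothetical helper: derive node size and sign label from a test result."""
    # Smaller p-values give larger nodes: size grows with -log10(p).
    size = scale * -math.log10(p_value)
    # The sign of the post hoc statistic decides the node color group.
    sign = "positive" if statistic > 0 else "negative"
    return {"size": size, "sign": sign}


strong = node_style(0.001, 0.8)
weak = node_style(0.04, -0.2)
```

Under these assumptions, a p-value of 0.001 yields a node roughly twice as large as a p-value of 0.04.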
If you use miMic for any purpose, please cite: Shtossel, Oshrit, Shani Finkelstein, and Yoram Louzoun. "mi-Mic: a novel multi-layer statistical test for microbiota-disease associations." Genome Biology 25, no. 1 (2024): 113. https://link.springer.com/article/10.1186/s13059-024-03256-0