oshritshtossel/miMic

miMic (Mann-Whitney image microbiome)

This repository accompanies the paper "mi-Mic: a novel multi-layer statistical test for microbiota-disease associations".
miMic is a straightforward, versatile, and scalable approach to differential abundance analysis.

miMic consists of three main steps:

  • Data preprocessing and translation to a cladogram of means.

  • An a priori nested ANOVA (or nested GLM for continuous labels) to detect overall microbiome-label associations.

  • A post hoc test along the cladogram trajectories.
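The first and third steps can be illustrated with the simplified, standard-library sketch below. It is not miMic's implementation and makes several assumptions for illustration: taxonomy strings are ";"-separated, each internal node's per-sample value is the mean of its direct children, labels are binary (0/1), and each node is compared with a Mann-Whitney U test using the normal approximation without tie or continuity correction, so p-values will differ from scipy.stats.mannwhitneyu. It omits the a priori nested ANOVA, the sister correction, and miMic's preprocessing. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def cladogram_of_means(leaves, sep=";"):
    """Per-sample values for every node of the taxonomy tree.

    `leaves` maps a full taxonomy string to that leaf's per-sample values.
    Assumption: an internal node's value is the per-sample mean of its
    direct children (not of all leaves below it).
    """
    nodes = {name: list(vals) for name, vals in leaves.items()}
    depth = max(len(name.split(sep)) for name in leaves)
    # Walk bottom-up: build the nodes of level d - 1 from those of level d.
    for level in range(depth, 1, -1):
        children = defaultdict(list)
        for name, vals in nodes.items():
            parts = name.split(sep)
            if len(parts) == level:
                children[sep.join(parts[:-1])].append(vals)
        for parent, kids in children.items():
            nodes[parent] = [mean(col) for col in zip(*kids)]
    return nodes


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """(U statistic of x, two-sided p) via the normal approximation.

    Both groups must be non-empty; no tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def score_every_node(leaves, labels, sep=";"):
    """Mann-Whitney (U, p) for every cladogram node, labels given as 0/1."""
    results = {}
    for node, values in cladogram_of_means(leaves, sep).items():
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        results[node] = mann_whitney_u(group1, group0)
    return results
```

On toy data, score_every_node(leaves, labels) returns a (U, p) pair for every node, from the top-level taxon down to the leaves.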


Install the package

pip install mimic-da

How to apply miMic

See example_use.py for an example of how to use miMic.
The example contains the following steps:

  1. Import miMic and additional packages.

    from mimic_da import apply_mimic
    import pandas as pd
  2. Load the raw ASV table in the following format:

    • The first column is named "ID".
    • Each row represents a sample and each column represents an ASV.
    • The last row, named "taxonomy", contains the taxonomy information.
    df = pd.read_csv("example_data/for_process.csv")
    • Note: for_process.csv contains a raw ASV table in the required format; an example file is provided in the example_data folder.
  3. Load a tag table from a CSV file; the tag column must be named "Tag".

    tag = pd.read_csv("example_data/tag.csv",index_col=0)
    • Note: tag.csv contains a tag table in the required format; an example is provided in the example_data folder.
  4. Specify a folder to save the output of the miMic test.

    folder = "example_data/2D_images"
    • Note: the folder (here example_data/2D_images, relative to your current working directory) will be created, and the output of the miMic test will be saved there.
  5. Apply MIPMLP.

    • MIPMLP is run with its default parameters; see the Note below for more details.
    • taxnomy_group: one of ["sub PCA", "mean", "sum"]; "sub PCA" is the preferred method.
    processed = apply_mimic(folder=folder, tag=tag, mode="preprocess", preprocess=True, rawData=df,
                             taxnomy_group='sub PCA')
    • Note: MIPMLP is a package used to preprocess the raw ASV table; see the MIPMLP PyPI page or the MIPMLP website for details.
      If you already have processed data, set preprocess to False and pass your processed data as the processed parameter in the next step.
  6. Apply the miMic test.
    miMic accepts the following hyperparameters:

    • eval: evaluation method, ["man", "corr", "cat"]. Default is "man".
      • "man" for binary labels.
      • "corr" for continuous labels.
      • "cat" for categorical labels.
    • sis: apply sister correction, ["fdr_bh", "bonferroni", "no"]. Default is "fdr_bh".
    • correct_first: apply FDR correction to the starting taxonomy level according to the sis parameter, [True, False]. Default is True.
    • mode: running mode, ["test", "plot"]. Default is "test".
    • save: whether to save the corrs_df of the miMic test to disk, [True, False]. Default is True.
    • tax: starting taxonomy level of the post hoc test, ["None", 1, 2, 3, "noAnova", "nosignificant"].
      • In "test" mode the default value is "None".
      • In "plot" mode, tax is set automatically to the taxonomy selected in "test" mode [1, 2, 3, "noAnova"].
      • "noAnova": the a priori nested ANOVA test is not significant.
      • "nosignificant": the a priori nested ANOVA test is not significant and miMic did not find any significant taxa in the leaves. In this case, the post hoc test is not applied.
    • colorful: Determines whether to apply colorful mode on the plots [True, False]. Default is True.
    • threshold_p: the threshold for significant values. Default is 0.05.
    • THRESHOLD_edge: the threshold for having an edge in "interaction" plot. Default is 0.5.
    • processed: the processed data from the previous step. Default is None.
    • apply_samba: whether to apply SAMBA (Boolean). Default is True.
    • samba_output: if you already have SAMBA outputs, miMic reads them from the folder you specified; otherwise, miMic applies SAMBA itself (with samba_output set to None).
      if processed is not None:
          # Test mode: returns the selected starting taxonomy and the SAMBA output
          taxonomy_selected, samba_output = apply_mimic(folder, tag, eval="man", threshold_p=0.05,
                                                        processed=processed, apply_samba=True, save=False)
          if taxonomy_selected is not None:
              # Plot mode: reuse the selected taxonomy and SAMBA output from the test run
              apply_mimic(folder, tag, mode="plot", tax=taxonomy_selected, eval="man", sis="fdr_bh",
                          samba_output=samba_output, save=False, threshold_p=0.05, THRESHOLD_edge=0.5)
    • Note: if apply_samba is set to True, miMic applies the SAMBA metric.
      If save is set to True, the output is saved to the folder you specified.
      See the SAMBA PyPI page for more details.
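Before calling apply_mimic, the input layout described in steps 2 and 3 can be checked with a small helper. The sketch below is hypothetical (check_mimic_inputs is not part of mimic_da) and checks only the rules stated above, plus one assumption: that every sample ID in the ASV table also appears in the index of the tag table, as when tag.csv is read with index_col=0. The toy values and taxonomy strings are placeholders.

```python
import pandas as pd


def check_mimic_inputs(raw: pd.DataFrame, tag: pd.DataFrame) -> None:
    """Raise ValueError if the inputs deviate from the layout in steps 2 and 3.

    Hypothetical helper; not part of mimic_da.
    """
    if raw.columns[0] != "ID":
        raise ValueError('the first column of the ASV table must be named "ID"')
    if raw.iloc[-1, 0] != "taxonomy":
        raise ValueError('the last row of the ASV table must be the "taxonomy" row')
    if "Tag" not in tag.columns:
        raise ValueError('the tag table must contain a column named "Tag"')
    # Assumption: every sample (all rows except "taxonomy") needs a tag.
    samples = set(raw["ID"].iloc[:-1].astype(str))
    missing = samples - set(tag.index.astype(str))
    if missing:
        raise ValueError(f"samples without a tag: {sorted(missing)}")


# Toy inputs in the required layout (placeholder values).
raw = pd.DataFrame({
    "ID": ["S1", "S2", "taxonomy"],
    "ASV1": [10, 0, "k__Bacteria;p__Firmicutes"],
    "ASV2": [3, 7, "k__Bacteria;p__Bacteroidetes"],
})
tag = pd.DataFrame({"Tag": [0, 1]}, index=["S1", "S2"])
check_mimic_inputs(raw, tag)  # passes silently
```

In practice the same call would be made on df and tag right after loading them in steps 2 and 3.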

miMic output

miMic will output the following:

  • If save is set to True, the SAMBA outputs and the following CSV files are saved to your specified folder:

    • corrs_df: a dataframe containing the results of the miMic test (including Utest results).
    • just_mimic: a dataframe containing the results of the miMic test without the Utest results.
    • u_test_without_mimic: a dataframe containing the results of the Utest without the miMic results.
    • miMic&Utest: a dataframe containing the joint results of miMic and Utest tests.
  • If mode is set to "plot", plots will be saved in the folder named 'plots' in your current working directory.
    The following plots will be saved:

    1. tax_vs_rp_sp_anova_p: RP vs. SP across the taxonomy levels, with the plot background shaded up to the taxonomy level selected by the miMic test.

    2. rsp_vs_beta: a plot of the RSP score computed for different beta values.

    3. hist: a histogram of the ASVs in each taxonomy level.

    4. corrs_within_family: a plot of the correlations of the significant ASVs within the family level; if colorful is set to True, each family is shown in its own color.

    5. interaction: a plot of the interaction between the significant ASVs.

    6. correlations_tree: a correlation cladogram in which the size of each node reflects the -log(p-value), the color of each node represents the sign of the post hoc test, and the shape of each node (circle, square, sphere) indicates whether it was found by miMic, the Utest, or both. If colorful is set to True, the node background is colored by family.
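The node encoding used by correlations_tree can be summarized in a few lines. This is a hypothetical sketch, not miMic's plotting code: the base-10 logarithm, the sign labels, and the mapping of circle, square, and sphere to miMic only, Utest only, and both (following the order in which they are listed above) are assumptions.

```python
import math

# Assumed mapping, following the listed order (not confirmed by miMic's code).
SHAPES = {"mimic": "circle", "utest": "square", "both": "sphere"}


def node_style(p_value: float, statistic: float, found_by: str) -> dict:
    """Visual attributes of one node in a correlations_tree-style plot."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    if statistic > 0:
        sign = "positive"
    elif statistic < 0:
        sign = "negative"
    else:
        sign = "zero"
    return {
        "size": -math.log10(p_value),  # smaller p-value -> larger node
        "sign": sign,                  # drives the node color
        "shape": SHAPES[found_by],     # which test(s) flagged the node
    }
```

Using -log(p) for size means a node with p = 0.01 is drawn twice as large as one with p = 0.1, so the strongest associations stand out in the cladogram.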

Cite us

If you use miMic for any purpose, please cite: Shtossel, Oshrit, Shani Finkelstein, and Yoram Louzoun. "mi-Mic: a novel multi-layer statistical test for microbiota-disease associations." Genome Biology 25, no. 1 (2024): 113. https://link.springer.com/article/10.1186/s13059-024-03256-0
