Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distance matrix of tensors? #108

Open
jonbakerlab opened this issue Jan 10, 2025 · 1 comment
Open

Distance matrix of tensors? #108

jonbakerlab opened this issue Jan 10, 2025 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@jonbakerlab
Copy link

Hey Cameron, thanks for the great suite of tools! We are using gemelli and have a question regarding testing the Phylo CTF output for significance. We have a dataset that is different sites from subjects (rather than longitudinal data), and applying Phylo CTF does give better separation on a biplot of our healthy and disease subjects than the individual samples did (i.e., biplots of various beta diversity metrics). It would seem to me that we should do some sort of statistical test to see if our healthy and disease groups are truly different based on data represented in the Phylo CTF subject biplot (e.g., a PERMANOVA). However, the 'distance_matrix.qza' file output by qiime gemelli phylogenetic-ctf-with-taxonomy contains the distances of all the samples, rather than distances of the subject tensors, so using it as input for qiime diversity beta-group-significance would seem to give the same result as the data before going through Phylo CTF...? Is there an actual output from Phylo CTF that we could use as input for a PERMANOVA, or how should we test for significance? Doesn't seem like any of the output from qiime gemelli phylogenetic-ctf-with-taxonomy would be usable for that? Thanks for your help!

@cameronmartino cameronmartino self-assigned this Jan 10, 2025
@cameronmartino cameronmartino added the enhancement New feature or request label Jan 10, 2025
@cameronmartino
Copy link
Collaborator

Hi @jonbakerlab,

Thanks for the kind words and for using gemelli! It is awesome to hear that Phylo-CTF is helping extract signals in the data.

My preferred way is to split the distance matrix by timepoint/site and test for sig. differences by phenotype between subjects per time/site and then on aggregate. However, you are the second person to ask this question this month and what you are asking for makes sense, especially in the context of multiple body sites, which is why I am flagging this as a requested feature. I will get this output included in the next version update of the tool. In the meantime, you can generate it yourself. It will involve a bit of code but here is an example of how to calculate it using QIIME2 API IBD Example that should hopefully be easy to apply to your data. This will give a per subject distance (not per sample as is the default):

First run Phylo-CTF like the tutorial (you probably already did this step)

import os
import warnings
import qiime2 as q2
from qiime2.plugins.gemelli.actions import phylogenetic_ctf_with_taxonomy
from qiime2.plugins.fragment_insertion.methods import filter_features
# hide pandas Future/Deprecation Warning(s) for tutorial
warnings.filterwarnings("ignore", category=DeprecationWarning) 
warnings.simplefilter(action='ignore', category=FutureWarning)

# import table(s)
table = q2.Artifact.load('IBD-2538/data/table.qza')
rarefied_table = q2.Artifact.load('IBD-2538/data/rarefied-table.qza')
# import metadata
metadata = q2.Metadata.load('IBD-2538/data/metadata.tsv')
# import tree
tree = q2.Artifact.load('IBD-2538/data/sepp-insertion-tree.qza')
# import taxonomy
taxonomy = q2.Artifact.load('IBD-2538/data/taxonomy.qza')

# make a dir. for results
os.mkdir('IBD-2538/ctf-results')
# first ensure all the table features are in the tree
table = filter_features(table, tree).filtered_table
# now run phylo-CTF
ctf_results = phylogenetic_ctf_with_taxonomy(table, tree, taxonomy.view(q2.Metadata),
                                             metadata, 'host_subject_id', 'timepoint',
                                             min_feature_frequency=10)
# write all the artifacts to file by name
for art_tmp_name_, art_tmp_ in ctf_results._asdict().items():
    art_tmp_.save('IBD-2538/ctf-results/%s.qza' % (art_tmp_name_))

Now make a subject-level distance (your likely starting point)

import qiime2 as q2
from scipy.spatial import distance
from skbio import OrdinationResults, DistanceMatrix

# import and extract subject ordination (If not in QIIME2 don't need to extract just load as skbio type)
q2_subject_biplot = q2.Artifact.load('IBD-2538/ctf-results/subject_biplot.qza')
sk_subject_biplot = q2_subject_biplot.view(OrdinationResults)
# make subject distance
subject_U = sk_subject_biplot.samples.copy()
Udist = distance.cdist(subject_U.copy(),
                       subject_U.copy())
U_dist_res = DistanceMatrix(Udist, ids=subject_U.index)
U_dist_res_q2 = q2.Artifact.import_data('DistanceMatrix', U_dist_res)
# write the distances in skbio and QIIME2 formats
U_dist_res_q2.save('subject_distance.qza')
U_dist_res.write('subject_distance.txt')

I hope that solves your problem, let me know if you have any issues running the above code on your data.

Best,

Cameron

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants