Distance matrix of tensors? #108
Comments
Hi @jonbakerlab,

Thanks for the kind words and for using gemelli! It is awesome to hear that Phylo-CTF is helping extract signals in the data. My preferred way is to split the distance matrix by timepoint/site, test for significant differences by phenotype between subjects per timepoint/site, and then test on aggregate. However, you are the second person to ask this question this month, and what you are asking for makes sense, especially in the context of multiple body sites, which is why I am flagging this as a requested feature. I will get this output included in the next version update of the tool.

In the meantime, you can generate it yourself. It will involve a bit of code, but here is an example of how to calculate it, based on the QIIME2 API IBD example, that should hopefully be easy to apply to your data. This will give a per-subject distance (not per-sample, as is the default).

First, run Phylo-CTF as in the tutorial (you probably already did this step):

import os
import warnings
import qiime2 as q2
from qiime2.plugins.gemelli.actions import phylogenetic_ctf_with_taxonomy
from qiime2.plugins.fragment_insertion.methods import filter_features
# hide pandas Future/Deprecation Warning(s) for tutorial
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)
# import table(s)
table = q2.Artifact.load('IBD-2538/data/table.qza')
rarefied_table = q2.Artifact.load('IBD-2538/data/rarefied-table.qza')
# import metadata
metadata = q2.Metadata.load('IBD-2538/data/metadata.tsv')
# import tree
tree = q2.Artifact.load('IBD-2538/data/sepp-insertion-tree.qza')
# import taxonomy
taxonomy = q2.Artifact.load('IBD-2538/data/taxonomy.qza')
# make a directory for results (no error if it already exists)
os.makedirs('IBD-2538/ctf-results', exist_ok=True)
# first ensure all the table features are in the tree
table = filter_features(table, tree).filtered_table
# now run phylo-CTF
ctf_results = phylogenetic_ctf_with_taxonomy(table, tree, taxonomy.view(q2.Metadata),
                                             metadata, 'host_subject_id', 'timepoint',
                                             min_feature_frequency=10)
# write all the artifacts to file by name
for art_tmp_name_, art_tmp_ in ctf_results._asdict().items():
    art_tmp_.save('IBD-2538/ctf-results/%s.qza' % (art_tmp_name_))

Now make a subject-level distance (your likely starting point):

import qiime2 as q2
from scipy.spatial import distance
from skbio import OrdinationResults, DistanceMatrix
# import and extract the subject ordination (outside QIIME2 there is no need to extract; just load the file as the skbio type)
q2_subject_biplot = q2.Artifact.load('IBD-2538/ctf-results/subject_biplot.qza')
sk_subject_biplot = q2_subject_biplot.view(OrdinationResults)
# make subject distance
subject_U = sk_subject_biplot.samples.copy()
Udist = distance.cdist(subject_U.copy(),
                       subject_U.copy())
U_dist_res = DistanceMatrix(Udist, ids=subject_U.index)
U_dist_res_q2 = q2.Artifact.import_data('DistanceMatrix', U_dist_res)
# write the distances in skbio and QIIME2 formats
U_dist_res_q2.save('subject_distance.qza')
U_dist_res.write('subject_distance.txt')

I hope that solves your problem. Let me know if you have any issues running the above code on your data.

Best,
Cameron
Hey Cameron, thanks for the great suite of tools! We are using `gemelli` and have a question regarding testing the Phylo-CTF output for significance. We have a dataset consisting of different sites from subjects (rather than longitudinal data), and applying Phylo-CTF does give better separation of our healthy and disease subjects on a biplot than the individual samples did (i.e., biplots of various beta diversity metrics). It would seem to me that we should do some sort of statistical test (e.g., a PERMANOVA) to see if our healthy and disease groups are truly different based on the data represented in the Phylo-CTF subject biplot. However, the 'distance_matrix.qza' file output by `qiime gemelli phylogenetic-ctf-with-taxonomy` contains the distances between all the samples, rather than the distances between the subject tensors, so using it as input for `qiime diversity beta-group-significance` would seem to give the same result as the data before going through Phylo-CTF...? Is there an actual output from Phylo-CTF that we could use as input for a PERMANOVA, or how should we test for significance? It doesn't seem like any of the output from `qiime gemelli phylogenetic-ctf-with-taxonomy` would be usable for that. Thanks for your help!