A multi-view latent variable model with domain-informed structured sparsity that integrates noisy domain expertise in the form of feature sets.
The MuVI class is the main entry point for loading the data and performing the inference:
import numpy as np
import pandas as pd
import anndata as ad
import mudata as md
import muvi
# Load processed input data (missing values are allowed)
# Matrix of dimensions n_samples x n_rna_features
rna_df = pd.read_csv(...)
# Matrix of dimensions n_samples x n_prot_features
prot_df = pd.read_csv(...)
# Load prior feature sets, e.g. gene sets
gene_sets = muvi.fs.from_gmt(...)
# Binary matrix of dimensions n_gene_sets x n_rna_features
gene_sets_mask = gene_sets.to_mask(rna_df.columns)
# Create a MuVI object by passing both input data and prior information
model = muvi.MuVI(
    observations={"rna": rna_df, "prot": prot_df},
    prior_masks={"rna": gene_sets_mask},
    ...,
    device=device,
)
# Alternatively, create a MuVI model from AnnData (single-view)
rna_adata = ad.AnnData(rna_df, dtype=np.float32)
rna_adata.varm['gene_sets_mask'] = gene_sets_mask.T
model = muvi.tl.from_adata(
    rna_adata,
    prior_mask_key="gene_sets_mask",
    ...,
    device=device
)
# Alternatively, create a MuVI model from MuData (multi-view)
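# prot_adata is assumed to be built from prot_df in the same way as rna_adata
prot_adata = ad.AnnData(prot_df, dtype=np.float32)
mdata = md.MuData({"rna": rna_adata, "prot": prot_adata})
model = muvi.tl.from_mdata(
    mdata,
    prior_mask_key="gene_sets_mask",
    ...,
    device=device
)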
# Fit the model for a given number of training epochs
model.fit(batch_size, n_epochs, ...)
# Continue with the downstream analysis (see below)
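To make the shape of the prior mask concrete, the sketch below parses feature sets from GMT-formatted text (one set per line: name, description, then tab-separated members) and builds a binary matrix of dimensions n_gene_sets x n_features. It uses only the standard library and does not call muvi. The helper names parse_gmt and build_mask and the toy gene sets are hypothetical, and muvi.fs.from_gmt and to_mask may behave differently, for example in how they treat features that are absent from the data.

```python
import io


def parse_gmt(handle):
    """Parse GMT-formatted lines into a dict: set name -> set of feature names."""
    feature_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *members = fields
        feature_sets[name] = {m for m in members if m}
    return feature_sets


def build_mask(feature_sets, features):
    """Return (set names, rows) where rows[i][j] == 1 iff features[j] is in set i."""
    names = sorted(feature_sets)
    rows = [[int(f in feature_sets[n]) for f in features] for n in names]
    return names, rows


# Toy GMT content (hypothetical gene sets, for illustration only)
gmt_text = (
    "SET_A\tdescription a\tGENE1\tGENE2\n"
    "SET_B\tdescription b\tGENE2\tGENE3\tGENE9\n"
)
# Feature names as they would appear in the columns of the data matrix
features = ["GENE1", "GENE2", "GENE3", "GENE4"]

feature_sets = parse_gmt(io.StringIO(gmt_text))
set_names, mask = build_mask(feature_sets, features)

for name, row in zip(set_names, mask):
    print(name, row)
```

Each row corresponds to one feature set and each column to one measured feature. GENE9 is not among the measured features in this toy example, so it does not appear in the mask.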
The package consists of three additional submodules for analysing the results post-training:
- muvi.tl provides tools for downstream analysis, e.g.,
  - compute muvi.tl.variance_explained across all factors and views
  - muvi.tl.test the significance between the prior feature sets and the inferred factors
  - apply clustering on the latent space such as muvi.tl.leiden
  - muvi.tl.save the model in order to muvi.tl.load it at a later point in time
- muvi.pl works in tandem with muvi.tl by providing visualization methods such as
  - muvi.pl.variance_explained (see above)
  - plotting the latent space via muvi.pl.tsne, muvi.pl.scatter or muvi.pl.stripplot
  - investigating factors in terms of their inferred loadings with muvi.pl.inspect_factor
- muvi.fs serves the data structure and methods for loading, processing and storing the prior information from feature sets
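As a rough illustration of what variance explained per factor and view means in a linear factor model, the sketch below computes R^2 = 1 - ||Y - z w^T||^2 / ||Y||^2 for one view Y, one factor with scores z and loadings w. This definition is common for factor models on centered data and is an assumption here; the exact computation in muvi.tl.variance_explained may differ. The function name r2_for_factor and the toy numbers are hypothetical.

```python
def r2_for_factor(y, z, w):
    """R^2 of the rank-one reconstruction z w^T for a data matrix y.

    y: n_samples x n_features (list of lists, assumed centered)
    z: factor scores, length n_samples
    w: factor loadings, length n_features
    """
    ss_res = 0.0  # squared error left after subtracting the factor
    ss_tot = 0.0  # total sum of squares of the view
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            ss_res += (value - z[i] * w[j]) ** 2
            ss_tot += value ** 2
    return 1.0 - ss_res / ss_tot


# Toy view: 3 samples x 2 features, generated exactly by a single factor
z = [1.0, 0.0, -1.0]
w = [2.0, -1.0]
y = [[zi * wj for wj in w] for zi in z]

perfect = r2_for_factor(y, z, w)               # factor reproduces the view
unrelated = r2_for_factor(y, z, [0.0, 0.0])    # all-zero loadings explain nothing
print(perfect, unrelated)
```

Under this definition, a factor whose loadings are all zero in a view explains none of that view's variance, which is how sparse loadings translate into view-specific factors.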
Check out our basic tutorial to get familiar with MuVI, or jump straight to a single-cell multiome analysis!
R users can readily export a trained MuVI model into R with a single line of code and resume the analysis with the MOFA2 package.
muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
See this vignette for more details!
We suggest using conda to manage your environments, and pip to install muvi as a Python package. Follow these steps to get muvi up and running!
- Create a Python environment in conda:
conda create -n muvi python=3.10
- Activate the freshly created environment:
source activate muvi
- Install muvi with pip:
python3 -m pip install muvi
- Alternatively, install the latest version directly from GitHub with pip:
python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
Make sure to install a GPU version of PyTorch to significantly speed up the inference.
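One way to choose the value passed as device is to check whether PyTorch can see a GPU. The sketch below assumes that MuVI accepts device strings such as "cuda" and "cpu", which is an assumption here; the helper name pick_device is hypothetical. It falls back to the CPU when PyTorch is not installed or no GPU is visible.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing entirely, so only the CPU can be used
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running inference on: {device}")
```

If this prints cpu on a machine with a GPU, the installed PyTorch build is most likely a CPU-only version.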
If you use MuVI in your work, please use this BibTeX entry:
Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity
Arber Qoku and Florian Buettner
International Conference on Artificial Intelligence and Statistics (AISTATS) 2023