Skip to content

COHCCC/stGCC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Unsupervised Graph Convolutional Clustering on ST data Tutorial

Outline

  1. Installation

  2. Run experiments

  3. Data preparation

  4. Execution and Visulization

  5. Citation

1. Installation

Download the package from Github and install it locally:

 git clone https://github.com/COHCCC/stGCC
 cd stGCC

Some of packages used in this tutorial might conflict with your currient enviornment, consider installing Anaconda. After installing Anaconda, you can create a new environment, for example, git_gnn (you can change to any name you like).

Two tutorials for anaconda installation:

Anaconda Distribution

How to Successfully Install Anaconda on a Mac

#update anaconda if necessary
#conda update -n base -c defaults conda

#create an environment called git_gnn
conda create -n git_gnn
#activate your environment 
conda activate git_gnn

Example Environment:

  • System: Anaconda
  • Python: 3.10.8
  • Python packages: pandas = 1.5.3, numpy = 1.23.5, tensorflow = 2.11.0, sklearn = 1.2.1, setuptools = 67.1.0, scanpy = 1.1.9
conda install -c conda-forge tensorflow
conda install -c anaconda pandas
conda install -c anaconda scikit-learn
conda install -c conda-forge setuptools
conda install -c bioconda scanpy

*Note: If import errors occur during the execution of the scripts in this tutorial, please use conda install to install the required packages within the environment instead of using pip.

2. Run Experiments

To verify the functionality of the current environment setup, run the model on public dataset citeseer

Example

To adaptively tune the power on Citeseer use

python gcc/tune_power.py --dataset=citeseer

To run the model on Citeseer for power p=5 and have the average execution time

python gcc/run.py --dataset=citeseer --power 5

Parameter list

For run.py

Parameter Type Default Description
dataset string citeseer Name of the graph dataset (citeseer, FFD1).
power integer 5 First power to test.
runs integer 20 Number of runs per power.
n_clusters integer 0 Number of clusters (0 for ground truth).
max_iter integer 30 Number of iterations of the algorithm.
tol float 10e-7 Tolerance threshold of convergence.

Example output looks like: run example results

*Note: Please note that occasionally running these two command lines within a Jupyter notebook environment may result in a ModuleNotFoundError: No module named 'gcc' error. In such cases, it is recommended to execute them within a terminal instead.

If the error persist, we suggest trying the following:

export PYTHONPATH="${PYTHONPATH}:/Users/ninasong/Desktop/spatialProject/literature_model/graph_convolutional_clustering/unsupervised-GCN" (change the path to your working directory)

For more details of this model, please find the WSDM '22 paper Efficient Graph Convolution for Joint Node Representation Learning and Clustering.github

3. Data preparation

Its the time to generate our own data and use it as the input for this unsupervised DCN model :>

For the step-by-step tutorial with explanation, please refer to: Jupyter Notebook of the tutorial: data preparation

*Note: Files needed to generate FFD1.mat are located in FFD1 folder, make sure when running the notbook, the path to required files are correct.

4: Execution and Visulization

Once get the file FFD1.mat using the tutorial (make sure it is stored in data subdirectory), run the following command line in the terminal:

python gcc/tune_power.py --dataset=FFD1 --max_power=30

python gcc/tune_power.py --dataset FFD1 --max_power 10

Output file will be pred_label_only.csv in /annotation folder. The post processing section will combine the predict label with barcode, form the similar format of tissue_position.csv, thereby enabling its future utilization in visualizations via R studio. For downstream visulation, please refer to: R script: visualizing_GCN-clustering.R

FFD1 GCN trial 1 - label visulation (working on improving the model)

5. Citation

@inproceedings{fettal2022efficient,
  author = {Fettal, Chakib and Labiod, Lazhar and Nadif, Mohamed},
  title = {Efficient Graph Convolution for Joint Node Representation Learning and Clustering},
  year = {2022},
  publisher = {Association for Computing Machinery},
  doi = {10.1145/3488560.3498533},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  pages = {289–297},
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published