Unsupervised Graph Convolutional Clustering on ST data Tutorial

Outline

1. Installation

Download the package from Github and install it locally:

 git clone https://github.com/COHCCC/stGCC
 cd stGCC

Some of the packages used in this tutorial might conflict with your current environment, so consider installing Anaconda. After installing Anaconda, you can create a new environment, for example git_gnn (you can choose any name you like).

Two tutorials for installing Anaconda:

Anaconda Distribution

How to Successfully Install Anaconda on a Mac

#update anaconda if necessary
#conda update -n base -c defaults conda

#create an environment called git_gnn
conda create -n git_gnn python=3.10
#activate your environment 
conda activate git_gnn

Example Environment:

System: Anaconda
Python: 3.10.8
Python packages: pandas = 1.5.3, numpy = 1.23.5, tensorflow = 2.11.0, scikit-learn (sklearn) = 1.2.1, setuptools = 67.1.0, scanpy = 1.1.9

conda install -c conda-forge tensorflow
conda install -c anaconda pandas
conda install -c anaconda scikit-learn
conda install -c conda-forge setuptools
conda install -c bioconda scanpy

*Note: If import errors occur during the execution of the scripts in this tutorial, please use conda install to install the required packages within the environment instead of using pip.
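To compare your environment against the example versions above, a small check script can help. This is a minimal sketch, assuming the standard-library `importlib.metadata` can see the conda-installed packages; it reads installed distribution metadata, so it uses the distribution name `scikit-learn` rather than the import name `sklearn`. The function names are hypothetical and not part of stGCC.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the example environment above, keyed by distribution name.
EXAMPLE_ENV = {
    "pandas": "1.5.3",
    "numpy": "1.23.5",
    "tensorflow": "2.11.0",
    "scikit-learn": "1.2.1",   # imported as `sklearn`
    "setuptools": "67.1.0",
    "scanpy": "1.1.9",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(expected=EXAMPLE_ENV):
    """Print installed vs. example versions; return the names that are missing."""
    missing = []
    for name, have in installed_versions(expected).items():
        if have is None:
            missing.append(name)
            status = "MISSING"
        elif have == expected[name]:
            status = "matches example"
        else:
            status = f"differs (example: {expected[name]})"
        print(f"{name:14} {have or '-':10} {status}")
    return missing


if __name__ == "__main__":
    report()
```

Run it with the git_gnn environment activated; any package reported as MISSING can then be installed with the conda commands above.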

2. Run Experiments

To verify that the environment is set up correctly, run the model on the public Citeseer dataset.

Example

To adaptively tune the power on Citeseer, use:

python gcc/tune_power.py --dataset=citeseer

To run the model on Citeseer with power p = 5 and report the average execution time:

python gcc/run.py --dataset=citeseer --power 5
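When launching several runs from Python (for example, from a notebook cell or a sweep script), it can be convenient to build the command line from the run.py parameters. This is a hypothetical helper, not part of the repository, assuming each parameter in the run.py table is accepted as a `--name=value` command-line flag, matching the `--dataset=citeseer` form used above. It uses `sys.executable` so the command runs with the Python of the currently active environment.

```python
import sys


def build_run_command(dataset="citeseer", power=5, runs=20, n_clusters=0,
                      max_iter=30, tol=10e-7, script="gcc/run.py"):
    """Return an argument list for gcc/run.py (defaults mirror the parameter table)."""
    return [
        sys.executable, script,
        f"--dataset={dataset}",
        f"--power={power}",
        f"--runs={runs}",
        f"--n_clusters={n_clusters}",
        f"--max_iter={max_iter}",
        f"--tol={tol}",   # 10e-7 == 1e-06, formatted as "1e-06"
    ]
```

For example, `subprocess.run(build_run_command(power=5), check=True)`, executed from the repository root, runs the same experiment as the command above with every parameter spelled out.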

Parameter list

For `run.py`:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `dataset` | string | `citeseer` | Name of the graph dataset (`citeseer`, `FFD1`). |
| `power` | integer | `5` | First power to test. |
| `runs` | integer | `20` | Number of runs per power. |
| `n_clusters` | integer | `0` | Number of clusters (`0` for ground truth). |
| `max_iter` | integer | `30` | Number of iterations of the algorithm. |
| `tol` | float | `10e-7` | Tolerance threshold of convergence. |
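The `power` parameter is, roughly, the number of graph-convolution (feature propagation) steps. As an illustration only, the sketch below assumes the common GCN-style operator S = D^-1/2 (A + I) D^-1/2 and computes S^p X; the exact filter implemented in gcc/ and described in the GCC paper may differ.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    # Every row has a self-loop, so degrees are >= 1 and the division is safe.
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(adj, features, power):
    """Apply the normalized adjacency `power` times to the feature matrix: S^p X."""
    s = normalized_adjacency(adj)
    out = features.astype(float)
    for _ in range(power):
        out = s @ out
    return out
```

Larger powers average each node's features over a wider neighborhood, so connected nodes (spots, in the ST setting) end up with more similar representations; tune_power.py is meant to choose this value adaptively.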

Example output: [figure: run example results]

*Note: Occasionally, running these two commands from within a Jupyter notebook may fail with ModuleNotFoundError: No module named 'gcc'. In that case, run them in a terminal instead.

If the error persists, try adding the repository to your PYTHONPATH:

export PYTHONPATH="${PYTHONPATH}:/Users/ninasong/Desktop/spatialProject/literature_model/graph_convolutional_clustering/unsupervised-GCN"

Replace the path above with your own working directory, i.e. the folder that contains gcc/.
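In a Jupyter notebook, commands prefixed with `!` run in a child process that inherits the kernel's environment variables, so setting PYTHONPATH from Python before calling them has the same effect as the `export` line above. A minimal sketch with a hypothetical helper name; it also updates `sys.path`, because the running kernel reads PYTHONPATH only at startup.

```python
import os
import sys


def add_to_pythonpath(repo_root):
    """Make `import gcc` resolvable in this kernel and in `!python ...` child processes."""
    root = os.path.abspath(os.path.expanduser(repo_root))
    # Child processes (e.g. notebook `!` commands) inherit os.environ.
    parts = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if root not in parts:
        parts.append(root)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)
    # The running interpreter only reads PYTHONPATH at startup, so update sys.path too.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

For example, call `add_to_pythonpath("path/to/stGCC")` (a placeholder path) in one cell, then run `!python gcc/run.py --dataset=citeseer --power 5` in the next. Calling it twice does not add duplicate entries.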

For more details on this model, see the WSDM '22 paper Efficient Graph Convolution for Joint Node Representation Learning and Clustering and its GitHub repository.

3. Data preparation

It's time to generate our own data and use it as the input for this unsupervised graph convolutional clustering model.

For a step-by-step walkthrough with explanations, refer to the Jupyter Notebook of the tutorial: data preparation

*Note: The files needed to generate FFD1.mat are located in the FFD1 folder. When running the notebook, make sure the paths to these files are correct.
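Data preparation turns spot-level ST data into a graph. One common construction for spot-based data is a k-nearest-neighbor graph over spot coordinates; the sketch below assumes that construction and uses only NumPy. It is an illustration, not necessarily how the notebook builds FFD1.mat. The default `k=6` is an assumption motivated by the hexagonal Visium layout, where each interior spot has six immediate neighbors.

```python
import numpy as np


def spatial_knn_adjacency(coords, k=6):
    """Symmetric 0/1 adjacency linking each spot to its k nearest spots (Euclidean)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of spots")
    # Squared pairwise distances via |a|^2 + |b|^2 - 2 a.b (an n x n matrix).
    sq = (coords ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T
    np.fill_diagonal(dist2, np.inf)              # a spot is not its own neighbor
    nearest = np.argsort(dist2, axis=1)[:, :k]   # indices of the k closest spots per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    # Keep an edge if either spot lists the other among its k nearest.
    return np.maximum(adj, adj.T)
```

The dense distance matrix needs O(n²) memory. For a full tissue section, `sklearn.neighbors.kneighbors_graph(coords, n_neighbors=k, include_self=False)` builds the same kind of graph as a sparse matrix (symmetrize it with `W.maximum(W.T)`). If you save the graph with `scipy.io.savemat`, the variable names must match what the loader in gcc/ expects; check them against the data-preparation notebook.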

4. Execution and Visualization

Once you have generated FFD1.mat with the tutorial (make sure it is saved in the data subdirectory), run one of the following commands in a terminal; they differ only in the maximum power tested:

python gcc/tune_power.py --dataset=FFD1 --max_power=30

python gcc/tune_power.py --dataset FFD1 --max_power 10

The output file is pred_label_only.csv in the /annotation folder. The post-processing section combines the predicted labels with the spot barcodes in a format similar to tissue_position.csv, so that the results can be visualized later in RStudio. For downstream visualization, refer to the R script: visualizing_GCN-clustering.R

[Figure: FFD1 GCN trial 1 - label visualization (work in progress; the model is still being improved)]
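The post-processing step pairs each predicted label with its spot barcode. Below is a standard-library sketch of that join, assuming both files are headerless, the labels are in the first column of pred_label_only.csv, and the labels are in the same spot order as the positions file. The function names are hypothetical.

```python
import csv


def combine_labels_with_positions(position_rows, labels):
    """Append one predicted label to each spot row; rows are matched by order."""
    if len(position_rows) != len(labels):
        raise ValueError(f"{len(position_rows)} spots but {len(labels)} labels")
    return [list(row) + [label] for row, label in zip(position_rows, labels)]


def write_labelled_positions(positions_csv, labels_csv, out_csv):
    """Read positions and labels (both assumed headerless) and write the merged CSV."""
    with open(positions_csv, newline="") as f:
        positions = [row for row in csv.reader(f) if row]
    with open(labels_csv, newline="") as f:
        labels = [row[0] for row in csv.reader(f) if row]
    merged = combine_labels_with_positions(positions, labels)
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(merged)
```

Because rows are matched purely by position, strip any header row or index column from either file before joining; the length check catches the most common mismatch.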

5. Citation

@inproceedings{fettal2022efficient,
  author = {Fettal, Chakib and Labiod, Lazhar and Nadif, Mohamed},
  title = {Efficient Graph Convolution for Joint Node Representation Learning and Clustering},
  year = {2022},
  publisher = {Association for Computing Machinery},
  doi = {10.1145/3488560.3498533},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  pages = {289–297},
}
