---
title: "Hierarchical clustering"
author: "Cox Lab"
format:
  html:
    toc: true
    toc-depth: 4
    toc-expand: false
    number-sections: true
    number-depth: 4
editor: source
date: today
bibliography: references.bib
---
# General
- **Type:** - Matrix Analysis
- **Heading:** - Clustering/PCA
- **Source code:** not public.
# Brief description
This activity performs hierarchical clustering of rows and/or columns and produces a visual heat map representation of the clustered matrix. Clustering can be performed with a choice of distances and linkages. This activity can also be used just to display your data in a heat map without performing clustering by deselecting row and column clustering.
```{=html}
<!-- This comment and the line above it must be preserved when editing this file!
The recommended sections are these, but they may be changed on a case by case basis.
===== Detailed description =====
===== Parameters =====
===== Theoretical background =====
===== Examples =====
Make changes only below this line! -->
```
# Parameters
## Row tree
If checked, rows will be clustered and a tree (dendrogram) is generated (default: checked).
### Distance
Distance measure used for the clustering (default: Euclidean). It can be selected from a predefined list:
- Euclidean
- L1
- Maximum
- Lp
- Pearson correlation
- Spearman correlation
- Cosine
- Canberra
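
As a rough illustration of how the listed distances differ, the following pure-Python sketch uses common textbook definitions. Whether the activity uses exactly these formulas is an assumption: in particular, the correlation and cosine distances are written here as one minus the similarity, the exponent `p` of the Lp distance is a hypothetical value, and all function names are made up for this example.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def maximum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def lp(x, y, p=3.0):
    # p is a hypothetical choice; p=1 gives L1 and p=2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def pearson_distance(x, y):
    # Assumed form: 1 - Pearson correlation. Undefined for constant vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def ranks(v):
    # Ranks starting at 1; tied values share the average of their ranks.
    order = sorted(range(len(v)), key=lambda i: v[i])
    result = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_distance(x, y):
    # Assumed form: Pearson distance computed on the ranks.
    return pearson_distance(ranks(x), ranks(y))

def cosine_distance(x, y):
    # Assumed form: 1 - cosine similarity. Undefined for all-zero vectors.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def canberra(x, y):
    # Terms where both values are zero are skipped to avoid 0/0.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if a != 0 or b != 0)

row_a = [0.0, 3.0]
row_b = [4.0, 0.0]
for name, fn in [("Euclidean", euclidean), ("L1", l1), ("Maximum", maximum)]:
    print(name, fn(row_a, row_b))
```

The correlation-based distances compare the shape of two profiles rather than their absolute values, which is why they can order rows very differently from Euclidean or L1.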
### Linkage
Linkage method used for the clustering (default: Average). It can be selected from a predefined list:
- Average
- Complete
- Single
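
The three linkages differ only in how the distance between two clusters is derived from the pairwise distances of their members. The following naive agglomerative sketch shows this under the usual textbook definitions (average: mean, complete: maximum, single: minimum of the pairwise distances). It is not the activity's actual implementation, it uses Euclidean distance throughout, and the names `agglomerate` and `LINKAGES` are hypothetical.

```python
import math

# How each linkage condenses the pairwise member distances into one value.
LINKAGES = {
    "average": lambda ds: sum(ds) / len(ds),
    "complete": max,
    "single": min,
}

def cluster_distance(a, b, points, linkage):
    # a and b are lists of point indices.
    ds = [math.dist(points[i], points[j]) for i in a for j in b]
    return LINKAGES[linkage](ds)

def agglomerate(points, linkage="average"):
    """Merge the two closest clusters until one remains.

    Returns the merges as (members_a, members_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], points, linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

points = [[0.0], [1.0], [5.0]]
for linkage in ("average", "complete", "single"):
    print(linkage, agglomerate(points, linkage))
```

For these three points the first merge is the same for every linkage, but the height of the final merge differs, which changes the branch lengths of the resulting tree.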
### Constraint
Constraint from the input data that should be preserved during clustering (default: None). It can be selected from a predefined list:
- None
- Preserve order
- Preserve order (periodic)
### Preprocess with k-means
Specifies whether the data should be preprocessed using k-means before applying clustering and generating a heat map (default: checked).
### Number of clusters
This parameter is only relevant if the parameter "Preprocess with k-means" is checked. It defines the number of clusters that will be created by the k-means algorithm (default: 300).
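
The idea behind such a preprocessing step can be sketched as follows. This is a plain Lloyd-style k-means under stated assumptions, not the activity's actual code: the initialization (first `k` rows), the fixed iteration count, and the function name `kmeans` are hypothetical. One plausible reading, assumed here, is that the hierarchical tree is then built on the `k` cluster centers rather than on every individual row, which keeps the number of pairwise distances manageable for large matrices.

```python
import math

def kmeans(points, k, iterations=20):
    # Illustrative deterministic start: the first k points are the centers.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        for idx, p in enumerate(points):
            labels[idx] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

rows = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centers, labels = kmeans(rows, k=2)
print(centers)  # the reduced set of profiles a tree could be built on
print(labels)   # which center each original row belongs to
```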
## Column tree
If checked, columns will be clustered and a tree (dendrogram) is generated (default: checked).
### Distance
Distance measure used for the clustering (default: Euclidean). It can be selected from a predefined list:
- Euclidean
- L1
- Maximum
- Lp
- Pearson correlation
- Spearman correlation
### Linkage
Linkage method used for the clustering (default: Average). It can be selected from a predefined list:
- Average
- Complete
- Single
### Constraint
Constraint from the input data that should be preserved during clustering (default: None). It can be selected from a predefined list:
- None
- Preserve order
- Preserve order (periodic)
- Preserve grouping
### Preprocess with k-means
Specifies whether the data should be preprocessed using k-means before applying clustering and generating a heat map (default: checked).
### Number of clusters
This parameter is only relevant if the parameter "Preprocess with k-means" is checked. It defines the number of clusters that will be created by the k-means algorithm (default: 300).
## Which columns to use
List of all expression/numerical columns in the data set (default: all numerical columns are listed; the expression columns are selected, see parameter "Use for clustering").
## Use for clustering
Selected expression/numerical columns that should be used for the clustering (default: all expression columns are selected).
## Display in heat map but do not use for clustering
Selected expression/numerical columns that should be displayed in the output heat map, but are not used for the clustering (default: empty).
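
The split between columns used for clustering and columns that are only displayed can be pictured as follows: row distances are computed from the selected columns only, while the remaining columns are carried along for the heat map. This is a minimal sketch of that idea, not the activity's code. The column names and the helper `row_distances` are hypothetical, and Euclidean distance is assumed.

```python
import math

def row_distances(rows, columns, use_for_clustering):
    # Keep only the columns selected for clustering, then compare rows.
    idx = [columns.index(name) for name in use_for_clustering]
    sub = [[row[i] for i in idx] for row in rows]
    return [[math.dist(a, b) for b in sub] for a in sub]

columns = ["Sample A", "Sample B", "Display only"]
rows = [
    [0.0, 0.0, 100.0],
    [3.0, 4.0, -50.0],
]
# "Display only" has no influence on the distances, however extreme its values.
print(row_distances(rows, columns, ["Sample A", "Sample B"]))
```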
# Parameter window
![](images/clustering-pca-hierachical_clustering-edited.png)