-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathhtml_spotlight_tutorial.Rmd
399 lines (331 loc) · 14.1 KB
/
html_spotlight_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
---
title: "SPOTlight vignette"
author: "Marc Elosua Bayes"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
toc: yes
toc_float: yes
number_sections: yes
df_print: paged
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, out.width = "100%", fig.align='center',
message = FALSE, warning = FALSE)
options(width = 1200)
```
```{r, out.width = "200px", echo = FALSE, fig.align='center'}
knitr::include_graphics("img/SPOTlight_VF2.png")
```
The goal of **SPOTlight** is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X's Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. **SPOTlight** is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.
![Graphical abstract](img/SPOTlight_graphical_abstract.png)
## Installation
You can install the latest stable version from the GitHub repository [SPOTlight](https://github.com/MarcElosua/SPOTlight) with:
```{r eval = FALSE}
# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")
```
Or the latest version in development by downloading the devel branch
```{r eval = FALSE}
devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")
```
## Libraries
```{r }
library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)
```
## Load data
For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded [here](https://github.com/MarcElosua/SPOTlight/blob/master/inst/allen_cortex_dwn.rds) while the spatial data is the one put out publicly by [10X](https://www.10xgenomics.com/resources/datasets/) and the processed object can be downloaded using [SeuratData](https://github.com/satijalab/seurat-data) as shown below.
Load single-cell reference dataset.
```{r}
path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))
```
Load Spatial data
```{r}
if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
# If dataset not downloaded proceed to download it
SeuratData::InstallData("stxBrain")
}
# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")
```
## Pre-processing
```{r}
set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
Seurat::RunPCA(., verbose = FALSE) %>%
Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
```
Visualize the clustering
```{r fig.width=8, fig.height=6}
Seurat::DimPlot(cortex_sc,
group.by = "subclass",
label = TRUE) + Seurat::NoLegend()
```
## Descriptive
```{r}
dplyr::count(subclass) %>%
gt::gt(.[-1, ]) %>%
gt::tab_header(
title = "Cell types present in the reference dataset",
) %>%
gt::cols_label(
subclass = gt::html("Cell Type")
)
```
## Compute marker genes
To determine the most important marker genes we can use the function `Seurat::FindAllMarkers` which will return the markers for each cluster.
```{r eval = FALSE}
Seurat::Idents(object = cortex_sc) <- [email protected]$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc,
assay = "SCT",
slot = "data",
verbose = TRUE,
only.pos = TRUE)
saveRDS(object = cluster_markers_all,
file = here::here("inst/markers_sc.RDS"))
```
```{r echo = FALSE}
cluster_markers_all <- readRDS(file = here::here("inst/markers_sc.RDS"))
```
### SPOTlight Decomposition
```{r SPOTlight, eval = FALSE}
set.seed(123)
spotlight_ls <- spotlight_deconvolution(
se_sc = cortex_sc,
counts_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
cluster_markers = cluster_markers_all, # Dataframe with the marker genes
cl_n = 100, # number of cells per cell type to use
hvg = 3000, # Number of HVG to use
ntop = NULL, # How many of the marker genes to use (by default all)
transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
method = "nsNMF", # Factorization method
min_cont = 0 # Remove those cells contributing to a spot below a certain threshold
)
saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))
```
Read RDS object
```{r}
spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))
nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]
```
### Assess deconvolution
Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
The first thing we can do is look at how specific the topic profiles are for each cell type.
```{r fig.width=10, fig.height=8}
h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod[[2]])
topic_profile_plts[[2]] + ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 12))
```
Next we can take a look at the how the individual topic profiles of each cell within each cell-type behave. \
Here we expect that all the cells from the same cell type show a similar topic profile distribution, if not there might be a bit more substructure in that cluster and we may only be capturing one or the other.
```{r fig.width=20, fig.height=22}
topic_profile_plts[[1]] + theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(size = 12))
```
Lastly we can take a look at which genes are the most important for each topic and therefore get an insight into which genes are driving them.
```{r}
basis_spotlight <- data.frame(NMF::basis(nmf_mod[[1]]))
colnames(basis_spotlight) <- unique(stringr::str_wrap(nmf_mod[[2]], width = 30))
basis_spotlight %>%
dplyr::arrange(desc(Astro)) %>%
round(., 5) %>%
DT::datatable(., filter = "top")
```
## Visualization
Join decomposition with metadata
```{r}
# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)
decon_df <- decon_mtrx %>%
data.frame() %>%
tibble::rownames_to_column("barcodes")
tibble::rownames_to_column("barcodes") %>%
dplyr::left_join(decon_df, by = "barcodes") %>%
tibble::column_to_rownames("barcodes")
```
### Specific cell-types
we can use the standard `Seurat::SpatialFeaturePlot` to view predicted celltype proportions one at a time.
```{r fig.width=20, fig.height=25}
Seurat::SpatialFeaturePlot(
object = anterior,
features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
alpha = c(0.1, 1))
```
### Spatial scatterpies
Plot spot composition of all the spots.
```{r fig.width=8, fig.height=8}
cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
pie_scale = 0.4)
```
Plot spot composition of spots containing cell-types of interest
```{r fig.width=5, fig.height=4}
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
cell_types_interest = "L6b",
pie_scale = 0.8)
```
### Spatial interaction graph
Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function `get_spatial_interaction_graph`, this function prints the plot and returns the elements needed to plot it.
```{r}
graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])
```
If you want to tune how the graph looks you can do the following or you can check out more options [here](https://www.r-graph-gallery.com/network.html):
```{r fig.width=5, fig.height=5}
deg <- degree(graph_ntw, mode="all")
# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance
# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]
# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))
# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)
# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
color = getPalette(length(grad_edge)),
stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
dplyr::left_join(graph_col_df, by = "value") %>%
dplyr::pull(color)
# Open a pdf file
plot(graph_ntw,
# Size of the edge
edge.width = edge_importance,
edge.color = color_edge,
# Size of the buble
vertex.size = deg,
vertex.color = "#cde394",
vertex.frame.color = "white",
vertex.label.color = "black",
vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
layout = layout.circle)
```
Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.
```{r fig.width=12, fig.height=12}
# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]
# Compute correlation
decon_cor <- cor(decon_mtrx_sub)
# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)
# Visualize
ggcorrplot::ggcorrplot(
corr = decon_cor,
p.mat = p.mat[[1]],
hc.order = TRUE,
type = "full",
insig = "blank",
lab = TRUE,
outline.col = "lightgrey",
method = "square",
# colors = c("#4477AA", "white", "#BB4444"))
colors = c("#6D9EC1", "white", "#E46726"),
title = "Predicted cell-cell proportion correlation",
legend.title = "Correlation\n(Pearson)") +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
legend.text = ggplot2::element_text(size = 12),
legend.title = ggplot2::element_text(size = 15),
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
```
## Step-by-Step insight
Here we are going to show step by step what is going on and all the different steps involved in the process.
![SPOTlight scheme](img/SPOTlight_scheme.png)
```{r, include = FALSE}
knitr::opts_chunk$set(eval = FALSE)
```
### Downsample data
If the dataset is very large we want to downsample it, both in terms of number of cells and number of genes, to train the model. To do this downsampling we want to keep a representative amount of cells per cluster and the most important genes. We show that this downsampling doesn't affect the performance of the model and greatly speeds up the model training.
```{r eval = FALSE}
# Downsample scRNAseq to select gene set and number of cells to train the model
se_sc_down <- downsample_se_obj(se_obj = cortex_sc,
clust_vr = "subclass",
cluster_markers = cluster_markers_all,
cl_n = 100,
hvg = 3000)
```
### Train NMF model
Once we have the data ready to pass to the model we can train it as shown below.
```{r eval = FALSE}
start_time <- Sys.time()
nmf_mod_ls <- train_nmf(cluster_markers = cluster_markers_all,
se_sc = se_sc_down,
mtrx_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass",
ntop = NULL,
hvg = 3000,
transf = "uv",
method = "nsNMF")
nmf_mod <- nmf_mod_ls[[1]]
```
Extract matrices form the model:
```{r eval = FALSE}
# get basis matrix W
w <- basis(nmf_mod)
dim(w)
# get coefficient matrix H
h <- coef(nmf_mod)
dim(h)
```
Look at cell-type specific topic profile
```{r eval = FALSE}
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod_ls[[2]]
)
topic_profile_plts[[2]] + theme(axis.text.x = element_text(angle = 90))
```
### Spot Deconvolution
```{r eval = FALSE}
# Extract count matrix
spot_counts <- anterior@assays$Spatial@counts
# Subset to genes used to train the model
spot_counts <- spot_counts[rownames(spot_counts) %in% rownames(w), ]
```
Run spots through the basis to get the pertinent coefficients. To do this for every spot we are going to set up a system of linear equations where we need to find the coefficient, we will use non-negative least squares to determine the best coefficient fit.
```{r eval = FALSE}
ct_topic_profiles <- topic_profile_per_cluster_nmf(h = h,
train_cell_clust = nmf_mod_ls[[2]])
decon_mtrx <- mixture_deconvolution_nmf(nmf_mod = nmf_mod,
mixture_transcriptome = spot_counts,
transf = "uv",
reference_profiles = ct_topic_profiles,
min_cont = 0.09)
````
## Session Info
```{r eval = TRUE}
sessionInfo()
```