-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMia_Dust_Fungi_CCAenvfit.Rmd
281 lines (192 loc) · 6.85 KB
/
Mia_Dust_Fungi_CCAenvfit.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
---
title: "Mia_Dust_Fungi_CCAenvfit"
author: "Nat Pombubpa"
date: "Updated on October 29, 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
###STEP1: Load all necessary packages for analysis
More information about Phyloseq can be found at the following link: [Phyloseq](https://joey711.github.io/phyloseq/)
If you get error in this step, you probably need to install any packages which causes error.
```{r warning=FALSE, message=FALSE}
library(ape)
library(vegan)
library(plyr)
library(dplyr)
library(scales)
library(grid)
library(reshape2)
library(phyloseq)
library(magrittr)
library(ggplot2)
library(ggpubr)
library(data.table)
```
###STEP2: Import Mapping file (metadate file)
1.Check mapping file before import to R, R will automatically change sample name that starts with number or contain “-” in sample name.
2.First column of first row should not start with #, R will not read the first row that starts with '#'
3. You can choose which samples to include in analysis by indicating specific group in the column
```{r warning=FALSE}
meta = read.table("SierraMapIsoF.csv",
header=TRUE,row.names=1,
sep=",",stringsAsFactors=FALSE)
```
###STEP3: Check if your metadata file has been import successfully and correctly
The output will show a table of your metadata file (mapping file).
*If you do not have header, you might start your first row with #
```{r warning=FALSE}
head(meta)
```
###STEP4: Construct sample_data-class using imported metadata
```{r warning=FALSE}
sampleData <- sample_data(meta)
```
###STEP5: Import OTU table
OTU table from Jornada 16S data is “Pietras16S.otu_table.fix.txt”.
```{r warning=FALSE}
otus <- read.table("DustFungi.otu_table.txt",
header=T,sep="\t",row.names = 1)
otumat <- as(as.matrix(otus), "matrix")
OTU = otu_table(otumat, taxa_are_rows = TRUE)
```
```{r}
head(otus)
```
###STEP6: Import taxonomy table
Taxonmy table generated from AMPtk need to be rearranged using following script.
“perl rdp_taxonmy2mat.pl<Input_taxonmy.txt>Output_taxonomy.txt”
rdp_taxonomy2mat.pl was created by Professor Jason E. Stajich
```{r warning=FALSE}
taxmat <- read.table("DustFungi.taxonomy.fix.txt",
header=T,sep="\t",row.names=1)
taxmat <- as(as.matrix(taxmat),"matrix")
TAX = tax_table(taxmat)
```
###STEP7: Construct Phyloseq object
To construct phyloseq object, otu table, taxonomy table, and sampleData are required. Phylogenetic tree can be included, but it is not necessary for constructing phyloseq object.
Construct Phyloseq object called "Physeq"
```{r warning=FALSE}
physeq = phyloseq(OTU,TAX,sampleData)
```
###STEP8: Check phyloseq object
This should indicate that your physeq is a "phyloseq-class experiment-level object"
```{r warning=FALSE}
physeq
```
###STEP9: Remove singletons
Remove any OTUs that present only one time.
```{r }
physeq.prune = prune_taxa(taxa_sums(physeq) > 1, physeq)
```
```{r warning=FALSE}
physeq.prune
```
##STEP10: Plot read counts to check dataset
Check read counts: any samples that have very low reads should be removed.
[Ref](http://evomics.org/wp-content/uploads/2016/01/phyloseq-Lab-01-Answers.html)
```{r}
readcount = data.table(as(sample_data(physeq.prune), "data.frame"),
TotalReads = sample_sums(physeq.prune),
keep.rownames = TRUE)
setnames(readcount, "rn", "SampleID")
SeqDepth = ggplot(readcount, aes(TotalReads)) + geom_histogram() + ggtitle("Sequencing Depth")
```
TotalReads of all the samples can be in this table (select only SampleID and TotalReads columns).
In order to check samples with low number of reads, "order()" can be used to sort "TotalReads" column.
In this dataset, N55.Rhizo has very low number of reads, so will will filter this sample out using the next minimum number of reads.
```{r}
readcount = readcount[order(readcount$TotalReads), c("SampleID", "TotalReads")]
```
```{r}
head(readcount)
```
###STEP11:Rarefy OTUs to a minimum number of reads (OPTIONAL)
Rarefy OTUs (remove any samples that has very low number of reads)
```{r warning=FALSE}
set.seed(711)
physeq.prune.rarefy = rarefy_even_depth(physeq.prune, sample.size = 7325, replace = FALSE, trimOTUs = TRUE)
physeq.prune.rarefy
```
You have successfully imported data to R and completely generate phyloseq object as "physeq.prune.rarefy".
###STEP12: Calculating distance matrix (bray curtis)
```{r}
ps.dist = phyloseq::distance(physeq.prune.rarefy, "bray")
```
###STEP12.1: PERMANOVA on environmental data
```{r}
colnames(meta)
```
```{r}
set.seed(1)
adonis(ps.dist ~Sr+NdEp+Elevation, as(sample_data(physeq.prune.rarefy),"data.frame"))
```
Using "formaula" to indicate environmental factors for distance/ordinate calculation.
```{r}
physeq.prune.rarefy.ps.cca <- ordinate(physeq.prune.rarefy, "CCA",
formula = ~Sr+NdEp+Elevation)
```
###STEP13: Plot ordination
```{r}
plot_ordination(physeq.prune.rarefy, physeq.prune.rarefy.ps.cca, type = "samples",
color = "Site", shape = "Site") + ggtitle("Bacterial Beta Diversity (CCA)") + theme(plot.title = element_text(hjust = 0.5))
```
add ordination plot to a variable "pcca"
```{r}
pcca = plot_ordination(physeq.prune.rarefy, physeq.prune.rarefy.ps.cca, type = "samples",
color = "Site", shape = "Site")
```
###STEP14: Get environmental distance from CCA to arrows
```{r}
arrowdist <- vegan::scores(physeq.prune.rarefy.ps.cca, display = c("bp"))
```
Check data for arrow distance, you should see your environmental factor with distances (CCA1 and CCA2)
```{r}
head(arrowdist)
```
###STEP15: Add arrow distance to data frame
```{r}
arrowdf <- data.frame(labels = rownames(arrowdist), arrowdist)
```
Check data frame for arrow distance
```{r}
head(arrowdf)
```
###STEP16: Define arrow starting point and labels
Defining arrow starting point and coordinates for fitting into CCA plot.
```{r}
arrow_map = aes(xend = CCA1, yend = CCA2, x = 0, y = 0, shape = NULL, color = NULL, label = labels)
label_map = aes(x = 1.1 * CCA1, y = 1.1 * CCA2, shape = NULL, color = NULL, label = labels)
```
###STEP17: Add environemntal arrow data to plot
```{r warning=FALSE}
arrowhead = arrow(length = unit(0.05, "npc"))
pcca.envfit = pcca + geom_segment(arrow_map, size = 0.8, data = arrowdf, color = "gray",
arrow = arrowhead) + geom_text(label_map, size = 3, data = arrowdf) +
ggtitle("Bacterial Beta Diversity (CCA)") + theme(plot.title = element_text(hjust = 0.5))
```
###STEP18: CCA plot with environmental fit
Now, the CCA plot should inculde environmental arrows.
```{r warning=FALSE}
pcca.envfit
```
###Testing with Ordistep
```{r}
metacca = meta[c(4,5,11)]
```
```{r}
head(metacca)
```
```{r}
TOTU = t(OTU)
```
```{r}
ps.cca = cca(TOTU ~ 1, metacca)
```
```{r}
ps.cca1 = cca(TOTU ~ ., metacca)
```
```{r}
ordistep(ps.cca, scope=formula(ps.cca1), perm.max=999)
```