generated from jtr13/cctemplate
-
Notifications
You must be signed in to change notification settings - Fork 139
/
Copy pathintroduction_to_time_series_analysis_in_r.Rmd
195 lines (134 loc) · 5.41 KB
/
introduction_to_time_series_analysis_in_r.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
# Introduction to Time Series Analysis in R
Chenqi Wang
```{r, include=FALSE}
# this prevents package loading message from appearing in the rendered html file
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```
### Overview
We introduce fundamental time series models including ARMA, GARCH etc. in R and focus on model definitions, process generations and visualizations.
### ARMA (Autoregressive-moving-average)
ARMA(p, q): $y_t=\alpha+\sum_{i = 1}^p\beta_iy_{t-i}+\epsilon_t+\sum_{i=1}^q\theta_i\epsilon_{t-i}$
where $y_t$ is the time series, $\alpha$ is the intercept, $\beta_i$ and $\theta_i$ are coefficients, and $\epsilon_t$ is the white noise.
#### White Noise
$\epsilon_t$ ~ i.i.d. N(0, $\sigma^2$) for some $\sigma^2>0$.
```{r}
# generate white noise of length n and s.t.d. b
WN <- function(n, b) {
x <- rnorm(n, 0, b)
plot.ts(x, main = paste("white noise of size", n, "and standard deviation", b))
x
}
e <- WN(500, 1)
```
#### AR (Autoregressive)
An ARMA model consists of two parts: AR and MA. Let's first introduce the AR model:
AR(p): $y_t=\alpha+\sum_{i = 1}^p\beta_iy_{t-i}+\epsilon_t$
Then we generate an AR(1) model:
```{r}
# simulation an AR(1) process y_t = b_0 + b_1 * y_{t-1} + epsilon, with data length n
AR1 <- function(b0, b1, b2, n) {
set.seed(100)
x <- rnorm(n, 0, b2)
y <- 0
y[1] <- b0 / (1 - b1) + x[1]
for (t in 2 : n) {
y[t] <- b0 + b1 * y[t - 1] + x[t]
}
plot.ts(y, main = paste("y[t] =", b0, "+", b1, "* y[t - 1] + epsilon"))
y
}
y <- AR1(1, 0.5, 0.2, 500)
```
An AR(1) model is weakly stationary if $|\beta_1|<1$, so our generated model $y_t=1+0.5y_{t-1}+\epsilon$ is weakly stationary. And we can also see this from the above plot: the mean of the data roughly remains the same, which is what weak stationarity implies.
#### PACF (Partial Autocorrelation Function)
Given a time series $y_t$ generated by an AR(p) model, to determine the value of p, we can use PACF. The lag-k PACF $\beta_{kk}$ of a weakly stationary time series measures the contribution of adding the term $y_{t-k}$ over an AR(k - 1) model, which can be estimated by the Ordinary Least Squares (OLS) estimator $\hat{\beta_{kk}}$ of the AR(k) model $y_t=\alpha_{k}+\sum_{i = 1}^k\beta_{ki}y_{t-i}+\epsilon_{kt}$.
```{r}
pacf(y)
```
The above plot indicates that lag $k\geq 2$ has insignificant contribution, so AR(1) model is enough.
#### ACF (Autocorrelation Function)
lag-k ACf $\gamma(k):=cov(y_t, y_{t+k})$. Assume the time series $y_t$ is weakly stationary, then $cov(y_t, y_{t+k})$ only depends on k.
ACF can also be used to determine the value of p in AR(p) model.
```{r}
acf(y)
```
From the above plot we observe that $y_t$ and $y_{t+k},k\geq 2$ has insignificant correlation, so again AR(1) is enough.
#### Unit Root Process
A random walk $y_t=y_{t-1}+\epsilon_t$ is an unit root process, which is not stationary and contains stochastic trend.
```{r}
# generate the unit root process y[t] = y[t-1] + epsilon
UR <- function(n) {
set.seed(100)
x <- rnorm(n, 0, 1)
y <- 0
y[1] <- x[1]
for (t in 2 : n) {
y[t] <- y[t - 1] + x[t]
}
plot.ts(y, main = paste("y[t] = y[t - 1] + epsilon"))
y
}
y1 <- UR(500)
```
The stochastic trend can be observed in the above plot. Generally the mean of the data goes down, so the time series is not stationary.
#### MA (Moving-average)
MA(q): $y_t=\alpha+\epsilon_t+\sum_{i=1}^q\theta_i\epsilon_{t-i}$
Generate a MA(1) model:
```{r}
# generate MA(1) model y[t] = a + e[t] + r * e[t-1] with data length n
MA1 <- function(a, r, n) {
set.seed(100)
e <- rnorm(n, 0, 0.2)
y <- 0
y[1] <- a + e[1]
for (t in 2 : n) {
y[t] <- a + e[t] + r * e[t - 1]
}
plot.ts(y, main = paste("y[t] =", a, "+ e[t] +", r, "* e[t-1]"))
y
}
y2 <- MA1(1.2, 0.3, 500)
```
For MA(q) model, $\forall k>q$ ACF $\gamma(k)=0$. So it's easy to ACF to determine q.
```{r}
acf(y2)
```
We can observe from the above plot that q should be 1.
#### ARIMA (Autoregressive-integrated-moving-average)
Define the lag operator $Ly_t:=y_{t-1}$, and $(1-L)^d:=(1-L)[(1-L)^{d-1}]$ (i.e. d order difference). If $(1-L)^{d-1}y_t$ is not weakly stationary but $(1-L)^dy_t$ is weakly stationary for $d>0$, and $(1-L)^dy_t$ is an ARMA(p, q) process, then $y_t$ is called an ARIMA(p, d, q) process.
We can use arima() in R to estimate an ARIMA model:
```{r}
arima(y2, order = c(0, 0, 1), method = "ML") # MA(1)
```
### GARCH (Generalized Autoregressive Conditional Heteroskedasticity)
AR(1) - GARCH(s, m):
$y_t=\phi_0+\phi_{1}y_{t-1}+\epsilon_t$
$\epsilon_t = \sigma_t\eta_t$
$\sigma_t^2=\alpha_0+\sum_{i=1}^s\beta_i\sigma_{t-i}^2+\sum_{i=1}^m\alpha_i\epsilon_{t-i}^2$
where $var(\epsilon_t|y_{t-1})=\sigma^2(y_{t-1})$ (conditional heteroskedasticity)
GARCH models the second moment information of a time series, while ARIMA models only the level of it.
```{r}
# AR(1) - GARCH(1, 1) y[t] = a0 + a1 * y[t - 1] + v * epsilon, v[t]^2 = b0 + b1 + v[t-1]^2 + b2e^2, data length n
GARCH <- function(a0, a1, b0, b1, b2, n) {
set.seed(1000)
n = n + 500
x <- rnorm(n, 0, 1)
y = 0
v = 0
a = 0
v[1] = b0 / (1 - b1 - b2)
a[1] = sqrt(v[1]) * x[1]
y[1] = a0 / (1 - a1) + a[1]
for (t in 2 : n) {
v[t] = b0 + b1 * v[t - 1] + b2 * a[t - 1] ^ 2
a[t] = sqrt(v[t]) * x[t]
y[t] = a0 + a1 * y[t - 1] + a[t]
}
y = y[-(1:500)]
v = v[-(1:500)]
plot.ts(y, main = paste("y[t] =", a0, "+", a1, "* y[t - 1] + epsilon"))
z = cbind(v, v)
z
}
y3 <- GARCH(1, 0.5, 0.02, 0.6, 0.3, 500)
```