69 Introduction to Time Series Analysis in R
Chenqi Wang
69.0.1 Overview
We introduce fundamental time series models, including ARMA and GARCH, in R, focusing on model definitions, process generation, and visualization.
69.0.2 ARMA (Autoregressive-moving-average)
ARMA(p, q): \(y_t=\alpha+\sum_{i = 1}^p\beta_iy_{t-i}+\epsilon_t+\sum_{i=1}^q\theta_i\epsilon_{t-i}\)
where \(y_t\) is the time series, \(\alpha\) is the intercept, \(\beta_i\) and \(\theta_i\) are coefficients, and \(\epsilon_t\) is white noise.
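As a quick sketch, the built-in stats function arima.sim() can simulate a zero-mean ARMA(p, q) path directly; the coefficients below are illustrative choices only:
# simulate 500 observations from an ARMA(1, 1) with ar = 0.5 and ma = 0.3
set.seed(100)
y_arma <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 500)
plot.ts(y_arma, main = "Simulated ARMA(1, 1)")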
69.0.2.2 AR (Autoregressive)
An ARMA model consists of two parts: AR and MA. Let’s first introduce the AR model:
AR(p): \(y_t=\alpha+\sum_{i = 1}^p\beta_iy_{t-i}+\epsilon_t\)
Then we generate an AR(1) process:
# simulate an AR(1) process y_t = b0 + b1 * y_{t-1} + epsilon_t, where the
# noise epsilon_t has standard deviation b2 and the series has length n
AR1 <- function(b0, b1, b2, n) {
  set.seed(100)
  x <- rnorm(n, 0, b2)            # white noise innovations
  y <- numeric(n)
  y[1] <- b0 / (1 - b1) + x[1]    # start at the unconditional mean plus noise
  for (t in 2:n) {
    y[t] <- b0 + b1 * y[t - 1] + x[t]
  }
  plot.ts(y, main = paste("y[t] =", b0, "+", b1, "* y[t - 1] + epsilon"))
  y
}
y <- AR1(1, 0.5, 0.2, 500)
An AR(1) model is weakly stationary if \(|\beta_1|<1\), so our simulated process \(y_t=1+0.5y_{t-1}+\epsilon_t\) is weakly stationary. We can also see this in the plot above: the mean of the series stays roughly constant over time, as weak stationarity implies.
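As a rough illustrative check (the split into halves is arbitrary), the sample mean should be stable across the series and close to the unconditional mean \(b_0/(1-b_1)=1/(1-0.5)=2\):
# compare sample means over the two halves of the simulated series
mean(y[1:250])    # first half
mean(y[251:500])  # second half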
69.0.2.3 PACF (Partial Autocorrelation Function)
Given a time series \(y_t\) generated by an AR(p) model, we can use the PACF to determine the order p. The lag-k PACF \(\beta_{kk}\) of a weakly stationary time series measures the contribution of adding the term \(y_{t-k}\) to an AR(k - 1) model; it can be estimated by the Ordinary Least Squares (OLS) estimator \(\hat{\beta}_{kk}\) in the AR(k) regression \(y_t=\alpha_{k}+\sum_{i = 1}^k\beta_{ki}y_{t-i}+\epsilon_{kt}\).
pacf(y)
The plot above indicates that lags \(k\geq 2\) make insignificant contributions, so an AR(1) model is sufficient.
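As an illustrative check of the OLS interpretation above, we can compare the lag-2 sample PACF with the OLS estimate of the last coefficient in an AR(2) regression; the two are computed by slightly different methods, so they agree only approximately:
# OLS fit of the AR(2) regression y_t = alpha + beta_21 y_{t-1} + beta_22 y_{t-2}
n <- length(y)
ols_ar2 <- lm(y[3:n] ~ y[2:(n - 1)] + y[1:(n - 2)])
coef(ols_ar2)[3]              # OLS estimate of beta_22
pacf(y, plot = FALSE)$acf[2]  # lag-2 sample PACF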
69.0.2.4 ACF (Autocorrelation Function)
The lag-k autocovariance is \(\gamma(k):=\mathrm{cov}(y_t, y_{t+k})\); if the time series \(y_t\) is weakly stationary, then \(\mathrm{cov}(y_t, y_{t+k})\) depends only on k. The lag-k ACF is its normalized version \(\rho(k):=\gamma(k)/\gamma(0)\).
The ACF can also help assess the order p of an AR(p) model, although for AR processes it tails off gradually rather than cutting off sharply.
acf(y)
From the plot above we see that the sample ACF decays quickly toward zero as the lag grows, which is again consistent with a low-order model such as AR(1).
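For reference, the theoretical ACF of an AR(1) process with \(\beta_1=0.5\) is \(\rho(k)=0.5^k\), which decays geometrically rather than cutting off; the stats function ARMAacf() computes it:
# theoretical autocorrelations of AR(1) with coefficient 0.5, lags 0 to 5
ARMAacf(ar = 0.5, lag.max = 5)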
69.0.2.5 Unit Root Process
A random walk \(y_t=y_{t-1}+\epsilon_t\) is a unit root process, which is not stationary and contains a stochastic trend.
# generate the unit root process y[t] = y[t - 1] + epsilon
UR <- function(n) {
  set.seed(100)
  x <- rnorm(n, 0, 1)
  y <- numeric(n)
  y[1] <- x[1]
  for (t in 2:n) {
    y[t] <- y[t - 1] + x[t]
  }
  plot.ts(y, main = "y[t] = y[t - 1] + epsilon")
  y
}
y1 <- UR(500)
The stochastic trend can be observed in the plot above: the series drifts persistently (here mostly downward) instead of reverting to a fixed mean, so it is not stationary.
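A formal check is the augmented Dickey-Fuller (ADF) test, whose null hypothesis is that a unit root is present. A minimal sketch, assuming the tseries package is installed:
# ADF test on the simulated random walk; for a unit root process we expect
# a large p-value, i.e. we fail to reject the unit root null
library(tseries)
adf.test(y1)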
69.0.2.6 MA (Moving-average)
MA(q): \(y_t=\alpha+\epsilon_t+\sum_{i=1}^q\theta_i\epsilon_{t-i}\)
Let's generate an MA(1) process:
# generate an MA(1) process y[t] = a + e[t] + r * e[t - 1] with length n;
# the noise e[t] has a fixed standard deviation of 0.2
MA1 <- function(a, r, n) {
  set.seed(100)
  e <- rnorm(n, 0, 0.2)
  y <- numeric(n)
  y[1] <- a + e[1]
  for (t in 2:n) {
    y[t] <- a + e[t] + r * e[t - 1]
  }
  plot.ts(y, main = paste("y[t] =", a, "+ e[t] +", r, "* e[t-1]"))
  y
}
y2 <- MA1(1.2, 0.3, 500)
For an MA(q) model, the ACF cuts off after lag q: \(\rho(k)=0\) for all \(k>q\). So it's easy to use the ACF to determine q.
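For example, for an MA(1) model the lag-1 autocorrelation is \(\rho(1)=\frac{\theta_1}{1+\theta_1^2}\), so with \(\theta_1=0.3\) as above we expect \(\rho(1)=0.3/1.09\approx 0.275\), and \(\rho(k)\approx 0\) for \(k>1\).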
acf(y2)
We can observe from the above plot that q should be 1.
69.0.2.7 ARIMA (Autoregressive-integrated-moving-average)
Define the lag operator \(Ly_t:=y_{t-1}\) and \((1-L)^d:=(1-L)[(1-L)^{d-1}]\) (i.e., the d-th order difference). If, for some \(d>0\), \((1-L)^{d-1}y_t\) is not weakly stationary but \((1-L)^dy_t\) is, and \((1-L)^dy_t\) follows an ARMA(p, q) process, then \(y_t\) is called an ARIMA(p, d, q) process.
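For example, the random walk y1 generated earlier is ARIMA(0, 1, 0): differencing it once, i.e. applying \((1-L)\), recovers the white noise innovations:
# first difference of the unit root process; the result is stationary
dy1 <- diff(y1)
plot.ts(dy1, main = "First difference of the unit root process")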
We can use arima() in R to estimate an ARIMA model; for example, we fit an ARIMA(0, 0, 1), i.e. MA(1), model to y2 by maximum likelihood:
arima(y2, order = c(0, 0, 1), method = "ML")
##
## Call:
## arima(x = y2, order = c(0, 0, 1), method = "ML")
##
## Coefficients:
## ma1 intercept
## 0.2244 1.1903
## s.e. 0.0469 0.0110
##
## sigma^2 estimated as 0.0401: log likelihood = 94.63, aic = -183.25
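The estimated MA coefficient (0.2244) and intercept (1.1903) are reasonably close to the true values \(\theta_1=0.3\) and \(\alpha=1.2\) used in the simulation above.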
69.0.3 GARCH (Generalized Autoregressive Conditional Heteroskedasticity)
AR(1) - GARCH(s, m):
\(y_t=\phi_0+\phi_{1}y_{t-1}+\epsilon_t\)
\(\epsilon_t = \sigma_t\eta_t\)
\(\sigma_t^2=\alpha_0+\sum_{i=1}^s\beta_i\sigma_{t-i}^2+\sum_{i=1}^m\alpha_i\epsilon_{t-i}^2\)
where \(\eta_t\) is i.i.d. noise with mean 0 and variance 1, so that the conditional variance \(\mathrm{var}(\epsilon_t\mid\epsilon_{t-1},\epsilon_{t-2},\ldots)=\sigma_t^2\) depends on past values (conditional heteroskedasticity).
GARCH models the conditional second moment (volatility) of a time series, while ARIMA models only its conditional mean (level).
# AR(1) - GARCH(1, 1): y[t] = a0 + a1 * y[t - 1] + a[t], a[t] = sqrt(v[t]) * eta[t],
# v[t] = b0 + b1 * v[t - 1] + b2 * a[t - 1]^2, with data length n
GARCH <- function(a0, a1, b0, b1, b2, n) {
  set.seed(1000)
  n <- n + 500                  # extra draws for a burn-in period
  x <- rnorm(n, 0, 1)           # eta[t]: i.i.d. standard normal noise
  y <- numeric(n)               # the series
  v <- numeric(n)               # conditional variance sigma[t]^2
  a <- numeric(n)               # innovations epsilon[t]
  v[1] <- b0 / (1 - b1 - b2)    # start at the unconditional variance
  a[1] <- sqrt(v[1]) * x[1]
  y[1] <- a0 / (1 - a1) + a[1]  # start at the unconditional mean plus noise
  for (t in 2:n) {
    v[t] <- b0 + b1 * v[t - 1] + b2 * a[t - 1]^2
    a[t] <- sqrt(v[t]) * x[t]
    y[t] <- a0 + a1 * y[t - 1] + a[t]
  }
  y <- y[-(1:500)]              # drop the burn-in samples
  v <- v[-(1:500)]
  plot.ts(y, main = paste("y[t] =", a0, "+", a1, "* y[t - 1] + epsilon"))
  cbind(y, v)                   # return the series and its conditional variance
}
y3 <- GARCH(1, 0.5, 0.02, 0.6, 0.3, 500)
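Note that the chosen parameters satisfy the stationarity condition \(\beta_1+\alpha_1=0.6+0.3<1\). As a quick follow-up sketch (relying on the cbind(y, v) return value above), plotting the simulated conditional variance shows the volatility clustering that GARCH captures:
# the second column of y3 holds the conditional variance v[t]
plot.ts(y3[, 2], main = "Conditional variance v[t]")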