ARIMA predict


Time Series Analysis

ARIMA Modeling

Example: stationary ARIMA


Seasonality removed using a seasonal harmonic model

Checking residuals for stationarity

> adf.test(flu.shgls$residuals, alternative = "stationary")

Augmented Dickey-Fuller Test

data: flu.shgls$residuals

Dickey-Fuller = -4.8395, Lag order = 5, p-value = 0.01

alternative hypothesis: stationary
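(A note on dependencies, not from the slides: adf.test() comes from the tseries package, so the call above assumes something along the lines of

> library(tseries)   # provides adf.test() for the augmented Dickey-Fuller test

has been run earlier in the session.)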


Choosing a TS model

Model fitting

> flu.ar<-arima(flu.shgls$residuals, order=c(2,0,0))

> flu.ar$aic

[1] -349.0261

> flu.ma<-arima(flu.shgls$residuals, order=c(0,0,1))

> flu.ma$aic

[1] -354.4418

> flu.arma<-arima(flu.shgls$resid, order=c(1,0,1))

> flu.arma$aic

[1] -352.8236

> flu.arma2<-arima(flu.shgls$resid, order=c(0,0,2))

> flu.arma2$aic

[1] -352.9634


Cool code for finding the best ARMA

> # grid search over ARMA(i, j) orders, keeping the fit with the lowest AIC
> best.order <- c(0,0,0)
> best.aic <- Inf
> for (i in 0:2) for (j in 0:2) {
    fit.aic <- AIC(arima(resid(flu.shgls), order = c(i,0,j)))
    if (fit.aic < best.aic) {
      best.order <- c(i,0,j)
      best.arma <- arima(resid(flu.shgls), order = best.order)
      best.aic <- fit.aic
    }
  }

Results

> best.arma

Series: resid(flu.shgls)

ARIMA(0,0,1) with non-zero mean

Call: arima(x = resid(flu.shgls), order = best.order)

Coefficients:

ma1 intercept

0.7003 0.0001

s.e. 0.0696 0.0091

sigma^2 estimated as 0.003797: log likelihood = 180.22

AIC = -354.44 AICc = -354.25 BIC = -345.79

Results are exactly the same as the flu.ma model we ran by hand!



Is the model adequate?

> Box.test(best.arma$resid, lag = 10, type = "Ljung", fitdf=2)

Box-Ljung test

data: best.arma$resid

X-squared = 6.2012, df = 8, p-value = 0.6247

(fitdf = 2 accounts for the two estimated coefficients, ma1 and the intercept, so df = 10 - 2 = 8; the large p-value indicates no evidence of remaining autocorrelation in the residuals.)

Forecasting

The predict() function can be used to forecast future values from the fitted regression model and from the fitted ARMA model; summing the two gives a forecast for the overall series

> new.T<-time(ts(start=1979, end=c(1981, 12), fr=12))

> TIME<-(new.T-mean(time(flu)))/sd(time(flu))

> SIN<-COS<-matrix(nr=length(new.T), nc=6)

> for (i in 1:6) {

COS[,i] <-cos(2*pi*i*time(new.T))

SIN[,i] <-sin(2*pi*i*time(new.T)) }

> SIN<-SIN[,-(3:6)]

> COS<-COS[,-(5:6)]

> new.data<-data.frame(TIME=as.vector(TIME), SIN=SIN, COS=COS)

> predict.gls<-predict(flu.shgls, new.data)

> predict.arma<-predict(best.arma, n.ahead=36)

> flu.pred<-ts((predict.gls+predict.arma$pred), st=1979, freq=12)


Forecasting

> ts.plot(cbind(flu, flu.pred), lty=1:2, col=1:2)

What are my predicted values?

> flu.pred

Jan Feb Mar Apr May Jun Jul

1979 0.3924846 0.3864422 0.2830972 0.2245747 0.1580188 0.1349402 0.1576668

1980 0.4278026 0.3758563 0.2725114 0.2139888 0.1474329 0.1243543 0.1470809

1981 0.4172167 0.3652704 0.2619255 0.2034029 0.1368470 0.1137684 0.1364950

Aug Sep Oct Nov Dec

1979 0.1497663 0.1524658 0.1684255 0.1899788 0.3101741

1980 0.1391804 0.1418799 0.1578396 0.1793929 0.2995882

1981 0.1285945 0.1312940 0.1472537 0.1688070 0.2890023


What is an ARIMA model?

Type of ARMA model that can be used with some kinds of non-stationary data

Useful for series with stochastic trends

First order or "simple" differencing

Series with deterministic trends should be differenced first, then an ARMA model applied

The "I" in ARIMA stands for integrated, which basically means you're differencing

Integrated at order d (i.e., the dth difference); a short R sketch of this follows
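As a rough sketch of what "integrated at order d" means in practice (not from the slides; x is a placeholder ts object with a stochastic trend):

> plot(x)
> acf(x)           # ACF of a non-stationary series decays very slowly
> dx <- diff(x)    # first ("simple") difference: the "I" step with d = 1
> plot(dx)
> acf(dx)          # if one difference is enough, the ACF now dies out quickly

Fitting arima(x, order = c(p, 1, q)) then amounts to fitting an ARMA(p, q) model to dx, which is exactly the comparison shown later for the logged gas series.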

Example: non-stationary TS


Logged series

ACF/PACF for a non-stationary TS


One more test…

> adf.test(lngas, alternative="stationary")

Augmented Dickey-Fuller Test

data: lngas

Dickey-Fuller = -1.2118, Lag order = 5, p-value = 0.9021

alternative hypothesis: stationary

(The large p-value means we fail to reject the null hypothesis: the logged series is non-stationary, so differencing is needed.)

R code for choosing the best ARIMA model

> get.best.arima <- function(x.ts, maxord = c(1,1,1))
  {
    best.aic <- 1e8
    n <- length(x.ts)
    for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
    {
      fit <- arima(x.ts, order = c(p,d,q))
      # consistent-AIC-style criterion: -2*logLik + (log(n) + 1)*(number of coefficients)
      fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
      if (fit.aic < best.aic)
      {
        best.aic <- fit.aic
        best.fit <- fit
        best.model <- c(p,d,q)
      }
    }
    list(best.aic, best.fit, best.model)
  }

> get.best.arima(lngas, maxord=c(2,2,2))


Results

Series: x.ts

ARIMA(0,1,1)

Call: arima(x = x.ts, order = c(p, d, q))

Coefficients:

ma1

0.5243

s.e. 0.0646

sigma^2 estimated as 0.001701: log likelihood = 316.53

AIC = -629.06 AICc = -628.99 BIC = -622.68

BUT, the basic ARIMA output isn't stored anywhere, so we can't "do" anything with it!
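One way around the storage problem (a sketch, not on the slides; the object names best and gas.fit are made up): get.best.arima() returns list(best.aic, best.fit, best.model), so assigning the result keeps the fitted model around for later use:

> best <- get.best.arima(lngas, maxord=c(2,2,2))
> gas.fit <- best[[2]]   # the fitted arima object (best.fit)
> best[[3]]              # the selected order, here 0 1 1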

Plus, where’s our intercept term?!?

A caution about R

When fitting ARIMA models with R, an intercept term is NOT included in the model if there is any differencing

We need to force R to include one, using an extra term in the arima() function: xreg=t or xreg=1:length(lngas)

R will then estimate the intercept (drift) for us and label the coefficient with the name of the xreg variable, here 1:length(lngas)

http://www.stat.pitt.edu/stoffer/tsa3/Rissues.htm


Correct results

> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))

Series: lngas

ARIMA(0,1,1)

Call: arima(x = lngas, order = c(0, 1, 1), xreg = 1:length(gas))

Coefficients:

ma1 1:length(gas)

0.5181 0.0068

s.e. 0.0654 0.0046

sigma^2 estimated as 0.001681: log likelihood = 317.59

AIC = -629.18 AICc = -629.04 BIC = -619.61

Compare the results

> gas.arima1<-arima(lngas, ord=c(0,1,1))

Coefficients:

ma1

0.5243

s.e. 0.0646

AIC = -629.06 AICc = -628.99 BIC = -622.68

> gas.arima2<-arima(diff(lngas), ord=c(0,0,1))

Coefficients:

ma1 intercept

0.5181 0.0068

s.e. 0.0654 0.0046

AIC = -629.18 AICc = -629.04 BIC = -619.61

> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))

Coefficients:

ma1 1:length(gas)

0.5181 0.0068

s.e. 0.0654 0.0046

AIC = -629.18 AICc = -629.04 BIC = -619.61

(Fitting an MA(1) with an intercept to diff(lngas) gives the same coefficients and information criteria as fitting arima(lngas, ord=c(0,1,1), xreg=1:length(gas)); the plain arima(lngas, ord=c(0,1,1)) call simply drops that drift/intercept term.)



Is the model adequate?

> tsdiag(gas.arima)

Forecasting

> plot(gas, xlim=c(1973,1991))

> nobs=length(gas)

> gas.pred <- predict(gas.arima, n.ahead=36, newxreg=(nobs+1):(nobs+36))

> lines(exp(gas.pred$pred), col="red")
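The same ±2 standard error bands drawn later for the flu forecasts could be added here as well (a sketch, not on the slides); because the model was fitted to lngas, the bands are back-transformed with exp() just like the point forecasts:

> lines(exp(gas.pred$pred + 2*gas.pred$se), col="red", lty=3)
> lines(exp(gas.pred$pred - 2*gas.pred$se), col="red", lty=3)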


Results

Example: nonstationary series


ACF/PACF for a nonstationary series

Choose the best ARIMA model

> get.best.arima(newGTS, maxord=c(2,2,2))

[[3]]

[1] 2 1 1

> gts.arima<-arima(newGTS, ord=c(2,1,1), xreg=1:length(newGTS))

Series: newGTS

ARIMA(2,1,1)

Coefficients:

ar1 ar2 ma1 1:length(newGTS)

0.5114 0.2923 -1.0000 0.0014

s.e. 0.0464 0.0467 0.0062 0.0002

sigma^2 estimated as 0.007272: log likelihood = 447.61

AIC = -885.22 AICc = -885.08 BIC = -864.89


Check the residuals

> tsdiag(gts.arima)

Forecast

> plot(newGTS,xlim=c(1970,2010))

> nobs=length(newGTS)

> gts.pred<-predict(gts.arima, n.ahead=60, newxreg=(nobs+1):(nobs+60))

> lines(gts.pred$pred,col="red")


Why use ARIMA vs. a linear model?

Detecting the trend with an ARIMA model is implicit

Can't calculate the exact slope of the trend line

If understanding the dynamics of the trend is important to your research question, may want to use a linear model instead

ARIMA models also better for stochastic time series

Additional non-stationary ARIMA models

Seasonal ARIMA (aka SARIMA)

An ARIMA model with an additional seasonal parameter

The seasonal part of an ARIMA model has the same structure as the non-seasonal part: it can have an SAR factor, an SMA factor, and/or an order of differencing

These factors operate across multiples of lag s (e.g., the number of periods in a season)

ARIMA(p, d, q) x (P, D, Q)s (mapped to R's arima() call in the sketch below)

P = number of seasonal autoregressive terms (SAR)

D = number of seasonal differences

Q = number of seasonal moving average terms (SMA)
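In R's arima(), the non-seasonal (p, d, q) go in the order argument and the seasonal (P, D, Q) plus the period s go in the seasonal argument. As a sketch, using the ARIMA(0,0,1)x(0,1,1)12 flu model fitted later in these slides as the concrete example (the object name fit.s is made up):

> fit.s <- arima(flu, order = c(0, 0, 1),                           # p, d, q
                 seasonal = list(order = c(0, 1, 1), period = 12))  # P, D, Q and s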


Rules for model fitting

The first step is to determine whether or not a seasonal difference is needed, in addition to or perhaps instead of a non-seasonal difference:

Look at residual time series plots/ACF/PACF plots for all possible combinations of 0 or 1 non-seasonal difference and 0 or 1 seasonal difference (see the sketch after this list)

Don't use more than ONE seasonal difference or more than TWO total differences (seasonal + non-seasonal)

If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model

If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model

Do not mix SAR and SMA terms in the same model, and avoid using more than one of either kind
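As a sketch of that first step (not from the slides; x stands for a monthly ts object), the candidate differencing schemes can be compared with:

> acf(x)                        # no differencing
> acf(diff(x))                  # one non-seasonal difference (d = 1)
> acf(diff(x, lag = 12))        # one seasonal difference (D = 1, s = 12)
> acf(diff(diff(x, lag = 12)))  # one of each (d = 1, D = 1)

Whichever scheme leaves the residual plots and ACF/PACF looking closest to stationary guides the choice of d and D.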

Example: seasonal time series


ACF shows a strong seasonal component

Seasonal ARIMA

> flu.s<-arima(flu, order=c(0,0,0), seas=list(order=c(0,1,0), 12))

MA signature?

> acf(flu.s$resid)

> pacf(flu.s$resid)


Seasonal ARIMA with an MA term

> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,0), 12))

Remember the rules…

If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model

If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model


Seasonal ARIMA with MA and SMA terms

> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,1), 12))

Results

> flu.s

Call:

arima(x = flu, order = c(0, 0, 1), seasonal = list(order = c(0, 1, 1), 12))

Coefficients:

ma1 sma1

0.7294 -0.5825

s.e. 0.0678 0.1015

sigma^2 estimated as 0.0047: log likelihood=148.41, aic=-290.82


Prediction

> predict<-predict(flu.s, n.ahead=36)

> predict

$pred

Jan Feb Mar Apr May Jun Jul

1979 0.4237531 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864

1980 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864

1981 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864

> ts.plot(cbind(flu, predict$pred), lty=1:2, col=1:2)

> lines(predict$pred+2*predict$se, col="red", lty=3)

> lines(predict$pred-2*predict$se ,col="red", lty=3)

Prediction


Why use a SARIMA vs. a linear model?

SARIMA models deal with seasonality in a more implicit manner

Can't easily see in the ARIMA output how the average December, say, differs from the average July

If you want to isolate the seasonal pattern (e.g., know the difference between December and July), then a linear model may be better

Also, SARIMA models are really better if the seasonal pattern is both strong and stable over time

What have we learned?

What will mean CO2 levels be in Jan 2015?

399.25 PPM

What will the increase in CO2 levels be over the next 10 years?

Dec 2010 = 387.27

Dec 2020 = 411.66

24.39 PPM


What have we learned?

Have we seen an epidemic of flu recently?

Jan 1976

Was there a year of particularly low flu rates?

Dec 1969

What have we learned?

Elements of exploratory time series analysis

Time series plots and classical decomposition

Autocovariances and autocorrelations

Stationarity and differencing

Models of time series

Linear models

Moving averages (MA) and autoregressive (AR) processes

Specification/identification of ARMA/ARIMA models

SARIMA models

Estimation/prediction

For linear models, ARIMA models, and combinations of the two