ARIMA predict
Transcript of ARIMA predict
4/19/2011
Seasonality removed using a seasonal
harmonic model
Checking residuals for stationarity

> adf.test(flu.shgls$residuals, alternative = "stationary")
Augmented Dickey-Fuller Test
data: flu.shgls$residuals
Dickey-Fuller = -4.8395, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
Choosing a TS model
Model fitting

> flu.ar<-arima(flu.shgls$residuals, order=c(2,0,0))
> flu.ar$aic
[1] -349.0261
> flu.ma<-arima(flu.shgls$residuals, order=c(0,0,1))
> flu.ma$aic
[1] -354.4418
> flu.arma<-arima(flu.shgls$resid, order=c(1,0,1))
> flu.arma$aic
[1] -352.8236
> flu.arma2<-arima(flu.shgls$resid, order=c(0,0,2))
> flu.arma2$aic
[1] -352.9634
Cool code for finding the best ARMA

best.order<-c(0,0,0)
best.aic<-Inf
for (i in 0:2) for (j in 0:2) {
  fit.aic<-AIC(arima(resid(flu.shgls), order=c(i,0,j)))
  if (fit.aic < best.aic) {
    best.order <- c(i,0,j)
    best.arma <- arima(resid(flu.shgls), order=best.order)
    best.aic <- fit.aic
  }
}
Results

> best.arma
Series: resid(flu.shgls)
ARIMA(0,0,1) with non-zero mean
Call: arima(x = resid(flu.shgls), order = best.order)
Coefficients:
ma1 intercept
0.7003 0.0001
s.e. 0.0696 0.0091
sigma^2 estimated as 0.003797: log likelihood = 180.22
AIC = -354.44 AICc = -354.25 BIC = -345.79
Results are exactly the same as the flu.ma model we ran by
hand!
Is the model adequate?

> Box.test(best.arma$resid, lag = 10, type = "Ljung", fitdf=2)
Box-Ljung test
data: best.arma$resid
X-squared = 6.2012, df = 8, p-value = 0.6247
Forecasting

The predict() function can be used to forecast future values from both the fitted regression model and the fitted ARMA model; summing the two gives a forecast for the overall series.
> new.T<-time(ts(start=1979, end=c(1981, 12), fr=12))
> TIME<-(new.T-mean(time(flu)))/sd(time(flu))
> SIN<-COS<-matrix(nr=length(new.T), nc=6)
> for (i in 1:6) {
COS[,i] <-cos(2*pi*i*time(new.T))
SIN[,i] <-sin(2*pi*i*time(new.T)) }
> SIN<-SIN[,-(3:6)]
> COS<-COS[,-(5:6)]
> new.data<-data.frame(TIME=as.vector(TIME), SIN=SIN, COS=COS)
> predict.gls<-predict(flu.shgls, new.data)
> predict.arma<-predict(best.arma, n.ahead=36)
> flu.pred<-ts((predict.gls+predict.arma$pred), st=1979, freq=12)
Forecasting

> ts.plot(cbind(flu, flu.pred), lty=1:2, col=1:2)

What are my predicted values?

> flu.pred
Jan Feb Mar Apr May Jun Jul
1979 0.3924846 0.3864422 0.2830972 0.2245747 0.1580188 0.1349402 0.1576668
1980 0.4278026 0.3758563 0.2725114 0.2139888 0.1474329 0.1243543 0.1470809
1981 0.4172167 0.3652704 0.2619255 0.2034029 0.1368470 0.1137684 0.1364950
Aug Sep Oct Nov Dec
1979 0.1497663 0.1524658 0.1684255 0.1899788 0.3101741
1980 0.1391804 0.1418799 0.1578396 0.1793929 0.2995882
1981 0.1285945 0.1312940 0.1472537 0.1688070 0.2890023
What is an ARIMA model?
A type of ARMA model that can be used with some kinds of
non-stationary data
Useful for series with stochastic trends, via first-order or
"simple" differencing
Series with deterministic trends should instead be detrended first,
then an ARMA model applied
The "I" in ARIMA stands for integrated, which basically means
you're differencing
Integrated of order d (i.e., the dth difference)
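As an illustrative sketch (simulated data, not from the lecture), a random walk carries a stochastic trend, and a single round of differencing makes it stationary:

```r
# Sketch: a random walk x_t = x_{t-1} + w_t has a stochastic trend
set.seed(1)                 # hypothetical simulated data
x <- cumsum(rnorm(200))     # random walk (non-stationary)
dx <- diff(x)               # first difference (d = 1) is stationary
# hence fitting arima(x, order=c(p,1,q)) is equivalent to fitting
# arima(diff(x), order=c(p,0,q)) -- the "I" does the differencing
```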
Example: non-stationary TS
One more test…

> adf.test(lngas, alternative="stationary")
Augmented Dickey-Fuller Test
data: lngas
Dickey-Fuller = -1.2118, Lag order = 5, p-value = 0.9021
alternative hypothesis: stationary
R code for choosing the best ARIMA model

> get.best.arima <- function(x.ts, maxord = c(1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  # try every (p,d,q) combination up to maxord, scoring each fit with
  # a consistent-AIC criterion: penalty of (log(n) + 1) per parameter
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
  {
    fit <- arima(x.ts, order = c(p,d,q))
    fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
    if (fit.aic < best.aic)
    {
      best.aic <- fit.aic
      best.fit <- fit
      best.model <- c(p,d,q)
    }
  }
  # return the best score, the fitted model, and its (p,d,q) order
  list(best.aic, best.fit, best.model)
}
> get.best.arima(lngas, maxord=c(2,2,2))
> get.best.arima(lngas, maxord=c(2,2,2))
Results

Series: x.ts
ARIMA(0,1,1)
Call: arima(x = x.ts, order = c(p, d, q))
Coefficients:
ma1
0.5243
s.e. 0.0646
sigma^2 estimated as 0.001701: log likelihood = 316.53
AIC = -629.06 AICc = -628.99 BIC = -622.68
BUT, the basic ARIMA output isn’t stored anywhere, so we
can’t “do” anything with it!
Plus, where’s our intercept term?!?
A caution about R
When fitting ARIMA models with R, an intercept term is
NOT included in the model if there is any differencing
We need to force R to include one, using an extra term in the
arima() function:
xreg=t (where t is a time index) or xreg=1:length(lngas)
R will estimate this coefficient automatically and label it
1:length(lngas); after differencing, it plays the role of the
intercept (a drift term)
http://www.stat.pitt.edu/stoffer/tsa3/Rissues.htm
Correct results

> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))
Series: lngas
ARIMA(0,1,1)
Call: arima(x = lngas, order = c(0, 1, 1), xreg = 1:length(gas))
Coefficients:
ma1 1:length(gas)
0.5181 0.0068
s.e. 0.0654 0.0046
sigma^2 estimated as 0.001681: log likelihood = 317.59
AIC = -629.18 AICc = -629.04 BIC = -619.61
Compare the results

> gas.arima1<-arima(lngas, ord=c(0,1,1))
Coefficients:
ma1
0.5243
s.e. 0.0646
AIC = -629.06 AICc = -628.99 BIC = -622.68
> gas.arima2<-arima(diff(lngas), ord=c(0,0,1))
Coefficients:
ma1 intercept
0.5181 0.0068
s.e. 0.0654 0.0046
AIC = -629.18 AICc = -629.04 BIC = -619.61
> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))
Coefficients:
ma1 1:length(gas)
0.5181 0.0068
s.e. 0.0654 0.0046
AIC = -629.18 AICc = -629.04 BIC = -619.61
Is the model adequate?

> tsdiag(gas.arima)

Forecasting

> plot(gas, xlim=c(1973,1991))
> nobs=length(gas)
> gas.pred <- predict(gas.arima, n.ahead=36,
newxreg=(nobs+1):(nobs+36))
> lines(exp(gas.pred$pred), col="red")
ACF/PACF for a nonstationary series
Choose the best ARIMA model

> get.best.arima(newGTS, maxord=c(2,2,2))
[[3]]
[1] 2 1 1
> gts.arima<-arima(newGTS, ord=c(2,1,1), xreg=1:length(newGTS))
Series: newGTS
ARIMA(2,1,1)
Coefficients:
ar1 ar2 ma1 1:length(newGTS)
0.5114 0.2923 -1.0000 0.0014
s.e. 0.0464 0.0467 0.0062 0.0002
sigma^2 estimated as 0.007272: log likelihood = 447.61
AIC = -885.22 AICc = -885.08 BIC = -864.89
Check the residuals

> tsdiag(gts.arima)
Forecast
> plot(newGTS,xlim=c(1970,2010))
> nobs=length(newGTS)
> gts.pred<-predict(gts.arima, n.ahead=60,
newxreg=(nobs+1):(nobs+60))
> lines(gts.pred$pred,col="red")
Why use ARIMA vs. a linear model?
With an ARIMA model, trend detection is implicit
You can't calculate the exact slope of the trend line
If understanding the dynamics of the trend is important
to your research question, you may want to use a linear
model instead
ARIMA models are also better for stochastic time series
Additional non-stationary ARIMA models
Seasonal ARIMA (aka SARIMA)
An ARIMA model with an additional seasonal parameter
The seasonal part of an ARIMA model has the same
structure as the non-seasonal part:
Can have an SAR factor, an SMA factor, and/or an order of
differencing
These factors operate across multiples of lag s
e.g., the number of periods in a season
ARIMA(p, d, q)x(P, D, Q)
P=number of seasonal autoregressive terms (SAR)
D=number of seasonal differences
Q=number of seasonal moving average terms (SMA)
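As a sketch, this notation maps onto the seasonal argument of R's arima() function (illustrative orders; x stands for a hypothetical monthly ts object):

```r
# Sketch: an ARIMA(0,0,1)x(0,1,1) model with season length s = 12
# (x is a hypothetical monthly ts; orders are illustrative only)
fit <- arima(x, order = c(0, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
```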
Rules for model fitting
The first step is to determine whether or not a seasonal difference is needed, in addition to or perhaps instead of a non-seasonal difference:
Look at residual time series plots/ACF/PACF plots for all possible combinations of 0 or 1 non-seasonal difference and 0 or 1 seasonal difference
Don't use more than ONE seasonal difference or more than TWO total differences (seasonal + non-seasonal)
If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model
If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model
Do not mix SAR and SMA terms in the same model, and avoid using more than one of either kind
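The seasonal-lag rules above can be sketched in R (x is a hypothetical monthly series):

```r
# Sketch: take one seasonal difference, then inspect the
# autocorrelation at the seasonal lag (here s = 12)
d <- diff(x, lag = 12)   # one seasonal difference
acf(d)                   # sign of the spike at the seasonal lag:
                         #   positive -> consider an SAR term
                         #   negative -> consider an SMA term
pacf(d)                  # compare with the PACF before deciding
```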
Example: seasonal time series
ACF shows a strong seasonal component
Seasonal ARIMA

> flu.s<-arima(flu, order=c(0,0,0), seas=list(order=c(0,1,0), 12))

MA signature?

> acf(flu.s$resid)
> pacf(flu.s$resid)
Seasonal ARIMA with an MA term

> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,0), 12))
Remember the rules…
If the autocorrelation at the seasonal period is positive,
consider adding an SAR term to the model
If the autocorrelation at the seasonal period is negative,
consider adding an SMA term to the model
Seasonal ARIMA with MA and SMA terms
> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,1), 12))
Results

> flu.s
Call:
arima(x = flu, order = c(0, 0, 1), seasonal = list(order = c(0,
1, 1), 12))
Coefficients:
ma1 sma1
0.7294 -0.5825
s.e. 0.0678 0.1015
sigma^2 estimated as 0.0047: log likelihood=148.41, aic=-290.82
Prediction

> predict<-predict(flu.s, n.ahead=36)
> predict
$pred
Jan Feb Mar Apr May Jun Jul
1979 0.4237531 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864
1980 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864
1981 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864
> ts.plot(cbind(flu, predict$pred), lty=1:2, col=1:2)
> lines(predict$pred+2*predict$se, col="red", lty=3)
> lines(predict$pred-2*predict$se ,col="red", lty=3)
Prediction
Why use a SARIMA vs. a linear model?
SARIMA models deal with seasonality in a more implicit
manner
Can't easily see in the ARIMA output how the average
December, say, differs from the average July
If you want to isolate the seasonal pattern (e.g., know the
difference between December and July), then a linear
model may be better
Also, SARIMA models work best when the seasonal
pattern is both strong and stable over time
What have we learned?
What will mean CO2 levels be in Jan 2015?
399.25 PPM
What will the increase in CO2 levels be over the next 10 years?
Dec 2010 = 387.27
Dec 2020 = 411.66
24.39 PPM
What have we learned?
Have we seen an epidemic of flu recently?
Jan 1976
Was there a year of particularly low flu rates?
Dec 1969
What have we learned?
Elements of exploratory time series analysis
Time series plots and classical decomposition
Autocovariances and autocorrelations
Stationarity and differencing
Models of time series
Linear models
Moving averages (MA) and autoregressive (AR) processes
Specification/identification of ARMA/ARIMA models
SARIMA models
Estimation/prediction
For linear models, ARIMA models, and combinations of the two