On least-squares estimation of the residual variance in the first-order moving average model


Computational Statistics & Data Analysis 29 (1999) 485–499


R.P. Mentz a,∗, P.A. Morettin b, C.M.C. Toloi b

a University of Tucuman and CONICET, Argentina
b University of São Paulo, Brazil

Received 1 November 1997; revised 1 October 1998

Abstract

In the first-order moving average model we analyze the behavior of the estimator of the variance of the random residual coming from the method of least squares. This procedure is incorporated into some widely used computer programs. We show through simulations that the asymptotic formulas for the bias and variance of the maximum likelihood estimator can be used as approximations for the least-squares estimator, at least when the model parameter is far from the region of non-invertibility. Asymptotic results are developed using the “long autoregression” idea, and this leads to a closed-form expression for the least-squares estimator. In turn this is compared with the maximum likelihood estimator under normality, both in its exact and in an approximated version, which is obtained by approximating the matrix in the exponent of the Gaussian likelihood function. This comparison is illustrated by some numerical examples. The dependence of the results about biases on the values of the model parameter is emphasized. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Moving average model; Residual variance estimation; Least squares; Asymptotic bias; Asymptotic mean square error

∗ Correspondence address. Facultad de Ciencias Economicas, Inst. de Investigaciones Estadisticas, Universidad Nacional de Tucuman, Casilla de Correo 209, 4000 Tucuman, Argentina.

0167-9473/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0167-9473(98)00080-2


1. Introduction

1.1. The model

The first-order moving average time series model, denoted MA(1), is defined by

X_t - \mu = \varepsilon_t + \theta\,\varepsilon_{t-1}, \qquad t = \ldots, -1, 0, 1, \ldots,   (1.1)

where X_t is the observable time series, ε_t is a white noise residual with zero mean and constant variance σ², and μ, θ and σ² are parameters, 0 < σ² < ∞. We call σ² the residual variance of the process. Further specification of the ε_t will be made in Section 1.2.

The process is stationary for any choice of θ. If θ is less than one in absolute value, (1.1) can be inverted into an infinite autoregression,

\varepsilon_t = \sum_{s=0}^{\infty} (-\theta)^s (X_{t-s} - \mu), \qquad t = \ldots, -1, 0, 1, \ldots.   (1.2)

The covariance sequence of the process is γ_0 = σ²(1 + θ²), γ_1 = γ_{−1} = σ²θ, and γ_j = 0 for |j| > 1. The correlation sequence in turn has ρ_1 = θ(1 + θ²)^{−1} and ρ_j = 0 for |j| > 1.

Moving average models in general are important tools in both theoretical and empirical time-series analyses. They enter into the definition of ARMA models for stationary time series and ARIMA models for some non-stationary ones (Box and Jenkins, 1970, 3rd ed. of 1994), and these have proved to be important empirical tools, as witnessed by the applications arising in a variety of fields. Some of the difficulties encountered in the statistical study of these models are due to the nature of the MA part, for example in that it postulates a linear combination of unobservable random variables. The MA(1) is the simplest of these models and, as will be shown below, serves to exhibit some of the difficulties.
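To fix ideas, the short sketch below (not part of the original paper; plain NumPy with arbitrary illustrative values of θ and σ²) simulates an MA(1) series and checks its sample autocovariances against γ_0 = σ²(1 + θ²), γ_1 = σ²θ and γ_j = 0 for |j| > 1.

    import numpy as np

    # Simulate X_t - mu = eps_t + theta*eps_{t-1} and compare the sample
    # autocovariances with gamma_0 = sigma2*(1 + theta^2), gamma_1 = sigma2*theta,
    # gamma_j = 0 for |j| > 1.  Parameter values are arbitrary illustrations.
    rng = np.random.default_rng(0)
    T, mu, theta, sigma2 = 10_000, 0.0, 0.6, 1.0

    eps = rng.normal(0.0, np.sqrt(sigma2), T + 1)
    X = mu + eps[1:] + theta * eps[:-1]

    def sample_cov(x, j):
        """c_j with divisor T and mean correction, as in Eq. (1.3) of Section 1.3."""
        xc = x - x.mean()
        return np.sum(xc[: len(x) - j] * xc[j:]) / len(x)

    for j in range(4):
        gamma_j = sigma2 * (1 + theta**2) if j == 0 else (sigma2 * theta if j == 1 else 0.0)
        print(j, round(sample_cov(X, j), 4), gamma_j)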

1.2. Objectives of this paper

For purposes of inference we consider a sample X_1, ..., X_T from Eq. (1.1). Estimation procedures frequently used are the methods of moments (MM), least squares (LS) and maximum likelihood (ML) under normality. Some authors have also considered estimation from frequency domain considerations. We shall not study MM in this paper (see Mentz et al., 1997), and ML will be only briefly considered in Section 3.1.

The method of LS is not particularly suited for MA models. If we interpret the method as requiring the minimization of ∑ε_t² with respect to μ and θ, then substitution of ε_t by Eq. (1.2) will “contain an infinite number of terms so that we cannot express (the sum) as a finite linear function of the observations” (Priestley, 1981, Section 5.4.2). However, LSE is frequently considered in practice, in treatises on time series and in many standard computer programs commercially available. In our case we used BMDP Statistical Software (BMDP, 1990), ITSM (Brockwell and Davis, 1991b) and SCA (Scientific Computing Associates, 1993). These programs consider the estimation of parameters like μ and θ in Eq. (1.1) by minimizing a properly defined sum of squares, as we shall see in Section 2. An estimate of σ² is then defined as proportional to the minimized sum of squares. We call this the LS estimator of σ² and denote it by σ̂²_LS.

The main objectives of this paper are: (1) to show through simulations that the asymptotic formulas for the bias and variance of the MLE of σ² apply also to the LSE, at least for values of θ far from the region of non-invertibility; (2) to derive an approximate expression for the bias of the LSE; (3) to derive a closed-form approximation of the LSE; and (4) to relate this approximation to a well-known approximation used in MLE, and to illustrate its behavior with simulations.

Different assumptions are made for the ε_t: to deal with (2) and (3) above we assume that they are independent identically distributed (0, σ²); to deal with (1) and (4) we add that they are normal. The latter is used to define the likelihood function and hence to compare the LSE with the MLE. It follows that our simulations are made with pseudonormal random numbers.

It is relevant to ask under what assumptions an asymptotic theory for the LSE in the MA(1) model can be developed. Existing theorems for ARMA models (see e.g. Brockwell and Davis, 1991a, Ch. 8 and 10, or Fuller, 1996, Ch. 8) give conditions for the asymptotic theory of the MLE under which the LSE has the same properties. A question is whether the LSE is consistent under more general conditions than the MLE. Another question concerns robustness, whether the LSE is more robust than the MLE to deviations from normality, since it is not based on the Gaussian likelihood. We shall not deal with these questions here.

We pay considerable attention to bias. It has been argued (Simonoff, 1993) that in the estimation of the variance of a statistic, attention should concentrate on the bias: a misleadingly optimistic result is obtained from a negatively biased estimator. In spite of this emphasis, we also consider the variance of the variance estimator, its standard error and mean square error.

In spite of its inferential role, not many papers have been written about large-sample biases and variances of residual variance estimators in MA models; for MM and ML see the authors' paper of 1997. More results are known for AR models; some of these, as well as some new results, are presented in the authors' paper of 1998. For an interesting discussion see Harvey (1993, Ch. 3).

1.3. Asymptotic results for sample covariances and correlations

We can estimate the covariances γ_j of the model by

c_j = \frac{1}{T} \sum_{t=1}^{T-j} (X_t - \bar X)(X_{t+j} - \bar X) = c_{-j}, \qquad j = 0, 1, \ldots, T-1.   (1.3)

Other estimators are considered in the literature, for example, by changing in Eq. (1.3) the denominator, the value to be subtracted from the X's, or the range of the sums; see, for example, Anderson (1971) or Fuller (1996). We use Eq. (1.3) because for T > p, a covariance matrix with elements c_{|i−j|} is positive definite, a property that we shall use below. With these estimators of the covariances we form the estimators of the correlations ρ_j,

r_j = c_j / c_0, \qquad j = 1, 2, \ldots, T-1.   (1.4)

From the above indicated sources we deduce large-sample moments of these estimators. They are derived in the general case of a linear model Y_t = \sum_{j=-\infty}^{\infty} \psi_j e_{t-j}, under some conditions on the random variables e_t and coefficients ψ_j; Eq. (1.1) is a linear model. For further reference we note that, with r = r_1,

E(c_0 - \gamma_0) \approx -\frac{\sigma^2}{T}(1 + \theta)^2, \qquad E(c_1 - \gamma_1) \approx -\frac{\sigma^2}{T}(1 + 3\theta + \theta^2),

E(c_0 - \gamma_0)^2 \approx \frac{2\sigma^4}{T}(1 + 4\theta^2 + \theta^4), \qquad E(c_1 - \gamma_1)^2 \approx \frac{\sigma^4}{T}(1 + 5\theta^2 + \theta^4),

E(c_0 - \gamma_0)(c_1 - \gamma_1) \approx \frac{4\sigma^4}{T}\,\theta(1 + \theta^2), \qquad E(r - \rho)^2 \approx \frac{1 + \theta^2 + 4\theta^4 + \theta^6 + \theta^8}{T(1 + \theta^2)^4},   (1.5)

E(r - \rho) \approx -\frac{1 + 4\theta + \theta^2 + 4\theta^3 + \theta^4 + 4\theta^5 + \theta^6}{T(1 + \theta^2)^3}.

These expressions have errors o(1/T).
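The approximations in Eq. (1.5) are straightforward to tabulate. The helper below is an illustration only (names and values are ours); it evaluates the leading-order moments for given θ, σ² and T, and its output can be compared with Monte Carlo averages of c_0, c_1 and r.

    # Leading-order moments from Eq. (1.5); the o(1/T) error terms are ignored.
    def moments_eq_1_5(theta, sigma2, T):
        th2, th4 = theta**2, theta**4
        return {
            "E(c0 - g0)":          -sigma2 / T * (1 + theta) ** 2,
            "E(c1 - g1)":          -sigma2 / T * (1 + 3 * theta + th2),
            "E(c0 - g0)^2":         2 * sigma2**2 / T * (1 + 4 * th2 + th4),
            "E(c1 - g1)^2":         sigma2**2 / T * (1 + 5 * th2 + th4),
            "E(c0 - g0)(c1 - g1)":  4 * sigma2**2 / T * theta * (1 + th2),
            "E(r - rho)^2":  (1 + th2 + 4 * th4 + theta**6 + theta**8) / (T * (1 + th2) ** 4),
            "E(r - rho)":   -(1 + 4 * theta + th2 + 4 * theta**3 + th4
                              + 4 * theta**5 + theta**6) / (T * (1 + th2) ** 3),
        }

    print(moments_eq_1_5(theta=0.6, sigma2=1.0, T=100))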

2. Estimation by least squares

2.1. Estimation of the coefficients and residual variance

One procedure that is often used in practice is to consider that the LS criterion function is the quadratic form appearing in the exponent of the Gaussian likelihood function. This function will be written down in Section 3.1. It is often preferred to use an alternative expression using elements of the minimum mean square error prediction (MMSE) of a time series. Letting now X_j denote a zero-mean MA(1) (in our case we can take the mean-corrected values to approximate them), we take

S(\theta) = \sum_{j=1}^{T} \frac{(X_j - \hat X_j)^2}{r_{j-1}}   (2.1)

as the criterion, where \hat X_j is the MMSE predictor of X_j and E(X_j - \hat X_j)^2 = \sigma^2 r_{j-1}; these quantities are defined recursively by

r_0 = 1 + \theta^2, \qquad r_{j+1} = 1 + \theta^2 - \frac{\theta^2}{r_j}, \quad j = 0, 1, 2, \ldots,
\hat X_1 = 0, \qquad \hat X_{j+1} = \frac{\theta (X_j - \hat X_j)}{r_{j-1}}, \quad j = 1, 2, 3, \ldots   (2.2)

See, for example, Brockwell and Davis (1991a), who discuss this theory in detail. In their analysis they restrict attention to |θ| < 1, in which case the estimators arising from the minimization of Eq. (2.1) have the same asymptotic properties as the MLE procedure. They suggest using σ̂²_LS = S(θ̂_LS)/(T − 1), which is incorporated in the software ITSM.
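The recursions (2.2) translate directly into code. The sketch below is an illustration only, not the ITSM implementation; the helper names are ours and it assumes SciPy is available for the one-dimensional search. It evaluates S(θ) on a mean-corrected series and returns θ̂_LS together with σ̂²_LS = S(θ̂_LS)/(T − 1).

    import numpy as np
    from scipy.optimize import minimize_scalar

    def S(theta, x):
        """Prediction sum of squares of Eq. (2.1) for a zero-mean MA(1),
        computed with the recursions (2.2)."""
        r = 1.0 + theta**2          # r_0
        xhat = 0.0                  # Xhat_1 = 0
        total = 0.0
        for xt in x:                # one-step prediction error for each observation
            e = xt - xhat
            total += e * e / r
            xhat = theta * e / r    # Xhat_{t+1} = theta*(X_t - Xhat_t)/r_{t-1}
            r = 1.0 + theta**2 - theta**2 / r
        return total

    def ls_fit(x):
        """Return (theta_LS, sigma2_LS) with sigma2_LS = S(theta_LS)/(T - 1)."""
        x = np.asarray(x, float) - np.mean(x)      # mean-correct, as in the text
        res = minimize_scalar(S, bounds=(-0.999, 0.999), args=(x,), method="bounded")
        return res.x, S(res.x, x) / (len(x) - 1)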

2.2. A simulation study

To study the finite-sample behavior of the bias in the estimation procedure described in the preceding section, a simulation study was performed, which is briefly described here. Model (1.1) with μ = 0 was simulated with pseudorandom normal (0, 1) numbers, and eight values of θ ranging from −0.90 to 0.95. Sample sizes were set at T = 50, 100, 200 and 400, and for each case 1000 replications were done. Estimation was done by using SCA.¹ Table 1 reports the numerical results, and is similar to Table IV presented in Mentz et al. (1997) for the MLE of the model. The empirical biases are averages over replications and their empirical standard errors are also based on these replications. The reported asymptotic biases are those of the MLE (see Section 3), and they are used to compute the “Studentized values”: column 6 is column 3 minus column 4, divided by column 5. Column 7 reports the empirical mean square error, computed as the sum of the variance (obtained as 1000 times the square of column 5) and the square of the bias reported in column 3.

The main observations stemming from Table 1 can be summarized as follows:

(1) For all values of θ: (a) empirical biases decrease in absolute value as T increases, as expected; (b) empirical standard errors decrease monotonically as T increases, as expected; (c) empirical standard errors for a given T have similar values for different values of θ, and they are well predicted by the MLE asymptotic value of σ²[(2/T)/1000]^{1/2} (cf. Harvey, 1993, Ch. 3) for σ² = 1; (d) empirical mean square errors are well approximated by the MLE asymptotic value of σ⁴(2/T + 4/T²) for σ⁴ = 1.

(2) For −0.40 ≤ θ ≤ 0.60: (a) empirical biases have the same signs as the asymptotic biases of the MLE; (b) Studentized biases are less than 3 in absolute value, except in one case (θ = −0.40, T = 50); of the remaining 19 cases only 2 are larger than 2 in absolute value; (c) no Studentized biases are larger than 2 in absolute value for T = 200 or 400; (d) for a fixed θ, Studentized values for T = 200 and 400 are smaller in absolute value than for T = 50 and 100, except in one case (θ = 0.20).

(3) For θ = −0.90, 0.80 and 0.95: (a) empirical biases differ in sign from those of the MLE; (b) Studentized biases are larger than 3 in absolute value; (c) Studentized values decrease as T increases.

We conclude that the behavior of the LSE in terms of bias is comparable to that of the MLE for moderately large sample sizes and for values of the parameter removed from the boundary of the region of invertibility; for values closer to this boundary, even for sample sizes as large as T = 400 the fit to the MLE asymptotic theory is not good, and only for larger sample sizes are the results of this theory expected to be useful in practice.

¹ A preliminary simulation study was done by using ITSM with some of the indicated values of θ, the same sample sizes, and 100 replications for each case. The findings of that study were refined and confirmed by those reported here.


Table 1
Analysis of simulations, LSE estimation of σ²

Columns: (1) value of θ; (2) sample size T; (3) empirical bias; (4) asymptotic bias of the MLE; (5) empirical standard error; (6) Studentized bias; (7) empirical mean square error.

 (1)      (2)     (3)         (4)         (5)        (6)         (7)
−0.90      50     0.01397    −0.04000    0.00636     8.48585    0.04063
−0.90     100     0.00599    −0.02000    0.00433     6.00231    0.01876
−0.90     200     0.00313    −0.01000    0.00311     4.22186    0.00969
−0.90     400     0.00134    −0.00500    0.00217     2.92166    0.00469
−0.40      50    −0.06078    −0.04000    0.00603    −3.44610    0.04007
−0.40     100    −0.02317    −0.02000    0.00431    −0.73550    0.01908
−0.40     200    −0.00900    −0.01000    0.00312     0.32051    0.00979
−0.40     400    −0.00574    −0.00500    0.00219    −0.33790    0.00482
−0.10      50    −0.02624    −0.04000    0.00612     2.24837    0.03812
−0.10     100    −0.01280    −0.02000    0.00435     1.65517    0.01907
−0.10     200    −0.00943    −0.01000    0.00307     0.18567    0.00949
−0.10     400    −0.00288    −0.00500    0.00226     0.93801    0.00514
 0.10      50    −0.03052    −0.04000    0.00643     1.47434    0.04232
 0.10     100    −0.01059    −0.02000    0.00472     1.99364    0.02241
 0.10     200    −0.00872    −0.01000    0.00333     0.38438    0.0112
 0.10     400    −0.00183    −0.00500    0.00233     1.36051    0.00544
 0.20      50    −0.05130    −0.04000    0.00593    −1.90556    0.03776
 0.20     100    −0.02148    −0.02000    0.00442    −0.33484    0.01996
 0.20     200    −0.01018    −0.01000    0.00315    −0.05714    0.01004
 0.20     400    −0.00679    −0.00500    0.00230    −0.77826    0.00533
 0.60      50    −0.04931    −0.04000    0.00601    −1.54909    0.03858
 0.60     100    −0.03097    −0.02000    0.00413    −2.65618    0.01804
 0.60     200    −0.01376    −0.01000    0.00310    −1.21290    0.00978
 0.60     400    −0.00522    −0.00500    0.00216    −0.10185    0.00472
 0.80      50     0.03069    −0.04000    0.00615    11.49431    0.03880
 0.80     100     0.02403    −0.02000    0.00430    10.23953    0.01908
 0.80     200     0.01225    −0.01000    0.00317     7.01893    0.01017
 0.80     400     0.00425    −0.00500    0.00227     4.07489    0.00517
 0.95      50     0.05743    −0.04000    0.00624    15.61378    0.04221
 0.95     100     0.02998    −0.02000    0.00453    11.03311    0.02144
 0.95     200     0.01754    −0.01000    0.00316     8.71519    0.01032
 0.95     400     0.00894    −0.00500    0.00227     6.14097    0.00522

Hence, while the MLE behaves without dependence on the value of the parameter in the true model, the LSE shows dependence on this value. This phenomenon is in agreement with our findings in other situations, some of which are pointed out in our (1997) paper on the MA model, and it will also be illustrated in Section 3.2.

Another interesting finding is that for the variance of the variance estimator, the approximation provided by the MLE theory behaves better than that of the bias, and in particular is not affected by the value of the main parameter θ. This explains why the values of the empirical mean square error reported in column 7 are consistently close to those coming from the asymptotic theory, namely 0.04160, 0.02040, 0.01010 and 0.00503 for T = 50, 100, 200 and 400, respectively.
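These reference values follow from the MLE approximations quoted above; the short check below (illustrative, with σ² = 1) prints σ²[(2/T)/1000]^{1/2} for the standard error of an average over 1000 replications and σ⁴(2/T + 4/T²) for the mean square error.

    # Asymptotic reference values for sigma2 = 1: standard error of the mean of
    # 1000 replications, and mean square error sigma^4*(2/T + 4/T^2).
    sigma2 = 1.0
    for T in (50, 100, 200, 400):
        se_of_mean = sigma2 * ((2.0 / T) / 1000.0) ** 0.5
        mse = sigma2**2 * (2.0 / T + 4.0 / T**2)
        print(T, round(se_of_mean, 5), round(mse, 5))   # MSE column: 0.0416, 0.0204, 0.0101, 0.00503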

2.3. Asymptotic theory for the residual variance estimator

To develop the statistical theory for the LSE we use the idea of a “long autoregression” (Durbin, 1959), which amounts to approximating the series in Eq. (1.2) by the finite sum

\varepsilon_t^* = \sum_{s=0}^{B_T} (-\theta)^s (X_{t-s} - \mu),   (2.3)

where in the theory B_T → ∞ as T → ∞ in such a way that B_T (or a power of it) is dominated by T; B_T/T → 0 as T → ∞. Using Eq. (2.3) together with the substitution of X̄ for μ, the following approximation can be justified,

\hat\sigma^2_{LS} \approx \frac{1}{T - B_T - 2} \sum_{t=B_T+1}^{T} \Bigl[\, \sum_{j=0}^{B_T} (-\hat\theta)^j (X_{t-j} - \bar X) \Bigr]^2.   (2.4)

Under these conditions, in the appendix we show that the following asymptotic representation of Eq. (2.4) holds:

\hat\sigma^2_{LS} \approx \frac{c_0}{1 - \hat\theta^2} + \frac{2}{1 - \hat\theta^2} \sum_{j=1}^{T-1} c_j (-\hat\theta)^j,   (2.5)

and we arrive at the expression

E(\hat\sigma^2_{LS} - \sigma^2) = -\frac{\sigma^2(1 - 3\theta^2)}{T(1 - \theta^2)} + \frac{2\sigma^2(1 + 2\theta^2 + 21\theta^4)}{(1 - \theta^2)^3}\, E(\hat\theta - \theta)^2 + C + o(1/T),   (2.6)

where θ̂ is the LSE of θ, and

C = \frac{2\theta\, E(c_0 - \gamma_0)(\hat\theta - \theta)}{(1 - \theta^2)^2} - \frac{2(1 + \theta^2)\, E(c_1 - \gamma_1)(\hat\theta - \theta)}{(1 - \theta^2)^2} + 2 \sum_{j=2}^{\infty} \frac{-j(-\theta)^{j-1}(1 - \theta^2) + 2\theta(-\theta)^j}{(1 - \theta^2)^2}\, E\, c_j(\hat\theta - \theta).   (2.7)

The first term on the right-hand side of Eq. (2.6) is the contribution coming from the biases of the covariance estimators, that is, E(c_j − γ_j) for j ≥ 0. Terms with factors E(θ̂ − θ)^s do not contribute to the order O(1/T) when s = 1 or s ≥ 3. The expected value E(θ̂ − θ)² appearing in Eq. (2.6) can be approximated by the asymptotic variance of the MLE of θ, which to O(1/T) is (1 − θ²)/T. Hence,

E(\hat\sigma^2_{LS} - \sigma^2) = \frac{\sigma^2}{T}\, \frac{6\theta^2 + 18\theta^4}{(1 - \theta^2)^2} + C + o(1/T).   (2.8)
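Eq. (2.5) is just a weighted sum of sample covariances, so it is trivial to evaluate once an estimate θ̂ is available. The sketch below is an illustration only (helper name is ours; the truncation point K of the sum is a user choice, as in Section 3.2).

    import numpy as np

    def sigma2_ls_closed_form(x, theta_hat, K=10):
        """Eq. (2.5) with the sum over sample covariances truncated at lag K."""
        x = np.asarray(x, float)
        T = len(x)
        xc = x - x.mean()
        c = [np.sum(xc[: T - j] * xc[j:]) / T for j in range(K + 1)]   # c_0,...,c_K of Eq. (1.3)
        s = c[0] + 2.0 * sum(c[j] * (-theta_hat) ** j for j in range(1, K + 1))
        return s / (1.0 - theta_hat**2)

Note that the factor 1/(1 − θ̂²) blows up as |θ̂| → 1, so near the non-invertibility boundary this approximation can behave erratically and even turn negative, which is the behavior documented for θ = 0.80 and 0.95 in Table 2.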


The component C includes covariances between the LSE θ̂ and the c_j, for which we have no exact or approximate expressions. In a practical situation a data analyst can approximate them numerically by using the bootstrap procedure described, for example, in Efron and Tibshirani (1993, Ch. 8).

As simple examples we took two series of length T = 100 each, of the kind we employed in the simulations reported in Table 1, one for θ = −0.40 and one for θ = 0.20, that is, values of θ far from the region of non-invertibility. 100 bootstrap replications were done for each series, 10 covariances were included in the sums, and the results were as follows:

θ = −0.40:  θ̂ = −0.2643,  Ĉ = −0.0244,  Ê(σ̂²_LS − σ²) = −0.0230;
θ =  0.20:  θ̂ =  0.3054,  Ĉ = −0.0182,  Ê(σ̂²_LS − σ²) = −0.0161.

We use Ĉ and Ê to indicate that we used the estimated values in place of the parameters σ² and θ. The bias of the MLE is −0.02, and the approximations are seen to be reasonable. Other quantities were estimated via the bootstrap procedure, to control the accuracy: estimates of σ² = 1 were obtained as 1.0076 and 1.0036, respectively.
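The paper does not spell out its bootstrap scheme in detail, so the following is only one possible reading: a parametric bootstrap from the fitted MA(1) that approximates the cross moments E(c_j − γ_j)(θ̂ − θ) appearing in Eq. (2.7) and assembles Ĉ. It reuses the hypothetical helper ls_fit from the earlier sketch.

    import numpy as np

    def estimate_C(x, B=100, J=10, rng=None):
        """Parametric-bootstrap estimate of C in Eq. (2.7); one possible scheme only."""
        rng = np.random.default_rng(1) if rng is None else rng
        T = len(x)
        theta, sigma2 = ls_fit(x)                    # fitted values stand in for theta, sigma^2
        g0, g1 = sigma2 * (1 + theta**2), sigma2 * theta
        draws = []
        for _ in range(B):
            eps = rng.normal(0.0, np.sqrt(sigma2), T + 1)
            xb = eps[1:] + theta * eps[:-1]          # bootstrap series from the fitted MA(1)
            xc = xb - xb.mean()
            c = [np.sum(xc[: T - j] * xc[j:]) / T for j in range(J + 1)]
            tb, _ = ls_fit(xb)
            draws.append((c, tb))
        d = 1.0 - theta**2
        # bootstrap averages approximating E(c_j - gamma_j)(theta_hat - theta), j = 0, 1,
        # and E c_j (theta_hat - theta) for j >= 2, plugged into Eq. (2.7)
        m0 = np.mean([(c[0] - g0) * (tb - theta) for c, tb in draws])
        m1 = np.mean([(c[1] - g1) * (tb - theta) for c, tb in draws])
        C = 2 * theta * m0 / d**2 - 2 * (1 + theta**2) * m1 / d**2
        for j in range(2, J + 1):
            mj = np.mean([c[j] * (tb - theta) for c, tb in draws])
            w = (-j * (-theta) ** (j - 1) * d + 2 * theta * (-theta) ** j) / d**2
            C += 2 * w * mj
        return C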

3. Estimation by maximum likelihood

3.1. Estimation of the coefficients and residual variance

When the residuals in Eq. (1.1) are independent N(0, σ²) the likelihood function of the observations X_1, X_2, ..., X_T is the function of μ, θ and σ² given by

L = (2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\{ -\tfrac{1}{2}(X - \mu)'\Sigma^{-1}(X - \mu) \},   (3.1)

where X = (X_1, ..., X_T)', μ = (μ, ..., μ)', and Σ is the covariance matrix with components γ_{|i−j|}. This can also be written as

L = (2\pi\sigma^2)^{-T/2} |P|^{-1/2} \exp\{ -\tfrac{1}{2\sigma^2}(X - \mu)'P^{-1}(X - \mu) \},   (3.2)

where Σ = σ²P.

Maximum likelihood estimators of μ, θ and σ² are obtained by maximizing Eq. (3.2) over the parameter space. No explicit formulas for these estimators are known, not even in the case of q = 1. However, asymptotic expressions for some of the biases are known: Tanaka (1984) and Cordeiro and Klein (1994) give, for the MA(1) model,

E(\hat\theta - \theta) = \frac{1 + 2\theta}{T} + o(1/T).   (3.3)

Maximizing Eq. (3.2) with respect to σ² only shows that

\hat\sigma^2_{ML} = \frac{1}{T}(X - \mu)'P^{-1}(X - \mu).   (3.4)


For this estimator in the MA(1) model Tanaka (1984) and Cordeiro and Klein (1994) give

E(\hat\sigma^2_{ML} - \sigma^2) = -\frac{2\sigma^2}{T} + o(1/T).   (3.5)
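Because no closed form exists, the exact MLE has to be computed numerically. A transparent (if slow) sketch is to concentrate σ² out through Eq. (3.4), with μ estimated by X̄, and search the resulting −2 log-likelihood over a grid of θ values; this is only an illustration, not the algorithm used by BMDP or by the authors, and the helper names are ours.

    import numpy as np

    def minus2loglik_profile(theta, y):
        """-2 log L (up to a constant) with sigma^2 concentrated out via Eq. (3.4);
        y is the mean-corrected series."""
        T = len(y)
        P = ((1 + theta**2) * np.eye(T)
             + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))   # P = Sigma / sigma^2, tridiagonal
        _, logdetP = np.linalg.slogdet(P)
        sigma2 = y @ np.linalg.solve(P, y) / T               # Eq. (3.4)
        return T * np.log(sigma2) + logdetP, sigma2

    def mle_fit(x, grid=np.linspace(-0.99, 0.99, 397)):
        """Grid-search MLE of theta and the corresponding sigma2_ML (slow but simple)."""
        y = np.asarray(x, float) - np.mean(x)
        vals = [minus2loglik_profile(t, y) for t in grid]
        k = int(np.argmin([v[0] for v in vals]))
        return grid[k], vals[k][1]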

3.2. A numerical analysis of an approximation

In the present authors' paper (1997) the exact form of Eq. (3.4) was considered, as well as an approximation consisting in using, instead of the components of P^{-1} along diagonals,

\frac{(-\theta)^k}{1 - \theta^2}, \qquad k = 0, 1, \ldots, T-1.   (3.6)

The effect of this approximation is that P times a matrix with these components produces a matrix that differs from the identity matrix in the components of the first and T-th rows.

With this approximation, plus estimating μ by X̄, it can be shown that the MLE of the residual variance becomes identical to Eq. (2.5), the approximate representation of the LSE. The fact that the LS and ML estimators coincide under some approximations has been noted elsewhere. Brockwell and Davis (1991a, Section 8.7) note that if the determinant of the covariance matrix appearing in the Gaussian likelihood function is asymptotically negligible compared with the sum of squares in its exponent, as is the case when the parameter is constrained to the region of invertibility, then minimization of the sum of squares is equivalent to maximization of the likelihood and the LS and ML estimators have the same asymptotic properties.
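The claim about the product of P and the approximating matrix is easy to verify numerically. In the sketch below (illustrative values only), A has (i, j) entry (−θ)^{|i−j|}/(1 − θ²) as in Eq. (3.6); the product PA reproduces the identity in all interior rows and differs from it only in the first and T-th rows.

    import numpy as np

    T, theta = 8, 0.5                                   # illustrative values
    i, j = np.indices((T, T))
    A = (-theta) ** np.abs(i - j) / (1.0 - theta**2)    # components (3.6) along diagonals
    P = (1 + theta**2) * np.eye(T) + theta * (np.eye(T, k=1) + np.eye(T, k=-1))
    R = P @ A - np.eye(T)
    print(np.max(np.abs(R[1:-1, :])))    # ~0: interior rows reproduce the identity
    print(np.max(np.abs(R[[0, -1], :]))) # clearly nonzero: first and T-th rows differ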

Table 2 reports the findings of a simulation study aimed at illustrating the effects of the indicated approximation. Model (1.1) with μ = 0 was simulated with pseudorandom normal (0, 1) numbers, and eight values of θ ranging from −0.40 to 0.95. Sample size was set at T = 100 and for each case 1000 replications were done.

Columns 2 and 3 contain the averages over replications of the MLE of θ and σ², respectively, obtained by processing the simulated series with BMDP.² In parentheses below the averages are the calculated standard deviations. Similar values are reported in column 4 when σ² is estimated by Eq. (3.4) with μ = 0 and the components of P^{-1} obtained by replacing θ by θ̂ in the exact form of the inverse.³ Columns 5 and 6 contain the values obtained by using the approximation (2.5) with covariances c_j ranging up to K = 10 or 20, respectively.

² A preliminary simulation study was done by using ITSM with a few of the indicated values of θ, the same sample size, and 100 replications for each case. The findings of that study were refined and confirmed by those reported here.
³ In the preliminary study mentioned in footnote 2 the quadratic form was computed directly as a double sum using the exact form of the components of the inverse matrix. For the study reported here a much faster algorithm was used. Linear systems of the form X = PY were solved recursively for Y by the forward solution of the method of successive elimination. For the case of the MA(1) model, see Anderson and Mentz (1993) for details.
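The forward solution mentioned in footnote 3 exploits the fact that P is tridiagonal for the MA(1). A generic tridiagonal elimination such as the sketch below (illustrative; not Anderson and Mentz's exact algorithm, and the helper name is ours) returns the quadratic form x'P^{-1}x in O(T) operations instead of the O(T³) of a dense solve.

    import numpy as np

    def quad_form_ma1(x, theta):
        """x' P^{-1} x for the MA(1) matrix P (diagonal 1+theta^2, off-diagonals theta),
        via forward elimination and back-substitution of the tridiagonal system P y = x."""
        x = np.asarray(x, float)
        T = len(x)
        d = np.empty(T)                  # modified diagonal
        z = np.empty(T)                  # modified right-hand side
        d[0], z[0] = 1 + theta**2, x[0]
        for t in range(1, T):            # forward sweep
            m = theta / d[t - 1]
            d[t] = 1 + theta**2 - m * theta
            z[t] = x[t] - m * z[t - 1]
        y = np.empty(T)                  # back-substitution
        y[-1] = z[-1] / d[-1]
        for t in range(T - 2, -1, -1):
            y[t] = (z[t] - theta * y[t + 1]) / d[t]
        return x @ y

With x the mean-corrected series and theta set to the estimate of θ, quad_form_ma1(x, theta)/T gives the quadratic-form estimate of Eq. (3.4) used for column 4 of Table 2.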


Table 2
Analysis of an approximation by simulations

Columns: (1) value of θ; (2) θ̂ by BMDP; (3) σ̂²(1) by BMDP; (4) σ̂²(2), exact; (5) σ̂²(3), approximation with K = 10; (6) σ̂²(4), approximation with K = 20. Empirical standard deviations are given in parentheses.

 (1)       (2)                 (3)               (4)               (5)                (6)
−0.40     −0.4100 (0.0996)    1.0006 (0.1376)   0.9911 (0.1368)   0.9873 (0.1370)    0.9871 (0.1372)
−0.10     −0.0984 (0.1080)    0.9978 (0.1418)   0.9882 (0.1412)   0.9791 (0.1397)    0.9791 (0.1397)
 0.10      0.1066 (0.1098)    0.9978 (0.1417)   0.9881 (0.1409)   0.9793 (0.1396)    0.9793 (0.1396)
 0.20      0.2092 (0.1089)    0.9974 (0.1416)   0.9877 (0.1406)   0.9800 (0.1397)    0.9800 (0.1397)
 0.40      0.4140 (0.1022)    0.9959 (0.1413)   0.9864 (0.1402)   0.9832 (0.1405)    0.9833 (0.1404)
 0.60      0.6190 (0.0924)    0.9931 (0.1409)   0.9842 (0.1399)   0.9918 (0.1508)    0.9907 (0.1468)
 0.80      0.8166 (0.0700)    0.9901 (0.1407)   0.9826 (0.1397)   1.1646 (1.3911)    1.0375 (0.8363)
 0.95      0.9473 (0.0367)    1.0008 (0.1460)   0.9934 (0.1452)   3.9730 (9.8494)    3.4136 (13.3612)

The observations stemming from Table 2 can be summarized as follows:

(1) For all values of θ: (a) the MLE of θ with BMDP was included as a check, and the results show a reasonable agreement with the asymptotic theory, under which the bias satisfies Eq. (3.3) and Var(θ̂) ≈ (1 − θ²)/T; (b) the MLE of σ² and that using the exact form of P^{-1} show comparable results; it is worth noting that the empirical standard deviations cluster close to 0.1414 = √0.02 = √(2/T), since 2σ⁴/T can be taken as the asymptotic variance of the MLE of σ² (cf. Harvey, 1993, Ch. 3).

(2) For −0.40 ≤ θ ≤ 0.60: the estimation of σ² by the approximation (2.5) with K = 10 or 20 shows results comparable to BMDP or the exact MLE. Negative estimates were obtained in only one replication for θ = 0.40 and one for θ = 0.60.

(3) For θ = 0.80 and 0.95: (a) negative estimates of σ² = 1 appear with increasing frequency; there were negative estimates in 43 of the 1000 replications for θ = 0.80 and 525 for θ = 0.95; (b) even discarding the negative estimates and averaging over the positive values, Table 2 shows clear differences in the behavior of the approximations.


We conclude that the approximation used for the components of P^{-1} works well in the estimation by ML of the residual variance, for values of θ small in absolute value, for a sample size of T = 100. As θ approaches the boundary of the region of invertibility, the use of the approximation requires larger sample sizes to be useful in the indicated way.

4. Summary and conclusions

4.1. Summary

We studied the estimation of the residual variance of the MA(1) model by means of the estimator σ̂²_LS arising from the method of least squares. In Section 2.3 and the appendix we developed an approach to study the asymptotic behavior of σ̂²_LS as the sample size T → ∞ that does not require the assumption of normal error terms. We obtained a rather simple closed-form approximation to this estimator as a linear combination of sample covariances with weights given by functions of the LSE of θ (the approximation is also valid for σ̂²_ML, only that the weights will then be functions of the MLE of θ). This approximation can safely be used provided the underlying θ is smaller than 0.60 in absolute value and T is at least 100: our simulations in Table 2, done with normal data, show that in these ranges its behavior is well predicted by the asymptotic behavior of the exact MLE in terms of bias and variance.

In Section 2.2 we studied by simulations the finite-sample behavior of σ̂²_LS (Table 1). With normal data and using a standard computer program we concentrated on the analysis of bias and variance. We concluded that the bias of the LSE depends on the value of θ, unlike the MLE, which has a constant asymptotic bias. The bias of the LSE is well approximated by −2σ²/T for T = 50 or larger, provided that the underlying θ is smaller than approximately 0.60 in absolute value. If |θ| > 0.60 the approximation fails, at least for T ≤ 400, and tends to get worse as |θ| → 1. Finally, in Section 3, after a brief presentation of σ̂²_ML, we relate the closed-form approximation to the LSE mentioned above to an often-used approximation to the likelihood function, and illustrate it by simulations (Table 2).

4.2. Concluding remarks

Estimation in ARMA models is justified through asymptotic results, consistency (weak or strong) and asymptotic normality. The assumptions considered in this paper for the MA(1) model are sufficient for the MLE: i.i.d. (0, σ²) errors with σ² positive and finite, and |θ| < 1 (invertibility). These assumptions are also sufficient to prove that the LSE has the same properties (see, for example, Brockwell and Davis, 1991a, Theorems 10.8.1 and 10.8.2, or Fuller, 1996, Theorems 8.4.1 and 8.4.2). Hence, under these conditions the asymptotic theory provides no guidance to distinguish between MLE and LSE, and it follows that differences, if any, should come from more refined asymptotic results, or from finite-sample considerations, either through theory or simulations.

In connection with the focus of our paper, Harvey (1993, p. 69) writes: “As usual, the simple MA(1) model provides insight into the behavior of models other than the pure AR. The question of estimates on or outside the invertibility region is now of some importance”.

A question that simulations tend to answer is how large the finite sample size has to be for the asymptotic theory to be a valid approximation. Another is how the results depend on the choice of parameter values. For simulations to allow for comparisons with the MLE, pseudonormal error terms are called for. Simulations with other error distributions would address the important but different question of robustness.

If a data analyst plans to use the assumption of normality in his analyses, he will find useful the results in a 1997 study by the present authors; he can construct the estimator σ̂²_ML(1 + 2/T), corrected (approximately) for bias. Our results in Section 2.2, Table 1, show the differences that arise if he uses the LSE instead, and the risks he faces if he still uses the MLE theory, in particular if |θ| > 0.60 and T is small. If with his data he uses the LSE procedure, then our results in Section 2.3 are useful, and our simulation study in Section 2 serves as a reference: if |θ| > 0.60, even large normal data sets tend to give unexpected results in terms of biases, existing theory for the MLE fails to provide adequate guidance, and the true correction for bias is complicated since it depends on the value of θ.

5. For Further Reading

The following references are also of interest to the reader: Mentz et al. 1998.

Acknowledgements

Appreciation is expressed to Eng. Carlos Martinez for help in conducting the simulations leading to Table 2. We thank two referees and the editor for critiques and comments that led to substantial improvements in our presentation.

Appendix. Derivation of Eqs. (2.5)–(2.8)

Eq. (2.4) can be written as

\hat\sigma^2_{LS} \approx \sum_{j=0}^{B_T} \hat\theta^{2j} c^*_{jj} + 2 \sum_{j=1}^{B_T} \sum_{0 \le k < j} (-\hat\theta)^{j+k} c^*_{jk},   (A.1)

where

c^*_{jk} = \frac{1}{T - B_T - 2} \sum_{t=B_T+1}^{T} (X_{t-j} - \bar X)(X_{t-k} - \bar X).   (A.2)


These quantities are not the same as those defined in Eq. (1.3): they all have the same number of terms and they do not form a Toeplitz matrix, since, for example, c*_{11} and c*_{22} are not equal. However, we have that, for fixed B_T,

\lim_{T\to\infty} T\, E(c^*_{jk} - \gamma_{j-k})
  = \lim_{T\to\infty} T\, E\Bigl\{ \frac{1}{T - B_T - 2} \sum_{t=B_T+1}^{T} \bigl[(X_{t-j} - \mu) - (\bar X - \mu)\bigr]\bigl[(X_{t-k} - \mu) - (\bar X - \mu)\bigr] - E\bigl[(X_{t-j} - \mu)(X_{t-k} - \mu)\bigr] \Bigr\}
  = \lim_{T\to\infty} T\, E\Bigl\{ \frac{1}{T - B_T - 2} \sum_{t=B_T+1}^{T} \bigl[ -(X_{t-j} - \mu)(\bar X - \mu) - (X_{t-k} - \mu)(\bar X - \mu) + (\bar X - \mu)^2 \bigr] \Bigr\}
  = \lim_{T\to\infty} T\, E\bigl[-(\bar X - \mu)^2\bigr] = \lim_{T\to\infty} -T\,\mathrm{Var}(\bar X) = -2\pi f(0) = -\sigma^2(1 + \theta)^2,   (A.3)

where f is the MA(1) spectral density function f(λ) = (σ²/2π)(1 + θ² + 2θ cos λ), −π ≤ λ ≤ π; see, for example, Anderson (1971), Theorem 8.3.1. It then follows that the c*'s and the c's have the same asymptotic biases, except when |j − k| = 1; see the second formula in the first line of Eq. (1.5). To the degree of approximation used here, it follows that we can approximate (A.1) by using c's instead of c*'s, which after summing the series involving θ̂ leads to

\hat\sigma^2_{LS} \approx \frac{c_0}{1 - \hat\theta^2} + \frac{2}{1 - \hat\theta^2} \sum_{k=1}^{\infty} c_k (-\hat\theta)^k.   (A.4)

Expression (2.5) is derived from Eq. (A.4) by letting the sum range up to T − 1, since with T observations we obtain sample covariances up to this order only. In terms of parameters we have that

\frac{\gamma_0}{1 - \theta^2} + \frac{2}{1 - \theta^2} \sum_{j=1}^{\infty} \gamma_j (-\theta)^j = \frac{\gamma_0 + 2(-\theta)\gamma_1}{1 - \theta^2} = \sigma^2,   (A.5)

so that

\hat\sigma^2_{LS} - \sigma^2 \approx \Bigl( \frac{c_0}{1 - \hat\theta^2} - \frac{\gamma_0}{1 - \theta^2} \Bigr) + \Bigl( c_1 \frac{-2\hat\theta}{1 - \hat\theta^2} - \gamma_1 \frac{-2\theta}{1 - \theta^2} \Bigr) + \frac{2}{1 - \hat\theta^2} \sum_{j=2}^{\infty} c_j (-\hat\theta)^j,   (A.6)

where we used that γ_j = 0 for j ≥ 2.

We want a Taylor expansion that retains terms up to second order, when the variables in the expansion are the c's and θ̂. The coefficients of (c_j − γ_j)^s vanish for s > 1, and the coefficient of (θ̂ − θ) is

\frac{2\theta\gamma_0}{(1 - \theta^2)^2} - \frac{2(1 + \theta^2)\gamma_1}{(1 - \theta^2)^2},   (A.7)


which equals 0 because γ_0 = σ²(1 + θ²), γ_1 = σ²θ. The coefficient of (θ̂ − θ)² is

\frac{2\sigma^2(1 + 2\theta^2 + 21\theta^4)}{(1 - \theta^2)^3}.   (A.8)

Hence,

\hat\sigma^2_{LS} - \sigma^2 \approx \frac{c_0 - \gamma_0}{1 - \theta^2} - \frac{2\theta(c_1 - \gamma_1)}{1 - \theta^2} + \frac{2}{1 - \theta^2} \sum_{j=2}^{\infty} c_j (-\theta)^j
  + \frac{2\sigma^2(1 + 2\theta^2 + 21\theta^4)}{(1 - \theta^2)^3} (\hat\theta - \theta)^2
  + \frac{2\theta(c_0 - \gamma_0)}{(1 - \theta^2)^2} (\hat\theta - \theta) - \frac{2(1 + \theta^2)(c_1 - \gamma_1)}{(1 - \theta^2)^2} (\hat\theta - \theta)
  + 2 \sum_{j=2}^{\infty} c_j (\hat\theta - \theta)\, \frac{-j(-\theta)^{j-1}(1 - \theta^2) + 2\theta(-\theta)^j}{(1 - \theta^2)^2}.   (A.9)

The expected value of the fourth term in Eq. (A.9) is the second term on the right-hand side of Eq. (2.6), and the expected value of the last three terms is the component C in Eq. (2.7). To evaluate the expected value of the sum of the first three terms we use results in Eq. (1.5) and

E\, c_j \approx -\mathrm{Var}(\bar X) = -\frac{2\pi f(0)}{T} = -\frac{\sigma^2(1 + \theta)^2}{T}, \qquad j \ge 2,   (A.10)

and Eq. (2.6) follows.
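Two ingredients of the derivation are easy to check numerically; the small sketch below (illustrative values of θ and σ², not from the paper) verifies the identity (A.5) and the limit −T Var(X̄) → −2πf(0) = −σ²(1 + θ)² used in (A.3), using the exact formula Var(X̄) = [Tγ_0 + 2(T − 1)γ_1]/T² for the MA(1).

    # Checks with arbitrary illustrative values of theta and sigma2.
    theta, sigma2 = 0.6, 1.0
    g0, g1 = sigma2 * (1 + theta**2), sigma2 * theta

    # (A.5): the weighted sum of the true covariances recovers sigma^2 exactly.
    print((g0 - 2 * theta * g1) / (1 - theta**2))        # 1.0 = sigma2

    # (A.3): -T*Var(Xbar) -> -2*pi*f(0) = -sigma2*(1 + theta)^2 as T grows.
    for T in (50, 400, 3200):
        var_xbar = (T * g0 + 2 * (T - 1) * g1) / T**2    # exact Var(Xbar) for the MA(1)
        print(T, round(-T * var_xbar, 4), -sigma2 * (1 + theta) ** 2)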

References

Anderson, T.W., 1971. The Statistical Analysis of Time Series. Wiley, New York.
Anderson, T.W., Mentz, R.P., 1993. Evaluation of quadratic forms and traces for iterative estimation in first-order moving average models. Comm. Statist. Theory Methods 22, 931–963.
BMDP, 1990. BMDP Statistical Software. Los Angeles, California.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco; 3rd ed. with Gregory C. Reinsel, 1994, Prentice-Hall, NJ.
Brockwell, P.J., Davis, R.A., 1991a. Time Series: Theory and Methods, 2nd ed. Springer, New York.
Brockwell, P.J., Davis, R.A., 1991b. ITSM: An Interactive Time Series Modelling Package for the PC. Springer, New York.
Cordeiro, G.M., Klein, R., 1994. Bias correction of maximum likelihood estimates for ARMA models. Statist. Probab. Lett., 169–176.
Durbin, J., 1959. Efficient estimation of parameters in moving average models. Biometrika 46, 306–316.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Fuller, W.A., 1996. Introduction to Statistical Time Series, 2nd ed. Wiley, New York.
Harvey, A.C., 1993. Time Series Models, 2nd ed. MIT Press, Cambridge, MA.
Mentz, R.P., Morettin, P.A., Toloi, C.M.C., 1997. Residual variance estimation in moving average models. Comm. Statist. Theory Methods 26 (8), 1905–1923.
Mentz, R.P., Morettin, P.A., Toloi, C.M.C., 1998. On residual variance estimation in autoregressive models. J. Time Series Anal. 19 (2), 187–208.
Priestley, M.B., 1981. Spectral Analysis of Time Series. Academic Press, London.
Scientific Computing Associates, 1993. The SCA Statistical System (Release V.2). Lisle, Illinois.
Simonoff, J.S., 1993. The relative importance of bias and variability in the estimation of the variance of a statistic. Statistician 42, 3–7.
Tanaka, K., 1984. An asymptotic expansion associated with the maximum likelihood estimator in ARMA models. J. Roy. Statist. Soc. Ser. B 46, 58–67.