Small-Area Estimation With State–Space Models Subject to Benchmark Constraints

Danny PFEFFERMANN and Richard TILLER

This article shows how to benchmark small-area estimators, produced by fitting separate state–space models within the areas, to aggregates of the survey direct estimators within a group of areas. State–space models are used by the U.S. Bureau of Labor Statistics (BLS) for the production of all of the monthly employment and unemployment estimates in census divisions and the states. Computation of the benchmarked estimators and their variances is accomplished by incorporating the benchmark constraints within a joint model for the direct estimators in the different areas, which requires the development of a new filtering algorithm for state–space models with correlated measurement errors. The new filter coincides with the familiar Kalman filter when the measurement errors are uncorrelated. The properties and implications of the use of the benchmarked estimators are discussed and illustrated using BLS unemployment series. The problem of small-area estimation is how to produce reliable estimates of area (domain) characteristics and compute their variances when the sample sizes within the areas are too small to warrant the use of traditional direct survey estimates. This problem is commonly handled by borrowing strength from neighboring areas and/or from previous surveys, using appropriate cross-sectional/time series models. To protect against possible model breakdowns and for consistency in publication, the area model-dependent estimates often must be benchmarked to an estimate for a group of the areas that is sufficiently accurate. The latter estimate is a weighted sum of the direct survey estimates in the various areas, so that the benchmarking process defines another way of borrowing strength across the areas.

KEY WORDS: Autocorrelated measurement errors; Generalized least squares; Recursive filtering; Sampling errors; Unemployment.

1. INTRODUCTION

The U.S. Bureau of Labor Statistics (BLS) uses state–space time series models for the production of the monthly employment and unemployment estimates in the 9 census divisions (CDs) and the 50 states and the District of Columbia that make them up. The models are fitted to the direct sample estimates obtained from the Current Population Survey (CPS). Using models is necessary because both the state CPS samples and the CD samples are too small to produce reliable direct estimates, which is a typical "small-area estimation" problem. The coefficient of variation (CV) of the direct unemployment estimates varies from about 4.5% to 8.5% for the CDs and from about 8% to about 16% for the states. The use of time series models significantly reduces the variances of the estimators by borrowing information from past estimates; see the application in Section 5. (For a recent review of small-area estimation methods, see Pfeffermann 2002; Section 6 considers the use of time series models.) [Rao (2003) provides a detailed methodological account of this topic.]

The state–space models are fitted independently for each CD and state, and combine a model for the true population values with a model for the sampling errors. The direct survey estimates are the sums of these two unknown components. The published estimates are the differences between the direct estimates and the estimates of the sampling errors, as obtained under the combined model. At the end of each calendar year, the monthly model-dependent estimates are modified so that their annual mean equals the corresponding mean of the direct CPS estimates. The purpose of the benchmarking is to provide protection against possible model failure. However, this benchmarking procedure has two major shortcomings:

Danny Pfeffermann is Professor of Statistics, Hebrew University of Jerusalem, Jerusalem 91905, Israel, and Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ, U.K. (E-mail: [email protected]). Richard Tiller is Senior Mathematical Statistician, Statistical Methods Staff, Office of Employment and Unemployment Statistics, Bureau of Labor Statistics, Washington, DC 20212 (E-mail: [email protected]). The authors thank the editor, an associate editor, and the reviewer for constructive comments and suggestions.

1. The mean annual CPS estimates are still unreliable, particularly for states, because the monthly estimates are highly correlated due to large sample overlaps between different months (see Sec. 2).

2. The benchmarking is retrospective, occurring at the end of each year after the monthly model-dependent estimates have already been published, and hence it provides no protection to the real-time estimates. (The benchmarked estimates are used for trend estimation.)

In this article we study a solution to the benchmarking problem that addresses the two shortcomings of the current BLS procedure. The proposed benchmarking consists of two steps:

a. Fit the model jointly to all the $D = 9$ CDs, each month adding the constraint

$$\sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{model}}_{dt} = \sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{cps}}_{dt}, \qquad t = 1, 2, \ldots, \quad (1)$$

where $\hat{Y}^{\text{model}}_{dt}$ and $\hat{Y}^{\text{cps}}_{dt}$ denote the model-dependent estimator and the direct CPS estimator at time $t$, and the $w_{dt}$'s are fixed weights.

b. Fit the model jointly to all of the states in any given CD, each month adding the constraint

$$\sum_{s \in d} w_{st}\,\hat{Y}^{\text{model}}_{st} = \hat{Y}^{\text{bench}}_{dt}, \qquad d = 1, \ldots, D,\ t = 1, 2, \ldots, \quad (2)$$

where $\hat{Y}^{\text{bench}}_{dt}$ is the benchmarked estimator for CD $d$ at time $t$, as obtained in step a.

The justification for incorporating the constraints (1) is that the direct CPS estimators, which are unreliable in single CDs (notably in the smaller ones), can be trusted when averaged over different CDs. Note in this respect that the sampling errors of the direct estimates, which are highly correlated within a CD, are independent between CDs. As illustrated in the application in Section 5, the benchmarked CD estimators are much

In the Public Domain
Journal of the American Statistical Association, December 2006, Vol. 101, No. 476, Applications and Case Studies
DOI 10.1198/016214506000000591


more accurate than the corresponding CPS estimates, thus justifying benchmarking the state estimates to the benchmarked CD estimates as in (2). The basic idea behind using either of these constraints is that if all of the direct CPS estimates jointly increase or decrease due to some sudden external shocks that are not accounted for by the model, then the benchmarked estimators will reflect this change much faster than the model-dependent estimators obtained by fitting separate models for each CD and state. This property is strikingly illustrated in the application in Section 5 using actual CD unemployment series. Note also that by incorporating the constraints (1) and (2), the benchmarked estimators in a given month "borrow strength" both from past data and cross-sectionally, unlike the model-dependent estimators in current use, which borrow strength only from the past. This property is reflected by the reduced variance of the benchmarked estimators, as shown in Section 5.

Remark 1. Unlike in classical benchmarking problems, which use external (independent) data (e.g., census figures or estimates from another large sample) for the benchmarking process, the model-dependent estimates on the left side of (1) are benchmarked to a weighted mean of the direct CPS estimates, which are the input data for the models. A similar problem exists with the benchmarking of the state estimates in (2). External independent data to which the monthly CD or state estimates can be benchmarked are not available, even for isolated months. (See Hillmer and Trabelsi 1987; Doran 1992; Durbin and Quenneville 1997 for benchmarking procedures to external data sources in the context of state–space modeling.)

An important question underlying use of the constraints (1) and (2) is defining the weights $w_{dt}$ and $w_{st}$. Possible definitions for the weights $w_{dt}$ in (1) are

$$w^{(1)}_{dt} = \frac{1}{D}, \qquad w^{(2)}_{dt} = \frac{N_{dt}}{\sum_{d=1}^{D} N_{dt}}, \qquad w^{(3)}_{dt} = \frac{1/\text{var}^{\text{cps}}_{dt}}{\sum_{d=1}^{D}[1/\text{var}^{\text{cps}}_{dt}]}, \quad (3)$$

where $N_{dt}$ and $\text{var}^{\text{cps}}_{dt}$ denote the total size of the labor force and the variance of the direct CPS estimate in CD $d$ at month $t$. Using the weights $w^{(1)}_{dt}$ is appropriate when the direct estimates are totals; using $w^{(2)}_{dt}$ is appropriate when the direct estimates are proportions. Therefore, using these weights guarantees that the CD estimates each month add up to the corresponding national CPS estimate. By using similar weights in (2), the benchmarked state estimates add up to the benchmarked CD estimate in each of the CDs, and hence also to the national CPS estimate, thus satisfying publication-consistency requirements. Using $w^{(3)}_{dt}$ minimizes the variance of the benchmark $\sum_{d=1}^{D} w_{dt}\hat{Y}^{\text{cps}}_{dt}$, but does not satisfy publication consistency.
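As a quick numerical check of the three definitions in (3), the sketch below (our own illustration; the labor-force sizes and variances are made-up numbers, not BLS data) computes the three weight choices for one month and confirms that the inverse-variance weights give the smallest variance for the weighted benchmark, which follows because the sampling errors are independent between CDs:

```python
import numpy as np

# Hypothetical labor-force sizes (millions) and direct-estimate variances
# for the 9 CDs in a given month -- illustrative values only.
N = np.array([5.0, 8.0, 3.0, 6.0, 9.0, 4.0, 7.0, 2.0, 6.0])
var_cps = np.array([1.2, 0.8, 2.5, 1.0, 0.6, 2.0, 0.9, 3.0, 1.1])

w1 = np.full(len(N), 1.0 / len(N))             # equal weights (totals)
w2 = N / N.sum()                               # labor-force shares (proportions)
w3 = (1.0 / var_cps) / (1.0 / var_cps).sum()   # inverse-variance weights

def bench_var(w):
    # Independence of sampling errors across CDs gives
    # var(sum_d w_d * Yhat_d) = sum_d w_d^2 * var_d.
    return float((w ** 2 * var_cps).sum())

# All three choices are normalized, and w3 minimizes the benchmark variance.
assert abs(w1.sum() - 1.0) < 1e-12 and abs(w2.sum() - 1.0) < 1e-12
assert bench_var(w3) <= bench_var(w1) and bench_var(w3) <= bench_var(w2)
```

The cross-CD independence of the sampling errors noted above is exactly what makes the benchmark variance a simple weighted sum here.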

The proposed solution of combining the individual CD or state models into a joint state–space model with built-in benchmark constraints intensifies the computations. The dimension of the state vector in the separate models is 29 (see the next section), implying that by fitting the model jointly to, say, the 9 CDs, the dimension of the joint state vector would be 261. Fitting state–space models of this size puts heavy demands on CPU time and memory. This creates problems in a production environment in which all of the CD and state estimates (and numerous other estimates) must be produced and published each month, soon after the new sample data become available. A possible solution to this problem, studied herein, that also allows computing the variances of the benchmarked estimators, is to include the sampling errors as part of the error terms in the model measurement equation, instead of the current practice of fitting a linear time series model for the sampling errors and including them in the state vector. Implementing this solution reduces the dimension of each of the separate state vectors by half, because the sampling errors make up 15 elements of the state vector (see the next section). As explained in Section 2, including the sampling errors in the measurement equation does not change the model.

Using this solution, however, does induce autocorrelated errors in the measurement equation of the state–space model, because, as mentioned earlier, the sampling errors are highly correlated over time. This raises the need to develop a recursive filtering algorithm with good statistical properties for state–space models with autocorrelated measurement errors. Although originally motivated by computational considerations, the development of such an algorithm is of general interest, because the common practice of fitting a linear time series model for the measurement errors when they are autocorrelated is not always practical. (See the next section for the sampling error model approximation used by the BLS.) To the best of our knowledge, no such algorithm has been studied previously. Pfeffermann and Burck (1990) likewise added constraints of the form (1) to a state–space model and developed an appropriate recursive filtering algorithm, but their model did not contain autocorrelated sampling errors, so the measurement errors are independent cross-sectionally and over time.

Implementation of the proposed benchmarking procedure, which is the primary focus of this article, therefore requires solving three problems:

1. Develop a general recursive filtering algorithm for state–space models with correlated measurement errors.

2. Incorporate the benchmark constraints and compute the corresponding benchmarked estimates (estimates of employment or unemployment measures in the present application).

3. Compute the variances of the benchmarked estimators.

We emphasize with regard to the third problem that the constraints in (1) and (2) are imposed only for computing the benchmarked estimators, not when computing the variances of the benchmarked estimators, which account for all of the sources of variability, including the errors in the benchmark equations. As explained later, it would have been necessary to develop a new filtering algorithm for computing the correct variances even if the sampling errors were left in the state vector.

Remark 2. An alternative and simpler way of enforcing the benchmark constraints is by pro-rating the separate model-dependent estimates each month, such that for CDs, for example,

$$\hat{Y}^{\text{pro}}_{dt} = \hat{Y}^{\text{model}}_{dt}\left(\sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{cps}}_{dt} \Big/ \sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{model}}_{dt}\right). \quad (4)$$


Using (4) again satisfies $\sum_{d=1}^{D} w_{dt}\hat{Y}^{\text{pro}}_{dt} = \sum_{d=1}^{D} w_{dt}\hat{Y}^{\text{cps}}_{dt}$, and it does not require changing the current BLS modeling practice. This benchmarking method is often used in small-area estimation applications that use cross-sectional models (see, e.g., Fay and Herriot 1979; Battese, Harter, and Fuller 1988; Rao 2003). However, the properties of using (4) are unclear, and it does not lend itself to simple parametric estimation of the variances of the pro-rated estimators. Variance estimation is an essential requirement for any benchmarking procedure. For the case of a single state–space model (with no benchmarking), Pfeffermann and Tiller (2005) developed bootstrap variance estimators that account for estimation of the model parameters with bias of correct order, but extension of this resampling method to the pro-rated predictors defined by (4) is not straightforward and requires a double-bootstrap procedure that is not practical in the present context because of the very intensive computations involved. Another simple benchmarking procedure is a difference adjustment of the form

$$\hat{Y}^{\text{dif}}_{dt} = \hat{Y}^{\text{model}}_{dt} + \left(\sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{cps}}_{dt} - \sum_{d=1}^{D} w_{dt}\,\hat{Y}^{\text{model}}_{dt}\right). \quad (5)$$

But this procedure inflates the variances of the benchmarked estimators and has the further drawback of adding the same amount to each of the model-dependent estimators, irrespective of their magnitude.
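A toy comparison of the two adjustments makes the last point concrete (the numbers are our own illustration, not BLS data). Both (4) and (5) reproduce the benchmark exactly, but pro-rating scales each area in proportion to its size, whereas the difference adjustment shifts every area by the same absolute amount, which hits small areas hardest in relative terms:

```python
import numpy as np

Y_model = np.array([100.0, 50.0, 10.0])  # hypothetical model-dependent estimates
Y_cps = np.array([110.0, 48.0, 11.0])    # hypothetical direct CPS estimates
w = np.full(3, 1.0 / 3.0)                # equal weights, summing to 1

bench = (w * Y_cps).sum()                # right side of the constraint

# Pro-rating (4): multiplicative adjustment, proportional to each estimate
Y_pro = Y_model * bench / (w * Y_model).sum()

# Difference adjustment (5): the same additive shift for every area
Y_dif = Y_model + (bench - (w * Y_model).sum())

# Both satisfy the benchmark constraint exactly
assert np.isclose((w * Y_pro).sum(), bench)
assert np.isclose((w * Y_dif).sum(), bench)

# The smallest area moves by ~5.6% under (4) but by 30% under (5)
rel_pro = Y_pro[2] / Y_model[2] - 1.0
rel_dif = Y_dif[2] / Y_model[2] - 1.0
assert rel_dif > rel_pro
```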

The benchmarking procedure developed in this article overcomes the problems mentioned with respect to procedures (4) and (5), and it is not more complicated once an appropriate computer program is written. Note in this respect that any other "built-in" benchmarking procedure would require the development of a new filtering algorithm, even if the sampling errors were left in the state vector. This is so because the benchmark errors that need to be accounted for when computing the variance of the benchmarked estimator are correlated, concurrently and over time, with the sampling errors. As mentioned earlier, available algorithms for benchmarking state–space models are not suitable for handling benchmarks of the form (1) or (2).

In what follows we focus on benchmarking the CD estimates (the first step in the benchmarking process), but we comment on the adjustments required when benchmarking the state estimates in the second step. Section 2 presents the BLS models in current use. Section 3 develops the recursive filtering algorithm for state–space models with correlated measurement errors and discusses its properties. Section 4 shows how to incorporate the benchmark constraints and compute the variances of the benchmarked estimators. Section 5 describes the application of the proposed procedure using the unemployment series, with special attention to the year 2001, when the World Trade Center was attacked. Section 6 concludes by discussing some remaining problems in the application of this procedure.

2. BUREAU OF LABOR STATISTICS MODEL IN CURRENT USE

In this section we consider a single CD, and hence we drop the subscript $d$ from the notation. The model used by the BLS combines a model for the true CD values (total unemployment or employment) with a model for the sampling errors. The model is discussed in detail, including parameter estimation and model diagnostics, in Tiller (1992). Next we provide a brief description of the model to aid understanding of the developments and applications in subsequent sections.

2.1 Model Assumed for Population Values

Let $y_t$ denote the direct CPS estimate at time $t$, and let $Y_t$ denote the corresponding population value, so that $e_t = y_t - Y_t$ is the sampling error. The model is

$$Y_t = L_t + S_t + I_t, \qquad I_t \sim N(0, \sigma^2_I),$$
$$L_t = L_{t-1} + R_{t-1} + \eta_{Lt}, \qquad \eta_{Lt} \sim N(0, \sigma^2_L),$$
$$R_t = R_{t-1} + \eta_{Rt}, \qquad \eta_{Rt} \sim N(0, \sigma^2_R),$$
$$S_t = \sum_{j=1}^{6} S_{jt}, \quad (6)$$
$$S_{jt} = \cos\omega_j\, S_{j,t-1} + \sin\omega_j\, S^*_{j,t-1} + v_{jt}, \qquad v_{jt} \sim N(0, \sigma^2_S),$$
$$S^*_{jt} = -\sin\omega_j\, S_{j,t-1} + \cos\omega_j\, S^*_{j,t-1} + v^*_{jt}, \qquad v^*_{jt} \sim N(0, \sigma^2_S),$$
$$\omega_j = \frac{2\pi j}{12}, \qquad j = 1, \ldots, 6.$$

The model defined by (6) is known in the time series literature as the basic structural model (BSM), with $L_t$, $R_t$, $S_t$, and $I_t$ defining the trend level, slope, seasonal effect, and "irregular" term operating at time $t$. The error terms $I_t$, $\eta_{Lt}$, $\eta_{Rt}$, $v_{jt}$, and $v^*_{jt}$ are independent white noise series. The model for the trend approximates a local linear trend, whereas the model for the seasonal effect uses the classical decomposition of the seasonal component into six subcomponents $S_{jt}$ that represent the contribution of the cyclical functions corresponding to the six frequencies (harmonics) of a monthly seasonal series. The added noise permits the seasonal effects to evolve stochastically over time, but in such a way that the expectation of the sum of 12 successive seasonal effects is 0. (See Harvey 1989 for discussion of the BSM.)
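To make the structure of (6) concrete, here is a minimal NumPy simulation of the BSM (an illustrative sketch of ours, not BLS code; the starting level, slope, and variance values are arbitrary). With the noise variances set to 0, the trigonometric seasonal component is exactly periodic, so any 12 consecutive seasonal effects sum to 0 and the series reduces to its trend:

```python
import numpy as np

def simulate_bsm(T, sig_I=0.0, sig_L=0.0, sig_R=0.0, sig_S=0.0,
                 L0=100.0, R0=0.5, seed=0):
    """Simulate Y_t = L_t + S_t + I_t under the basic structural model (6)."""
    rng = np.random.default_rng(seed)
    omega = 2 * np.pi * np.arange(1, 7) / 12    # the six seasonal frequencies
    S = rng.normal(0.0, 1.0, 6)                 # initial harmonic states S_{j,0}
    Sstar = rng.normal(0.0, 1.0, 6)             # initial auxiliary states S*_{j,0}
    L, R = L0, R0
    Y = np.empty(T)
    for t in range(T):
        L = L + R + rng.normal(0.0, sig_L)      # L_t = L_{t-1} + R_{t-1} + eta_Lt
        R = R + rng.normal(0.0, sig_R)          # R_t = R_{t-1} + eta_Rt
        S_new = np.cos(omega) * S + np.sin(omega) * Sstar + rng.normal(0.0, sig_S, 6)
        Sstar = -np.sin(omega) * S + np.cos(omega) * Sstar + rng.normal(0.0, sig_S, 6)
        S = S_new
        Y[t] = L + S.sum() + rng.normal(0.0, sig_I)  # trend + seasonal + irregular
    return Y

# Noiseless case: over any 12 months the seasonal effects cancel, so the
# 12-month total of Y equals the 12-month total of the linear trend.
Y = simulate_bsm(24)
trend_total = sum(100.0 + 0.5 * t for t in range(1, 13))
assert abs(Y[:12].sum() - trend_total) < 1e-9
```

Each pair $(S_{jt}, S^*_{jt})$ is a rotation through angle $\omega_j$, which is why the harmonics repeat with period 12 in the noiseless case.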

Remark 3. The future state models (but not the CD models) will be bivariate, with the other dependent variable in each state being "the number of persons in the state receiving unemployment insurance benefits" when modeling the unemployment estimates and "the number of payroll jobs in business establishments" when modeling the employment estimates. These two series are census figures with no sampling errors. The benchmarking will be applied only to the unemployment and employment estimates.

2.2 Model Assumed for the Sampling Errors

The CPS variance of the sampling error varies with the level of the series. Denoting $v^2_t = \text{var}(e_t)$, the model assumed for the standardized residuals $e^*_t = e_t/v_t$ is AR(15), which is used as an approximation to the sum of an MA(15) process and an AR(2) process. The MA(15) process accounts for the autocorrelations implied by the sample overlap underlying the CPS sampling design. By this design, households selected to the sample are surveyed for 4 successive months, are left out of the sample for the next 8 months, and then are surveyed again for 4 more months. The AR(2) model accounts for the autocorrelations


arising from the fact that households dropped from the survey are replaced by households from the same "census tract." The reduced ARMA representation of the sum of the two processes is ARMA(2,17), which is approximated by an AR(15) model.

The separate models holding for the population values and the sampling errors are cast into a single state–space model for the observations $y_t$ (the CPS estimates) of the form

$$y_t = Z_t\alpha^*_t; \qquad \alpha^*_t = T\alpha^*_{t-1} + \eta^*_t, \quad E(\eta^*_t) = 0,\ E(\eta^*_t\eta^{*\prime}_t) = Q^*, \quad (7)$$

with the first equation to the left defining the measurement (observation) equation and the second equation defining the state (transition) equation. The state vector $\alpha^*_t$ consists of the trend level $L_t$, the slope $R_t$, the 11 seasonal coefficients $S_{jt}, S^*_{kt}$, $j = 1, \ldots, 6$, $k = 1, \ldots, 5$, the irregular term $I_t$, and the concurrent and 14 lags of the sampling errors (a total of 29 elements). The sampling errors and the irregular term are included in the state vector so that there is no residual error term in the observation equation. Note in this respect that including the sampling errors in the measurement equation, as described in Section 1 and pursued later, instead of the current practice of including them in the state equation, does not change the model holding for the observations, as long as the model fitted for the sampling errors reproduces their autocovariances.

The parameters indexing the model are estimated separately for each CD in two steps. In the first step, the AR(15) coefficients and residual variance are estimated by solving the corresponding Yule–Walker equations. (The sampling error variance and autocorrelations are estimated externally by the BLS.) In the second step, the remaining model variances are estimated by maximizing the model likelihood, with the AR(15) model parameters held fixed at their estimated values (see Tiller 1992 for details).
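The first estimation step can be sketched generically as follows (our own illustration: a low-order AR(2) with made-up coefficients stands in for the AR(15) fit, and theoretical autocorrelations stand in for the BLS-supplied ones). Solving the Yule–Walker system with the process's true autocorrelations recovers the AR coefficients exactly:

```python
import numpy as np

def yule_walker(rho):
    """Solve the Yule-Walker equations R*phi = r for AR(p) coefficients,
    given the autocorrelations rho = [rho_1, ..., rho_p]."""
    p = len(rho)
    r = np.asarray(rho, float)
    acf = np.concatenate(([1.0], r))            # rho_0 = 1
    # Toeplitz matrix of autocorrelations R[i, j] = rho_{|i-j|}
    R = np.array([[acf[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r)

# Check on an AR(2) with known coefficients phi = (0.5, 0.3):
phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)          # theoretical ACF of the AR(2)
rho2 = phi1 * rho1 + phi2
phi_hat = yule_walker([rho1, rho2])
assert np.allclose(phi_hat, [phi1, phi2])
```

With estimated (rather than theoretical) autocorrelations, the same solve yields the method-of-moments AR fit.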

The monthly employment and unemployment estimates published by the BLS are obtained under the model (7) and the relationship $Y_t = y_t - e_t$ as

$$\hat{Y}_t = y_t - \hat{e}_t = \hat{L}_t + \hat{S}_t + \hat{I}_t, \quad (8)$$

where $\hat{L}_t$, $\hat{S}_t$, and $\hat{I}_t$ denote the estimated components at time $t$, as obtained by application of the Kalman filter (Harvey 1989).

3. FILTERING OF STATE–SPACE MODELS WITH AUTOCORRELATED MEASUREMENT ERRORS

In this section we develop a recursive filtering algorithm for state–space models with autocorrelated measurement errors. By this we mean an algorithm that updates the most recent predictor of the state vector every time that a new observation becomes available. This filter is required for implementing the benchmarking procedure discussed in Section 1 (see Sec. 4), but, as already mentioned, it is general and can be used for other applications of state–space models with autocorrelated measurement errors. In Section 3.2 we discuss the properties of the proposed filter.

3.1 Recursive Filtering Algorithm

Consider the following linear state–space model for a (possibly vector) time series $y_t$:

Measurement equation:
$$y_t = Z_t\alpha_t + e_t, \qquad E(e_t) = 0,\quad E(e_te'_t) = \Sigma_{tt},\quad E(e_\tau e'_t) = \Sigma_{\tau t}. \quad \text{(9a)}$$

Transition equation:
$$\alpha_t = T\alpha_{t-1} + \eta_t, \qquad E(\eta_t) = 0,\quad E(\eta_t\eta'_t) = Q,\quad E(\eta_t\eta'_{t-k}) = 0,\ k > 0. \quad \text{(9b)}$$

It is also assumed that $E(\eta_te'_\tau) = 0$ for all $t$ and $\tau$. (The restriction to time-invariant matrices $T$ and $Q$ is for convenience; extension of the filter to models that contain fixed effects in the observation equation is straightforward.) Clearly, what distinguishes this model from the standard linear state–space model is that the measurement errors $e_t$ (the sampling errors in the BLS model) are correlated over time. Note in particular that unlike the BLS model representation in (7), where the sampling errors and the irregular term are part of the state vector so that there are no measurement errors in the observation equation, the measurement (sampling) errors now are featured in the observation equation. The recursive filtering algorithm developed herein takes into account the autocovariance matrices $\Sigma_{\tau t}$.

At Time 1. Let $\hat{\alpha}_1 = (I - K_1Z_1)T\hat{\alpha}_0 + K_1y_1$ be the filtered state estimator at time 1, where $\hat{\alpha}_0$ is an initial estimator with covariance matrix $P_0 = E[(\hat{\alpha}_0 - \alpha_0)(\hat{\alpha}_0 - \alpha_0)']$ and $K_1 = P_{1|0}Z'_1F^{-1}_1$ is the "Kalman gain." We assume for convenience that $\hat{\alpha}_0$ is independent of the observations. The matrix $P_{1|0} = TP_0T' + Q$ is the covariance matrix of the prediction errors $\hat{\alpha}_{1|0} - \alpha_1 = T\hat{\alpha}_0 - \alpha_1$, and $F_1 = Z_1P_{1|0}Z'_1 + \Sigma_{11}$ is the covariance matrix of the innovations (one-step-ahead prediction errors) $\nu_1 = y_1 - \hat{y}_{1|0} = y_1 - Z_1\hat{\alpha}_{1|0}$. Because $y_1 = Z_1\alpha_1 + e_1$,

$$\hat{\alpha}_1 = (I - K_1Z_1)T\hat{\alpha}_0 + K_1Z_1\alpha_1 + K_1e_1. \quad (10)$$

At Time 2. Let $\hat{\alpha}_{2|1} = T\hat{\alpha}_1$ define the predictor of $\alpha_2$ at time 1, with covariance matrix $P_{2|1} = E[(\hat{\alpha}_{2|1} - \alpha_2)(\hat{\alpha}_{2|1} - \alpha_2)']$. An unbiased predictor $\hat{\alpha}_2$ of $\alpha_2$ [i.e., $E(\hat{\alpha}_2 - \alpha_2) = 0$] based on $\hat{\alpha}_{2|1}$ and the observation $y_2$ is the generalized least squares (GLS) predictor in the regression model

$$\begin{pmatrix} T\hat{\alpha}_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} I \\ Z_2 \end{pmatrix}\alpha_2 + \begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix}, \qquad u_{2|1} = T\hat{\alpha}_1 - \alpha_2, \quad (11)$$

that is,

$$\hat{\alpha}_2 = \left[(I, Z'_2)V^{-1}_2\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1}(I, Z'_2)V^{-1}_2\begin{pmatrix} T\hat{\alpha}_1 \\ y_2 \end{pmatrix}, \quad (12)$$

where

$$V_2 = \text{var}\begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix} = \begin{bmatrix} P_{2|1} & C_2 \\ C'_2 & \Sigma_{22} \end{bmatrix} \quad (13)$$

and $C_2 = \text{cov}[T\hat{\alpha}_1 - \alpha_2, e_2] = TK_1\Sigma_{12}$ [which follows from (9a) and (10)]. Note that $V_2$ is the covariance matrix of the errors $u_{2|1}$ and $e_2$, and not of the predictors $T\hat{\alpha}_1$ and $y_2$. As discussed later and proved in Appendix A, the GLS predictor $\hat{\alpha}_2$ is the best linear unbiased predictor (BLUP) of $\alpha_2$ based on $T\hat{\alpha}_1$ and $y_2$, with covariance matrix

$$E[(\hat{\alpha}_2 - \alpha_2)(\hat{\alpha}_2 - \alpha_2)'] = \left[(I, Z'_2)V^{-1}_2\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1} = P_2. \quad (14)$$


At Time t. Let $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ define the predictor of $\alpha_t$ at time $t - 1$, with covariance matrix $E[(\hat{\alpha}_{t|t-1} - \alpha_t)(\hat{\alpha}_{t|t-1} - \alpha_t)'] = TP_{t-1}T' + Q = P_{t|t-1}$, where $P_{t-1} = E[(\hat{\alpha}_{t-1} - \alpha_{t-1})(\hat{\alpha}_{t-1} - \alpha_{t-1})']$. Set the random coefficients regression model

$$\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}, \qquad u_{t|t-1} = T\hat{\alpha}_{t-1} - \alpha_t, \quad (15)$$

and define

$$V_t = \text{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_{tt} \end{bmatrix}. \quad (16)$$

The covariance matrix $C_t = \text{cov}[T\hat{\alpha}_{t-1} - \alpha_t, e_t]$ is computed as follows. Let $(I, Z'_j)V^{-1}_j = [B_{j1}, B_{j2}]$, where $B_{j1}$ contains the first $q$ columns of $(I, Z'_j)V^{-1}_j$ and $B_{j2}$ contains the remaining columns, with $q = \dim(\alpha_j)$. Define $A_j = TP_jB_{j1}$ and $\tilde{A}_j = TP_jB_{j2}$, $j = 2, \ldots, t - 1$, with $\tilde{A}_1 = TK_1$. Then

$$C_t = \text{cov}[T\hat{\alpha}_{t-1} - \alpha_t, e_t] = A_{t-1}A_{t-2}\cdots A_2\tilde{A}_1\Sigma_{1t} + A_{t-1}A_{t-2}\cdots A_3\tilde{A}_2\Sigma_{2t} + \cdots + A_{t-1}\tilde{A}_{t-2}\Sigma_{t-2,t} + \tilde{A}_{t-1}\Sigma_{t-1,t}. \quad (17)$$

The GLS predictor of $\alpha_t$ based on $T\hat{\alpha}_{t-1}$ and $y_t$ and the covariance matrix of the prediction errors are obtained from (15) and (16) as

$$\hat{\alpha}_t = \left[(I, Z'_t)V^{-1}_t\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V^{-1}_t\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix}, \quad (18)$$

$$P_t = E[(\hat{\alpha}_t - \alpha_t)(\hat{\alpha}_t - \alpha_t)'] = \left[(I, Z'_t)V^{-1}_t\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}.$$

Alternatively, the predictor $\hat{\alpha}_t$ can be written (App. C) as

$$\hat{\alpha}_t = T\hat{\alpha}_{t-1} + (P_{t|t-1}Z'_t - C_t)[Z_tP_{t|t-1}Z'_t - Z_tC_t - C'_tZ'_t + \Sigma_{tt}]^{-1}(y_t - Z_tT\hat{\alpha}_{t-1}). \quad (19)$$

Written in this way, the predictor of $\alpha_t$ at time $t$ is seen to equal the predictor of $\alpha_t$ at time $t - 1$, plus a correction factor that depends on the magnitude of the innovation (one-step-ahead prediction error) when predicting $y_t$ at time $t - 1$.
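As a concrete, easily checked sketch of the alternative form (19), the code below implements the recursion for the special case of a scalar measurement series whose errors follow an AR(1), where the cross-covariance $C_t$ can be propagated with one extra recursion instead of the general product form (17). This specialization and all parameter values are our own illustration, not the article's general algorithm. Setting the AR coefficient to 0 makes $C_t = 0$, so the filter must then reproduce the ordinary Kalman filter (property 2 below):

```python
import numpy as np

def gls_filter(y, Z, T, Q, sigma2, rho, a0, P0):
    """Recursive GLS filter in form (19), specialized to scalar measurement
    errors e_t = rho * e_{t-1} + eps_t with stationary variance sigma2.
    (Hypothetical specialization; the article's filter allows general Sigma_{tau,t}.)"""
    q = len(a0)
    a = np.asarray(a0, float)
    P = np.asarray(P0, float)
    S = np.zeros((q, 1))                  # S_{t-1} = cov(filtered state, e_{t-1})
    filtered = []
    for yt in np.asarray(y, float):
        a_pred = T @ a                    # T a_{t-1}
        P_pred = T @ P @ T.T + Q          # P_{t|t-1}
        C = rho * (T @ S)                 # C_t = cov(T a_{t-1} - alpha_t, e_t)
        M = P_pred @ Z.T - C              # P_{t|t-1} Z' - C_t
        F = (Z @ P_pred @ Z.T - Z @ C - C.T @ Z.T).item() + sigma2
        G = M / F                         # gain of (19)
        v = yt - (Z @ a_pred).item()      # innovation y_t - Z T a_{t-1}
        a = a_pred + (G * v).ravel()
        P = P_pred - (M @ M.T) / F        # prediction-error covariance P_t
        S = C + G * (sigma2 - (Z @ C).item())  # cov(a_t, e_t) for the next step
        filtered.append(a.copy())
    return np.array(filtered)
```

With `rho = 0` the gain reduces to $P_{t|t-1}Z'_t[Z_tP_{t|t-1}Z'_t + \Sigma_{tt}]^{-1}$, the familiar Kalman gain, which gives a direct numerical check of the claim that the filter extends the Kalman filter.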

3.2 Properties of the Filtering Algorithm

Assuming known model parameters, the recursive GLS filter defined by (18) or (19) has the following properties:

1. At every time point $t$, the filter produces the BLUP of $\alpha_t$ based on the predictor $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ from time $t - 1$ and the new observation $y_t$. The BLUP property means that $E(\hat{\alpha}_t - \alpha_t) = 0$ and $\text{var}[d'(\hat{\alpha}_t - \alpha_t)] \le \text{var}[d'(\hat{\alpha}^L_t - \alpha_t)]$ for every coefficient vector $d$ and any other linear unbiased predictor of the form $\hat{\alpha}^L_t = L_1\hat{\alpha}_{t|t-1} + L_2y_t + l$, with general fixed matrices $L_1$ and $L_2$ and vector $l$. (See Appendix A for proof of this property.)

2. When the measurement errors are independent, the GLS filter coincides with the familiar Kalman filter (Harvey 1989); see Appendix B for the proof. Thus the recursive GLS filter can be viewed as an extension of the Kalman filter to the case of correlated measurement errors.

3. Unlike the Kalman filter, which assumes independent measurement errors and therefore yields the BLUP of $\alpha_t$ based on all the individual observations $y_{(t)} = (y'_1, \ldots, y'_t)'$ [i.e., the BLUP out of all of the linear unbiased predictors of the form $\sum_{i=1}^{t} J_{it}y_i + j_t$ for general fixed matrices $J_{it}$ and vector $j_t$], the filter (18) yields the BLUP of $\alpha_t$ based on $\hat{\alpha}_{t|t-1}$ and the new observation $y_t$.

(The predictor $\hat{\alpha}_{t|t-1}$ is itself a linear combination of all the observations until time $t - 1$, but the values of the matrix coefficients are fixed by the previous steps of the filter; see also Remark 4.)

Computing the BLUP of αt under correlated measurementerrors out of all possible unbiased predictors that are lin-ear combinations of all the individual observations in y(t)

with arbitrary coefficients [or the minimum mean squared er-ror predictor E(αt|ytytminus1 y1λ) under normality of themodel error terms where λ defines the model parameters] re-quires joint modeling of y(t) for every time t For long series thecomputations become very heavy and generally are not practi-cal in a production environment that requires routine runs ofmany series with high-dimensional state vectors in a short timeEmpirical evidence so far suggests that the loss in efficiencyfrom using the GLS algorithm instead of the BLUP based onall of the individual observations is mild Section 5 provides anempirical comparison of the two filters

Remark 4. For general covariance matrices $\Sigma_{\tau t}$ between the measurement errors, it is impossible to construct a recursive filtering algorithm that is a linear combination of the predictor from the previous time point and the new observation and is BLUP out of all unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$. To see this, consider a very simple example of three observations, $y_1$, $y_2$, and $y_3$, with common mean $\mu$ and variance $\sigma^2$. If the three observations are independent, then the BLUP of $\mu$ based on the first two observations is $\bar y_{(2)} = (y_1 + y_2)/2$, and the BLUP based on the three observations is $\bar y_{(3)} = (y_1 + y_2 + y_3)/3 = (2/3)\bar y_{(2)} + (1/3)y_3$. The BLUP $\bar y_{(3)}$ is the Kalman filter predictor for time 3. Suppose, however, that $\operatorname{cov}(y_1, y_2) = \operatorname{cov}(y_2, y_3) = \sigma^2\rho_1$ and $\operatorname{cov}(y_1, y_3) = \sigma^2\rho_2 \ne \sigma^2\rho_1$. The BLUP of $\mu$ based on the first two observations is again $\bar y_{(2)} = (y_1 + y_2)/2$, but the BLUP of $\mu$ based on the three observations in this case is $\bar y_{(3)}^{opt} = ay_1 + by_2 + ay_3$, where $a = (1-\rho_1)/(3-4\rho_1+\rho_2)$ and $b = (1-2\rho_1+\rho_2)/(3-4\rho_1+\rho_2)$. Clearly, because $a \ne b$, the predictor $\bar y_{(3)}^{opt}$ cannot be written as a linear combination of $\bar y_{(2)}$ and $y_3$. For example, if $\rho_1 = .5$ and $\rho_2 = .25$, then $\bar y_{(3)}^{opt} = .4y_1 + .2y_2 + .4y_3$.
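The coefficients in this example can be checked numerically: the BLUP of a common mean under a known covariance matrix is the GLS estimator, with weight vector $\Sigma^{-1}\mathbf{1}/(\mathbf{1}'\Sigma^{-1}\mathbf{1})$. A short sketch (unit variances are assumed; the weights do not depend on $\sigma^2$):

```python
import numpy as np

# Remark 4 example: corr(y1,y2) = corr(y2,y3) = .5, corr(y1,y3) = .25
rho1, rho2 = 0.5, 0.25
Sigma = np.array([[1.0,  rho1, rho2],
                  [rho1, 1.0,  rho1],
                  [rho2, rho1, 1.0]])
ones = np.ones(3)
w = np.linalg.solve(Sigma, ones)
w = w / (ones @ w)        # BLUP weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)
print(np.round(w, 3))     # [0.4 0.2 0.4]
```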

4. INCORPORATING THE BENCHMARK CONSTRAINTS

4.1 Joint Modeling of Several Concurrent Estimates and Their Weighted Mean

In this section we jointly model the concurrent CD estimates and their weighted mean. In Section 4.2 we show how to incorporate the benchmark constraints and compute the variances of the benchmarked predictors. We follow the BLS modeling practice and assume that the state vectors and the measurement errors are independent between the CDs. Section 6 considers an

1392 Journal of the American Statistical Association December 2006

extension of the joint model that allows for cross-sectional correlations between corresponding components of the state vectors operating in different CDs.

Suppose that the models underlying the CD estimates are as in (9), with the correlated measurement (sampling) errors included in the observation equation. In what follows we add the subscript $d$ to all of the model components to distinguish between the CDs. The (direct) estimates $y_{dt}$ and the measurement errors $e_{dt}$ are now scalars, and $Z_{dt}$ is a row vector (hereinafter denoted as $z_{dt}'$). Let $y_t = (y_{1t},\dots,y_{Dt},\sum_{d=1}^D w_{dt}y_{dt})'$ define the concurrent estimates for the $D$ CDs and their weighted mean [the right side of the benchmark equation (1)]. The corresponding vector of measurement errors is $e_t = (e_{1t},\dots,e_{Dt},\sum_{d=1}^D w_{dt}e_{dt})'$. Let $Z_t^* = \operatorname{diag}(z_{1t}',\dots,z_{Dt}')$ (a block-diagonal matrix with $z_{dt}'$ as the $d$th block), $T_t = I_D \otimes T$, $Z_t = \begin{bmatrix} Z_t^* \\ w_{1t}z_{1t}'\cdots w_{Dt}z_{Dt}' \end{bmatrix}$, $\alpha_t = (\alpha_{1t}',\dots,\alpha_{Dt}')'$, and $\eta_t = (\eta_{1t}',\dots,\eta_{Dt}')'$. By (9) and the independence of the state vectors and measurement errors between CDs, the joint model holding for $y_t$ is

$$y_t = Z_t\alpha_t + e_t,\qquad E(e_t) = 0,\quad E(e_\tau e_t') = \Sigma_{\tau t} = \begin{bmatrix} \Sigma_{\tau t}^{(D)} & h_{\tau t} \\ h_{t\tau}' & \nu_{\tau t} \end{bmatrix}, \tag{20a}$$

$$\alpha_t = T\alpha_{t-1} + \eta_t,\qquad E(\eta_t) = 0,\quad E(\eta_t\eta_t') = I_D \otimes Q_d = Q,\quad E(\eta_t\eta_{t-k}') = 0,\ k > 0, \tag{20b}$$

where

$$\Sigma_{\tau t}^{(D)} = \operatorname{diag}[\sigma_{1,\tau t},\dots,\sigma_{D,\tau t}],\qquad \sigma_{d,\tau t} = \operatorname{cov}(e_{d\tau}, e_{dt}),$$

$$\nu_{\tau t} = \sum_{d=1}^D w_{d\tau}w_{dt}\sigma_{d,\tau t} = \operatorname{cov}\Bigl[\sum_{d=1}^D w_{d\tau}e_{d\tau},\ \sum_{d=1}^D w_{dt}e_{dt}\Bigr],$$

$$h_{\tau t} = (h_{1,\tau t},\dots,h_{D,\tau t})',\qquad h_{d,\tau t} = w_{dt}\sigma_{d,\tau t} = \operatorname{cov}\Bigl[e_{d\tau},\ \sum_{d=1}^D w_{dt}e_{dt}\Bigr].$$

Remark 5. The model (20a)–(20b) is the same as the separate models defined by (9a)–(9b) for the different CDs. Adding the model for $\sum_{d=1}^D w_{dt}y_{dt}$ to the observation equation provides no new information.
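The block structure of $y_t$ and $Z_t$ is mechanical to build. The helper below is our own illustration, not part of the paper; it assumes all CDs share the same state dimension, as implied by $T_t = I_D \otimes T$.

```python
import numpy as np

def joint_observation(y_cd, z_rows, w):
    """Stack the D concurrent CD estimates and their weighted mean
    (eq. 20a): y_t = (y_1t,...,y_Dt, sum_d w_dt y_dt)', together with
    the design matrix Z_t = [blockdiag(z'_1t,...,z'_Dt);
                             w_1t z'_1t ... w_Dt z'_Dt]."""
    D = len(y_cd)
    q = z_rows[0].shape[0]               # common dim of each CD state vector
    Z_star = np.zeros((D, D * q))        # block-diagonal part Z*_t
    for d in range(D):
        Z_star[d, d * q:(d + 1) * q] = z_rows[d]
    last_row = np.concatenate([w[d] * z_rows[d] for d in range(D)])
    Z_t = np.vstack([Z_star, last_row])
    y_t = np.append(y_cd, w @ y_cd)      # append the weighted mean
    return y_t, Z_t
```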

4.2 Recursive Filtering With Correlated Measurements and Benchmark Constraints

To compute the benchmarked predictors, we apply the recursive GLS filter (19) to the joint model defined by (20a)–(20b), setting the variance of the benchmark errors $\sum_{d=1}^D w_{dt}e_{dt}$ to 0. The idea behind this procedure is as follows. By (8) and (20), the model-dependent predictor of the signal (true population value) in CD $d$ at time $t$ takes the form $\hat Y_{dt}^{model} = z_{dt}'\hat\alpha_{dt}$. Thus the benchmark constraints (1) can be written as

$$\sum_{d=1}^D w_{dt}z_{dt}'\hat\alpha_{dt} = \sum_{d=1}^D w_{dt}y_{dt},\qquad t = 1,2,\dots. \tag{21}$$

In contrast, under the model (20),

$$\sum_{d=1}^D w_{dt}y_{dt} = \sum_{d=1}^D w_{dt}z_{dt}'\alpha_{dt} + \sum_{d=1}^D w_{dt}e_{dt}, \tag{22}$$

and so a simple way of satisfying the benchmark constraints is by imposing $\sum_{d=1}^D w_{dt}y_{dt} = \sum_{d=1}^D w_{dt}z_{dt}'\alpha_{dt}$, or equivalently, by setting

$$\operatorname{var}\Bigl[\sum_{d=1}^D w_{dt}e_{dt}\Bigr] = \operatorname{cov}\Bigl[e_{dt},\ \sum_{d=1}^D w_{dt}e_{dt}\Bigr] = 0,\qquad d = 1,\dots,D,\ t = 1,2,\dots. \tag{23}$$

It is important to emphasize that using (23) is just a convenient technical way of forcing the constraints and hence computing the benchmarked predictors. In Appendix D we show how to compute the true variances of the benchmarked predictors, accounting in particular for the errors $\sum_{d=1}^D w_{dt}e_{dt}$ in the right side of (22) [and no longer imposing (23)]. The imposition of (23) in the GLS filter (19) is implemented by replacing the covariance matrix $\Sigma_{tt}$ of the measurement equation (20a) by the matrix $\Sigma_{tt}^* = \begin{bmatrix} \Sigma_{tt}^{(D)} & 0_{(D)} \\ 0_{(D)}' & 0 \end{bmatrix}$, where $0_{(D)}$ is the null vector of order $D$, and setting the elements of the last column of the matrix $C_t^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$ to 0.

Application of the GLS filter (19) to the model (9), with the benchmark constraints imposed by application of (23), yields the vector of CD benchmarked estimators

$$Z_t^*\hat\alpha_t^{bmk} = Z_t^*\bigl\{T\hat\alpha_{t-1}^{bmk} + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})\bigl[Z_tP_{t|t-1}^{bmk}Z_t' - Z_tC_{t0}^{bmk} - C_{t0}^{bmk\prime}Z_t' + \Sigma_{tt}^*\bigr]^{-1}(y_t - Z_tT\hat\alpha_{t-1}^{bmk})\bigr\}, \tag{24}$$

where $Z_t^* = \operatorname{diag}(z_{1t}',\dots,z_{Dt}')$, $P_{t|t-1}^{bmk} = TP_{t-1}^{bmk}T' + Q$, $P_{t-1}^{bmk} = E[(\alpha_{t-1} - \hat\alpha_{t-1}^{bmk})(\alpha_{t-1} - \hat\alpha_{t-1}^{bmk})']$, and, defining $e_{t0} = (e_{1t},\dots,e_{Dt},0)'$, $C_{t0}^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_{t0}]$.
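To make the device in (23) concrete, the sketch below (our own illustration, taking the sampling error covariances diagonal across CDs, as in Sec. 4.1) builds the full measurement covariance of $e_t$ and the zeroed version $\Sigma_{tt}^*$ that forces the constraints:

```python
import numpy as np

def benchmark_covariances(sigma_dd, w):
    """Full (D+1)x(D+1) measurement covariance of
    (e_1t,...,e_Dt, sum_d w_d e_dt)' under (20a), and its
    'benchmarked' version in which the variance of, and the
    covariances with, the benchmark error are set to 0 (eq. 23)."""
    D = len(sigma_dd)
    h = w * sigma_dd                     # cov[e_dt, sum_d w_d e_dt]
    nu = w @ (w * sigma_dd)              # var[sum_d w_d e_dt]
    full = np.zeros((D + 1, D + 1))
    full[:D, :D] = np.diag(sigma_dd)
    full[:D, D] = h
    full[D, :D] = h
    full[D, D] = nu
    bench = full.copy()
    bench[:, D] = 0.0                    # zero the last column ...
    bench[D, :] = 0.0                    # ... and the last row
    return full, bench
```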

Remark 6. The matrix $P_{t|t-1}^{bmk} = E[(T\hat\alpha_{t-1}^{bmk} - \alpha_t)(T\hat\alpha_{t-1}^{bmk} - \alpha_t)']$ is the true prediction error covariance matrix. See Appendix D for the computation of $P_t^{bmk} = E[(\alpha_t - \hat\alpha_t^{bmk})(\alpha_t - \hat\alpha_t^{bmk})']$ and $C_t^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$ under the model (9) without the constraints. The matrix $P_t^{bmk}$ accounts for the variability of the state vector components and the variances and covariances of the sampling errors (defining the matrices $\Sigma_{tt}$ and $C_t^{bmk}$).

Note again that the benchmarking filter developed in this article, and particularly the covariance matrix $P_t^{bmk}$, is different from the state–space benchmarking filters developed in other works, where the series observations are benchmarked to external (independent) data sources. For example, Doran (1992) considered benchmark constraints (possibly a set) of the form $R_t\alpha_t = r_t$, where the $r_t$'s are constants. In our case the benchmarks $\sum_{d=1}^D w_{dt}y_{dt}$ are random and depend heavily on the sum $\sum_{d=1}^D w_{dt}z_{dt}'\alpha_{dt}$. Another difference between the present filter and the other filters developed in the context of state–space modeling is the accounting for correlated measurement errors in the present filter.

4.3 Simulation Results

To test whether the proposed benchmarking algorithm with correlated measurement errors, as developed in Section 4.2, performs properly, we designed a small simulation study. The study consists of simulating 10,000 time series of length 45 for each of 3 models and computing the simulation variances of the benchmark prediction errors $(\hat\alpha_t^{bmk} - \alpha_t)$, which we compare with the true variances under the model, $P_t^{bmk} = E[(\hat\alpha_t^{bmk} - \alpha_t)(\hat\alpha_t^{bmk} - \alpha_t)']$, as defined by (D3) in Appendix D. We also compare the simulation covariances $\operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$ with the true covariances $C_t^{bmk}$ under the model, as defined by (D4). The models used for generating the three sets of time series have the general form

$$y_{dt} = \alpha_{dt} + e_{dt},\qquad \alpha_{dt} = \alpha_{d,t-1} + \eta_{dt},$$
$$e_{dt} = \varepsilon_{dt} + .55\varepsilon_{d,t-1} + .30\varepsilon_{d,t-2} + .10\varepsilon_{d,t-3},\qquad d = 1,2,3, \tag{25}$$

where $\eta_{dt}$ and $\varepsilon_{dt}$ are independent white noise series. The random-walk variances for the three models are $\operatorname{var}(\eta_{dt}) = (.01, 8, 8.12)$. The corresponding measurement error variances are $\operatorname{var}(e_{dt}) = (.30, .08, 1.21)$, with autocorrelations $\operatorname{corr}(e_t, e_{t-1}) = .53$, $\operatorname{corr}(e_t, e_{t-2}) = .25$, and $\operatorname{corr}(e_t, e_{t-3}) = .07$ (the same autocorrelations for the three models). The benchmark constraints are

$$\hat\alpha_{1t} + \hat\alpha_{2t} + \hat\alpha_{3t} = y_{1t} + y_{2t} + y_{3t},\qquad t = 1,\dots,45. \tag{26}$$
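The stated autocorrelations follow directly from the MA(3) coefficients in (25), via $\rho_k = \sum_i \theta_i\theta_{i+k} / \sum_i \theta_i^2$; a two-line check:

```python
import numpy as np

# Autocorrelations of e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}
theta = np.array([1.0, 0.55, 0.30, 0.10])
gamma0 = theta @ theta
rho = np.array([theta[:-k] @ theta[k:] / gamma0 for k in (1, 2, 3)])
print(np.round(rho, 2))   # [0.53 0.25 0.07]
```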

Table 1 shows, for each of the three series, the values of the true variance $p_{d,45} = E(\hat\alpha_{d,45}^{bmk} - \alpha_{d,45})^2$ and covariance $c_{d,45} = \operatorname{cov}[\hat\alpha_{d,44}^{bmk} - \alpha_{d,45},\, e_{d,45}]$ for the last time point ($t = 45$), along with the corresponding simulation variance $p_{d,45}^{Sim} = \sum_{k=1}^{10000}(\hat\alpha_{d,45}^{bmk(k)} - \alpha_{d,45}^{(k)})^2/10^4$ and covariance $c_{d,45}^{Sim} = \sum_{k=1}^{10000}(\hat\alpha_{d,44}^{bmk(k)} - \alpha_{d,45}^{(k)})e_{d,45}^{(k)}/10^4$, where the superscript $k$ indicates the $k$th simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.

4.4 Benchmarking of State Estimates

We have considered so far (and again in Sec. 5) the benchmarking of the CD estimates, but as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices $\Sigma_{\tau t} = E(e_\tau e_t')$ [see (20a)] and hence the computation of the covariances $C_t^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$ [see (D4)] and the variance matrices $P_t^{bmk} = E[(\hat\alpha_t^{bmk} - \alpha_t)(\hat\alpha_t^{bmk} - \alpha_t)']$ [see (D3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45), 10,000 Simulations

                 Variances                    Covariances
Series   Theoretical   Simulation     Theoretical   Simulation
         p_d45         p^Sim_d45      c_d45         c^Sim_d45
1          .274          .276           .039          .041
2         1.122         1.119           .615          .614
3          .337          .344           .063          .068

The reason why the matrices $\Sigma_{\tau t}$ have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints

$$\sum_{s\in d} w_{st}z_{st}'\hat\alpha_{st} = z_{dt}'\hat\alpha_{dt}^{bmk} = z_{dt}'\alpha_{dt} + z_{dt}'(\hat\alpha_{dt}^{bmk} - \alpha_{dt}), \tag{27}$$

so that the benchmarking error is now $e_{dt}^{bmk} = z_{dt}'(\hat\alpha_{dt}^{bmk} - \alpha_{dt})$, which has a different structure than the benchmarking error $\sum_{d=1}^D w_{dt}e_{dt}$ underlying (1) [see (22)]. The vectors $e_t$ now have the form $e_t^d = (e_{1t}^d,\dots,e_{St}^d, e_{dt}^{bmk})'$, where $e_{st}^d$ is the sampling error for state $s$ belonging to CD $d$. The computation of the matrices $\Sigma_{\tau t} = E(e_\tau^d e_t^{d\prime})$ is carried out by expressing the errors $l_{dt}^{bmk} = \hat\alpha_{dt}^{bmk} - \alpha_{dt}$ when benchmarking the CDs as linear combinations of the concurrent and past CD sampling errors $e_j = (e_{1j},\dots,e_{Dj},\sum_{d=1}^D w_{dj}e_{dj})'$ and the concurrent and past model residuals $\eta_j = \alpha_j - T\alpha_{j-1}$ [see (20b)], $j = 1,2,\dots,t$. Note that the CD sampling error is a weighted average of the state sampling errors,

$$e_{dj} = y_{dj} - Y_{dj} = \sum_{s\in d} w_{sj}(y_{sj} - Y_{sj}) = \sum_{s\in d} w_{sj}e_{sj}^d. \tag{28}$$

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows we apply the benchmarking methodology developed in Section 4.2 for estimating total unemployment in the nine CDs for the period January 1998–December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest, because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test for the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment $Y_t$ and the trend levels $L_t$ [see (6)]. As can be seen, the largest loss in efficiency, as measured by the means, is 3%.

Next, we combined the individual division models into the joint model (20). The benchmark constraints are as defined

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998–2003

Division             Prediction of unemployment   Prediction of trend
New England          1.03 (.002)                  1.02 (.002)
Middle Atlantic      1.02 (.002)                  1.02 (.002)
East North Central   1.00 (.001)                  1.00 (.001)
West North Central   1.02 (.002)                  1.02 (.002)
South Atlantic       1.02 (.001)                  1.02 (.001)
East South Central   1.00 (.001)                  1.00 (.001)
West South Central   1.02 (.001)                  1.01 (.001)
Mountain             1.03 (.002)                  1.03 (.002)
Pacific              1.02 (.001)                  1.02 (.001)


Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000).

in (1), with $w_{dt} = 1$, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions, without benchmarking, with the corresponding CPS national unemployment estimate. In the first part of the observation period, the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.

Figures 3–5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graph to the period January 2000–January 2003 to better illuminate how

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS, With ±2×SE(CPS) Bands (numbers in 100,000).

Figure 3. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: South Atlantic Division (numbers in 100,000).

the benchmarking corrects, in real time, the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Another important conclusion that can be reached from Figures 3–5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998–2003 excluding 2001 and for 2001. As can be seen, in 2001 some of the mean ratios are about 1.04, but in the other years the means never exceed 1.01, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: East South Central Division (numbers in 100,000).


Figure 5. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: Pacific Division (numbers in 100,000).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6–8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3–5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6–8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division             1998–2000, 2002–2003   2001
New England          1.01 (.016)            1.03 (.017)
Middle Atlantic      1.00 (.014)            1.03 (.019)
East North Central   1.00 (.013)            1.03 (.013)
West North Central   1.01 (.014)            1.03 (.011)
South Atlantic       1.00 (.016)            1.04 (.016)
East South Central   1.01 (.016)            1.03 (.014)
West South Central   1.00 (.014)            1.04 (.022)
Mountain             1.00 (.011)            1.02 (.011)
Pacific              1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: South Atlantic Division (numbers in 10,000).

small areas is a common requirement for statistical agencies producing official statistics. (See, e.g., Fay and Herriot 1979; Battese et al. 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement.) The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time $t$ out of all of the predictors that are linear combinations of the state predictor from time $t-1$ and the new observation at time $t$. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: East South Central Division (numbers in 10,000).


Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat\alpha_t$ [EQ. (18)]

The model holding for $Y_t = (\alpha_{t|t-1}', y_t')'$ is

$$Y_t = \binom{\alpha_{t|t-1}}{y_t} = \binom{I}{Z_t}\alpha_t + \binom{u_{t|t-1}}{e_t},\qquad u_{t|t-1} = \alpha_{t|t-1} - \alpha_t,$$

where $\alpha_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u_{t|t-1}', e_t')'$ and $X_t = [I\;\;Z_t']'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_tu_t') = V_t$ [see (16)]. A predictor $\alpha_t^*$ is unbiased for $\alpha_t$ if $E(\alpha_t^* - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division             Means (STDs)
New England          .85 (.013)
Middle Atlantic      .94 (.005)
East North Central   .96 (.004)
West North Central   .89 (.008)
South Atlantic       .96 (.004)
East South Central   .86 (.007)
West South Central   .92 (.006)
Mountain             .88 (.009)
Pacific              .96 (.004)

Theorem. The predictor $\hat\alpha_t = (X_t'V_t^{-1}X_t)^{-1}X_t'V_t^{-1}Y_t = P_tX_t'V_t^{-1}Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\alpha_{t|t-1}$ and $y_t$.

Proof. $\hat\alpha_t - \alpha_t = P_tX_t'V_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_tX_t'V_t^{-1}u_t$, so that $E(\hat\alpha_t - \alpha_t) = 0$ and $\operatorname{var}(\hat\alpha_t - \alpha_t) = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = P_t$. Clearly, $\hat\alpha_t$ is linear in $Y_t$.

Let $\hat\alpha_t^L = L_1\alpha_{t|t-1} + L_2y_t + l = LY_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_tX_t'V_t^{-1}$, such that $L = D_t + P_tX_t'V_t^{-1}$. Because $\hat\alpha_t^L$ is unbiased for $\alpha_t$,

$$E(\hat\alpha_t^L - \alpha_t) = E[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t] = E[D_tX_t\alpha_t + D_tu_t + P_tX_t'V_t^{-1}u_t + l] = E[D_tX_t\alpha_t + l] = 0.$$

Thus, for $\hat\alpha_t^L$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_tX_t = 0$ and $l = 0$, which implies that $\operatorname{var}(\hat\alpha_t^L - \alpha_t) = \operatorname{var}[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = \operatorname{var}[D_tu_t + P_tX_t'V_t^{-1}u_t]$. Hence,

$$\operatorname{var}(\hat\alpha_t^L - \alpha_t) = E[(D_tu_t + P_tX_t'V_t^{-1}u_t)(u_t'D_t' + u_t'V_t^{-1}X_tP_t)] = D_tV_tD_t' + P_tX_t'V_t^{-1}V_tD_t' + D_tV_tV_t^{-1}X_tP_t + P_t$$
$$= P_t + D_tV_tD_t' = \operatorname{var}(\hat\alpha_t - \alpha_t) + D_tV_tD_t',\qquad \text{because } D_tX_t = 0.$$

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\operatorname{var}(e_t) = \Sigma_t$ and $\operatorname{var}(\eta_t) = Q_t$. Let $\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

The GLS setup at time $t$ is

$$\binom{T\hat\alpha_{t-1}}{y_t} = \binom{I}{Z_t}\alpha_t + \binom{u_{t|t-1}}{e_t}$$

[see (15)], where now

$$V_t = \operatorname{var}\binom{u_{t|t-1}}{e_t} = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}$$

[the same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is

$$\hat\alpha_t = \left[(I\;\;Z_t')V_t^{-1}\binom{I}{Z_t}\right]^{-1}(I\;\;Z_t')V_t^{-1}\binom{\alpha_{t|t-1}}{y_t}$$
$$= [P_{t|t-1}^{-1} + Z_t'\Sigma_t^{-1}Z_t]^{-1}[P_{t|t-1}^{-1}\alpha_{t|t-1} + Z_t'\Sigma_t^{-1}y_t]$$
$$= [P_{t|t-1} - P_{t|t-1}Z_t'F_t^{-1}Z_tP_{t|t-1}][P_{t|t-1}^{-1}\alpha_{t|t-1} + Z_t'\Sigma_t^{-1}y_t]$$
$$= [I - P_{t|t-1}Z_t'F_t^{-1}Z_t]\alpha_{t|t-1} + P_{t|t-1}Z_t'\Sigma_t^{-1}y_t - P_{t|t-1}Z_t'F_t^{-1}Z_tP_{t|t-1}Z_t'\Sigma_t^{-1}y_t$$
$$= [I - P_{t|t-1}Z_t'F_t^{-1}Z_t]\alpha_{t|t-1} + P_{t|t-1}Z_t'F_t^{-1}y_t \qquad [\text{because } Z_tP_{t|t-1}Z_t' = F_t - \Sigma_t]$$
$$= \alpha_{t|t-1} + P_{t|t-1}Z_t'F_t^{-1}(y_t - Z_t\alpha_{t|t-1}),\qquad F_t = Z_tP_{t|t-1}Z_t' + \Sigma_t.$$

The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
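The equality can also be verified numerically on randomly generated positive definite inputs; the sketch below (our own illustration) compares the GLS form (second line of the derivation) with the Kalman form (last line):

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 3, 2
Z = rng.standard_normal((p, q))
A = rng.standard_normal((q, q)); P = A @ A.T + np.eye(q)    # P_{t|t-1} > 0
B = rng.standard_normal((p, p)); Sig = B @ B.T + np.eye(p)  # Sigma_t > 0
a = rng.standard_normal(q)                                  # alpha_{t|t-1}
y = rng.standard_normal(p)

# GLS form: [P^-1 + Z' Sig^-1 Z]^-1 [P^-1 a + Z' Sig^-1 y]
Pi, Si = np.linalg.inv(P), np.linalg.inv(Sig)
gls = np.linalg.solve(Pi + Z.T @ Si @ Z, Pi @ a + Z.T @ Si @ y)

# Kalman form: a + P Z' F^-1 (y - Z a), with F = Z P Z' + Sig
F = Z @ P @ Z.T + Sig
kal = a + P @ Z.T @ np.linalg.solve(F, y - Z @ a)
assert np.allclose(gls, kal)
```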


APPENDIX C: COMPUTATION OF $\hat\alpha_t$ [EQ. (19)]

Consider the GLS predictor

$$\hat\alpha_t = \left[(I\;\;Z_t')V_t^{-1}\binom{I}{Z_t}\right]^{-1}(I\;\;Z_t')V_t^{-1}\binom{T\hat\alpha_{t-1}}{y_t}$$

[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C_t' & \Sigma_{tt} \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions

$$(I\;\;Z_t')V_t^{-1} = \bigl[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC_t'P_{t|t-1}^{-1} - Z_t'H_tC_t'P_{t|t-1}^{-1})\;\;\;(-P_{t|t-1}^{-1}C_tH_t + Z_t'H_t)\bigr] \tag{C1}$$

and

$$P_t = \left[(I\;\;Z_t')V_t^{-1}\binom{I}{Z_t}\right]^{-1} = \bigl[P_{t|t-1}^{-1} + (Z_t' - P_{t|t-1}^{-1}C_t)H_t(Z_t - C_t'P_{t|t-1}^{-1})\bigr]^{-1}, \tag{C2}$$

where $H_t = [\Sigma_{tt} - C_t'P_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C1) and (C2) that

$$\hat\alpha_t = P_t(I\;\;Z_t')V_t^{-1}\binom{T\hat\alpha_{t-1}}{y_t} = T\hat\alpha_{t-1} + P_t(Z_t' - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat\alpha_{t-1}). \tag{C3}$$

Computing the matrix in the second row of (C2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C3) yields, after some algebra, equation (19).
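As a sanity check of (C2), the sketch below (our own illustration on random positive definite inputs, with $C_t$ scaled small so that $H_t$ exists) compares the partitioned-inverse formula with the direct GLS variance in (18):

```python
import numpy as np

rng = np.random.default_rng(1)
q, p = 3, 2
Z = rng.standard_normal((p, q))
A = rng.standard_normal((q, q)); P_pred = A @ A.T + np.eye(q)  # P_{t|t-1}
B = rng.standard_normal((p, p)); Sig = B @ B.T + np.eye(p)     # Sigma_tt
C = 0.1 * rng.standard_normal((q, p))                          # C_t (small)
V = np.block([[P_pred, C], [C.T, Sig]])
X = np.vstack([np.eye(q), Z])                                  # X_t = (I, Z_t')'

# Direct GLS variance: [(I Z')V^-1 (I; Z)]^-1
direct = np.linalg.inv(X.T @ np.linalg.solve(V, X))

# Partitioned-inverse form (C2)
Pi = np.linalg.inv(P_pred)
H = np.linalg.inv(Sig - C.T @ Pi @ C)                          # H_t
via_C2 = np.linalg.inv(Pi + (Z.T - Pi @ C) @ H @ (Z - C.T @ Pi))
assert np.allclose(direct, via_C2)
```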

APPENDIX D: COMPUTATION OF $P_t^{bmk} = \operatorname{var}(\hat\alpha_t^{bmk} - \alpha_t)$ AND $C_t^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$

The benchmarked predictor $\hat\alpha_t^{bmk}$ defined by (24) can be written as

$$\hat\alpha_t^{bmk} = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t]T\hat\alpha_{t-1}^{bmk} + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}y_t, \tag{D1}$$

where $R_t = Z_tP_{t|t-1}^{bmk}Z_t' - Z_tC_{t0}^{bmk} - C_{t0}^{bmk\prime}Z_t' + \Sigma_{tt}^*$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D1) and decomposing $\alpha_t = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t]\alpha_t + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t\alpha_t$ yields

$$\hat\alpha_t^{bmk} - \alpha_t = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t](T\hat\alpha_{t-1}^{bmk} - \alpha_t) + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}e_t. \tag{D2}$$

Denote $G_t = I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t$ and $K_t = (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}$. Then, by (D2), the variance matrix of the prediction error $(\hat\alpha_t^{bmk} - \alpha_t)$ under the model (20) is

$$P_t^{bmk} = E[(\hat\alpha_t^{bmk} - \alpha_t)(\hat\alpha_t^{bmk} - \alpha_t)'] = G_tP_{t|t-1}^{bmk}G_t' + K_t\Sigma_{tt}K_t' + G_tC_t^{bmk}K_t' + K_tC_t^{bmk\prime}G_t', \tag{D3}$$

where $P_{t|t-1}^{bmk} = E[(T\hat\alpha_{t-1}^{bmk} - \alpha_t)(T\hat\alpha_{t-1}^{bmk} - \alpha_t)'] = TP_{t-1}^{bmk}T' + Q$, $\Sigma_{tt} = E(e_te_t')$ [eq. (20a)], and $C_t^{bmk} = \operatorname{cov}[T\hat\alpha_{t-1}^{bmk} - \alpha_t,\, e_t]$. The matrix $C_t^{bmk}$ is computed similarly to (17), using the chain

$$C_t^{bmk} = A_{t-1}^{bmk}A_{t-2}^{bmk}\cdots A_2^{bmk}\tilde A_1^{bmk}\Sigma_{1t} + A_{t-1}^{bmk}A_{t-2}^{bmk}\cdots A_3^{bmk}\tilde A_2^{bmk}\Sigma_{2t} + \cdots + A_{t-1}^{bmk}\tilde A_{t-2}^{bmk}\Sigma_{t-2,t} + \tilde A_{t-1}^{bmk}\Sigma_{t-1,t}, \tag{D4}$$

where $A_j^{bmk} = TG_j$ and $\tilde A_j^{bmk} = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

Pfeffermann, D. (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small-Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.



more accurate than the corresponding CPS estimates, thus justifying benchmarking the state estimates to the benchmarked CD estimates as in (2). The basic idea behind using either of these constraints is that if all of the direct CPS estimates jointly increase or decrease due to some sudden external shocks that are not accounted for by the model, then the benchmarked estimators will reflect this change much faster than the model-dependent estimators obtained by fitting separate models for each CD and state. This property is strikingly illustrated in the application in Section 5, using actual CD unemployment series. Note also that by incorporating the constraints (1) and (2), the benchmarked estimators in a given month "borrow strength" both from past data and cross-sectionally, unlike the model-dependent estimators in current use that borrow strength only from the past. This property is reflected by the reduced variance of the benchmarked estimators, which is shown in Section 5.

Remark 1. Unlike in classical benchmarking problems that use external (independent) data (e.g., census figures or estimates from another large sample) for the benchmarking process, the model-dependent estimates on the left side of (1) are benchmarked to a weighted mean of the direct CPS estimates, which are the input data for the models. A similar problem exists with the benchmarking of the state estimates in (2). External independent data to which the monthly CD or state estimates can be benchmarked are not available, even for isolated months. (See Hillmer and Trabelsi 1987; Doran 1992; Durbin and Quenneville 1997 for benchmarking procedures to external data sources in the context of state–space modeling.)

An important question underlying use of the constraints (1) and (2) is how to define the weights $w_{dt}$ and $w_{st}$. Possible definitions for the weights $w_{dt}$ in (1) are

$$w^{(1)}_{dt} = \frac{1}{D}, \qquad w^{(2)}_{dt} = \frac{N_{dt}}{\sum_{d=1}^{D} N_{dt}}, \qquad w^{(3)}_{dt} = \frac{1/\mathrm{var}^{cps}_{dt}}{\sum_{d=1}^{D}\bigl[1/\mathrm{var}^{cps}_{dt}\bigr]}, \qquad (3)$$

where $N_{dt}$ and $\mathrm{var}^{cps}_{dt}$ denote the total size of the labor force and the variance of the direct CPS estimate in CD $d$ at month $t$. Using the weights $w^{(1)}_{dt}$ is appropriate when the direct estimates are totals; using $w^{(2)}_{dt}$ is appropriate when the direct estimates are proportions. Therefore, using these weights guarantees that the CD estimates each month add up to the corresponding national CPS estimate. By using similar weights in (2), the benchmarked state estimates add up to the benchmarked CD estimate in each of the CDs, and hence also to the national CPS estimate, thus satisfying publication-consistency requirements. Using $w^{(3)}_{dt}$ minimizes the variance of the benchmark $\sum_{d=1}^{D} w_{dt}Y^{cps}_{dt}$, but does not satisfy publication consistency.
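As a quick numerical illustration of (3), the sketch below computes the three candidate weight systems for made-up labor-force sizes and direct-estimate variances (all numbers hypothetical, not BLS data) and confirms that, when the direct estimates are independent, the inverse-variance weights give the smallest benchmark variance:

```python
import numpy as np

# Toy labor-force sizes and CPS variances for D = 3 hypothetical census
# divisions at one month t (illustrative numbers, not BLS data).
N = np.array([5.0e6, 2.0e6, 3.0e6])        # labor-force sizes N_dt
var_cps = np.array([0.40, 0.90, 0.60])     # direct-estimate variances
y_cps = np.array([250.0, 120.0, 180.0])    # direct CPS estimates (thousands)

w1 = np.full(3, 1 / 3)                     # w1: equal weights 1/D
w2 = N / N.sum()                           # w2: labor-force shares
w3 = (1 / var_cps) / (1 / var_cps).sum()   # w3: inverse-variance weights

for name, w in [("w1", w1), ("w2", w2), ("w3", w3)]:
    bench = w @ y_cps                      # benchmark sum_d w_dt * y_dt
    var_bench = (w ** 2) @ var_cps         # its variance (independent errors)
    print(name, round(bench, 2), round(var_bench, 4))

# w3 minimizes the benchmark variance among weights summing to 1:
assert (w3 ** 2) @ var_cps <= (w1 ** 2) @ var_cps
assert (w3 ** 2) @ var_cps <= (w2 ** 2) @ var_cps
```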

The proposed solution of combining the individual CD or state models into a joint state–space model with built-in benchmark constraints intensifies the computations. The dimension of the state vector in the separate models is 29 (see the next section), implying that by fitting the model jointly to, say, the 9 CDs, the dimension of the joint state vector would be 261. Fitting state–space models of this size puts heavy demands on CPU time and memory. This creates problems in a production environment in which all of the CD and state estimates (and numerous other estimates) must be produced and published each month, soon after the new sample data become available. A possible solution to this problem, studied herein, that also allows computing the variances of the benchmarked estimators, is to include the sampling errors as part of the error terms in the model measurement equation, instead of the current practice of fitting a linear time series model for the sampling errors and including them in the state vector. Implementing this solution reduces the dimension of each of the separate state vectors by half, because the sampling errors make up 15 elements of the state vector (see the next section). As explained in Section 2, including the sampling errors in the measurement equation does not change the model.

Using this solution, however, does induce autocorrelated errors in the measurement equation of the state–space model, because, as mentioned earlier, the sampling errors are highly correlated over time. This raises the need to develop a recursive filtering algorithm with good statistical properties for state–space models with autocorrelated measurement errors. Although originally motivated by computational considerations, the development of such an algorithm is of general interest, because the common practice of fitting a linear time series model for the measurement errors when they are autocorrelated is not always practical. (See the next section for the sampling error model approximation used by the BLS.) To the best of our knowledge, no such algorithm has been studied previously. Pfeffermann and Burck (1990) likewise added constraints of the form (1) to a state–space model and developed an appropriate recursive filtering algorithm, but their model did not contain autocorrelated sampling errors, so the measurement errors are independent cross-sectionally and over time.

Implementation of the proposed benchmarking procedure, which is the primary focus of this article, therefore requires solving three problems:

1. Develop a general recursive filtering algorithm for state–space models with correlated measurement errors.

2. Incorporate the benchmark constraints and compute the corresponding benchmarked estimates (estimates of employment or unemployment measures in the present application).

3. Compute the variances of the benchmarked estimators.

We emphasize with regard to the third problem that the constraints in (1) and (2) are imposed only for computing the benchmarked estimators, not when computing the variances of the benchmarked estimators, which account for all of the sources of variability, including the errors in the benchmark equations. As explained later, it would have been necessary to develop a new filtering algorithm for computing the correct variances even if the sampling errors were left in the state vector.

Remark 2. An alternative and simpler way of enforcing the benchmark constraints is by pro-rating the separate model-dependent estimates each month, such that for CDs, for example,

$$\hat{Y}^{pro}_{dt} = \hat{Y}^{model}_{dt}\left(\sum_{d=1}^{D} w_{dt}Y^{cps}_{dt} \Bigg/ \sum_{d=1}^{D} w_{dt}\hat{Y}^{model}_{dt}\right). \qquad (4)$$

Pfeffermann and Tiller Small-Area Estimation 1389

Using (4) again satisfies $\sum_{d=1}^{D} w_{dt}\hat{Y}^{pro}_{dt} = \sum_{d=1}^{D} w_{dt}Y^{cps}_{dt}$ and does not require changing the current BLS modeling practice. This benchmarking method is often used in small-area estimation applications that use cross-sectional models (see, e.g., Fay and Herriot 1979; Battese, Harter, and Fuller 1988; Rao 2003). However, the properties of using (4) are unclear, and it does not lend itself to simple parametric estimation of the variances of the pro-rated estimators. Variance estimation is an essential requirement for any benchmarking procedure. For the case of a single state–space model (with no benchmarking), Pfeffermann and Tiller (2005) developed bootstrap variance estimators that account for estimation of the model parameters with bias of correct order, but extension of this resampling method to the pro-rated predictors defined by (4) is not straightforward and requires a double-bootstrap procedure that is not practical in the present context because of the very intensive computations involved. Another simple benchmarking procedure is a difference adjustment of the form

$$\hat{Y}^{dif}_{dt} = \hat{Y}^{model}_{dt} + \left(\sum_{d=1}^{D} w_{dt}Y^{cps}_{dt} - \sum_{d=1}^{D} w_{dt}\hat{Y}^{model}_{dt}\right). \qquad (5)$$

But this procedure inflates the variances of the benchmarked estimators and has the further drawback of adding the same amount to each of the model-dependent estimators, irrespective of their magnitude.
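The contrast between the pro-rating adjustment (4) and the difference adjustment (5) can be seen on toy numbers (all hypothetical): both reproduce the benchmark aggregate when the weights sum to 1, but they distribute the adjustment very differently:

```python
import numpy as np

# Hypothetical model-dependent and direct CPS estimates for D = 3 CDs.
w = np.array([0.5, 0.2, 0.3])              # fixed weights w_dt (sum to 1)
y_model = np.array([240.0, 125.0, 175.0])  # model-dependent estimates
y_cps = np.array([250.0, 120.0, 180.0])    # direct CPS estimates

bench = w @ y_cps                          # benchmark aggregate
y_pro = y_model * (bench / (w @ y_model))  # pro-rating, eq. (4)
y_dif = y_model + (bench - w @ y_model)    # difference adjustment, eq. (5)

# Both adjusted sets reproduce the benchmark aggregate:
assert np.isclose(w @ y_pro, bench)
assert np.isclose(w @ y_dif, bench)

# (5) adds the same absolute amount to every estimate regardless of its
# magnitude, whereas (4) scales each estimate proportionally.
print(y_pro - y_model)   # proportional adjustments
print(y_dif - y_model)   # identical adjustments
```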

The benchmarking procedure developed in this article overcomes the problems mentioned with respect to procedures (4) and (5), and is not more complicated once an appropriate computer program is written. Note in this respect that any other "built-in" benchmarking procedure would require the development of a new filtering algorithm, even if the sampling errors were left in the state vector. This is so because the benchmark errors that need to be accounted for when computing the variance of the benchmarked estimator are correlated concurrently and over time with the sampling errors. As mentioned earlier, available algorithms for benchmarking state–space models are not suitable for handling benchmarks of the form (1) or (2).

In what follows we focus on benchmarking the CD estimates (the first step in the benchmarking process), but we comment on the adjustments required when benchmarking the state estimates in the second step. Section 2 presents the BLS models in current use. Section 3 develops the recursive filtering algorithm for state–space models with correlated measurement errors and discusses its properties. Section 4 shows how to incorporate the benchmark constraints and compute the variances of the benchmarked estimators. Section 5 describes the application of the proposed procedure using the unemployment series, with special attention to the year 2001, when the World Trade Center was attacked. Section 6 concludes by discussing some remaining problems in the application of this procedure.

2. BUREAU OF LABOR STATISTICS MODEL IN CURRENT USE

In this section we consider a single CD, and hence we drop the subscript $d$ from the notation. The model used by the BLS combines a model for the true CD values (total unemployment or employment) with a model for the sampling errors. The model is discussed in detail, including parameter estimation and model diagnostics, in Tiller (1992). Next we provide a brief description of the model to aid understanding of the developments and applications in subsequent sections.

2.1 Model Assumed for Population Values

Let $y_t$ denote the direct CPS estimate at time $t$, and let $Y_t$ denote the corresponding population value, so that $e_t = y_t - Y_t$ is the sampling error. The model is

$$Y_t = L_t + S_t + I_t, \qquad I_t \sim N(0, \sigma^2_I),$$
$$L_t = L_{t-1} + R_{t-1} + \eta_{Lt}, \qquad \eta_{Lt} \sim N(0, \sigma^2_L),$$
$$R_t = R_{t-1} + \eta_{Rt}, \qquad \eta_{Rt} \sim N(0, \sigma^2_R),$$
$$S_t = \sum_{j=1}^{6} S_{jt}, \qquad (6)$$
$$S_{jt} = \cos\omega_j\, S_{j,t-1} + \sin\omega_j\, S^*_{j,t-1} + v_{jt}, \qquad v_{jt} \sim N(0, \sigma^2_S),$$
$$S^*_{jt} = -\sin\omega_j\, S_{j,t-1} + \cos\omega_j\, S^*_{j,t-1} + v^*_{jt}, \qquad v^*_{jt} \sim N(0, \sigma^2_S),$$
$$\omega_j = \frac{2\pi j}{12}, \qquad j = 1, \dots, 6.$$

The model defined by (6) is known in the time series literature as the basic structural model (BSM), with $L_t$, $R_t$, $S_t$, and $I_t$ defining the trend level, slope, seasonal effect, and "irregular" term operating at time $t$. The error terms $I_t$, $\eta_{Lt}$, $\eta_{Rt}$, $v_{jt}$, and $v^*_{jt}$ are independent white noise series. The model for the trend approximates a local linear trend, whereas the model for the seasonal effect uses the classical decomposition of the seasonal component into six subcomponents $S_{jt}$ that represent the contribution of the cyclical functions corresponding to the six frequencies (harmonics) of a monthly seasonal series. The added noise permits the seasonal effects to evolve stochastically over time, but in such a way that the expectation of the sum of 12 successive seasonal effects is 0. (See Harvey 1989 for discussion of the BSM.)
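A minimal simulation of the trigonometric seasonal recursion in (6) illustrates the property just stated: the noise lets the seasonal pattern evolve, yet any 12 successive seasonal effects still sum to roughly zero. The noise level and series length below are illustrative, not the fitted BLS variances:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_s = 600, 0.05                  # series length, seasonal noise SD
omega = 2 * np.pi * np.arange(1, 7) / 12

# Trigonometric seasonal recursion of (6): six harmonics, each with a
# companion S* state; the added noise lets the pattern evolve slowly.
S, S_star = rng.normal(size=6), rng.normal(size=6)
seasonal = np.empty(n)
for t in range(n):
    S_new = np.cos(omega) * S + np.sin(omega) * S_star + rng.normal(0, sigma_s, 6)
    S_star = -np.sin(omega) * S + np.cos(omega) * S_star + rng.normal(0, sigma_s, 6)
    S = S_new
    seasonal[t] = S.sum()               # S_t = sum of the six harmonics

# Any 12 consecutive seasonal effects should sum to roughly zero:
rolling = np.convolve(seasonal, np.ones(12), mode="valid")
print(abs(rolling.mean()))              # small relative to the seasonal swings
```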

Remark 3. The future state models (but not the CD models) will be bivariate, with the other dependent variable in each state being "the number of persons in the state receiving unemployment insurance benefits" when modeling the unemployment estimates, and "the number of payroll jobs in business establishments" when modeling the employment estimates. These two series are census figures with no sampling errors. The benchmarking will be applied only to the unemployment and employment estimates.

2.2 Model Assumed for the Sampling Errors

The CPS variance of the sampling error varies with the level of the series. Denoting $v^2_t = \mathrm{var}(e_t)$, the model assumed for the standardized residuals $e^*_t = e_t/v_t$ is AR(15), which is used as an approximation to the sum of an MA(15) process and an AR(2) process. The MA(15) process accounts for the autocorrelations implied by the sample overlap underlying the CPS sampling design. By this design, households selected to the sample are surveyed for 4 successive months, are left out of the sample for the next 8 months, and then are surveyed again for 4 more months. The AR(2) model accounts for the autocorrelations arising from the fact that households dropped from the survey are replaced by households from the same "census tract." The reduced ARMA representation of the sum of the two processes is ARMA(2,17), which is approximated by an AR(15) model.

The separate models holding for the population values and the sampling errors are cast into a single state–space model for the observations $y_t$ (the CPS estimates), of the form

$$y_t = Z_t\alpha^*_t; \qquad \alpha^*_t = T\alpha^*_{t-1} + \eta^*_t, \quad E(\eta^*_t) = 0, \quad E(\eta^*_t\eta^{*\prime}_t) = Q^*, \qquad (7)$$

with the first equation to the left defining the measurement (observation) equation and the second equation defining the state (transition) equation. The state vector $\alpha^*_t$ consists of the trend level $L_t$, the slope $R_t$, the 11 seasonal coefficients $S_{jt}, S^*_{kt}$, $j = 1, \dots, 6$, $k = 1, \dots, 5$, the irregular term $I_t$, and the concurrent and 14 lags of the sampling errors, a total of 29 elements. The sampling errors and the irregular term are included in the state vector so that there is no residual error term in the observation equation. Note in this respect that including the sampling errors in the measurement equation, as described in Section 1 and pursued later, instead of the current practice of including them in the state equation, does not change the model holding for the observations, as long as the model fitted for the sampling errors reproduces their autocovariances.

The parameters indexing the model are estimated separately for each CD in two steps. In the first step, the AR(15) coefficients and residual variance are estimated by solving the corresponding Yule–Walker equations. (The sampling error variance and autocorrelations are estimated externally by the BLS.) In the second step, the remaining model variances are estimated by maximizing the model likelihood with the AR(15) model parameters held fixed at their estimated values (see Tiller 1992 for details).
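The first estimation step can be sketched as follows: given externally estimated autocorrelations of the standardized sampling errors, the AR coefficients solve the Yule–Walker system. The sketch uses order 3 with made-up autocorrelations, rather than the AR(15) model and the BLS estimates:

```python
import numpy as np

# Illustrative first-step estimation: given externally estimated
# autocorrelations rho_1..rho_p of the standardized sampling errors,
# solve the Yule-Walker equations R a = rho for the AR(p) coefficients.
# (The BLS uses p = 15; here p = 3 with assumed autocorrelations.)
rho = np.array([0.53, 0.25, 0.07])          # rho_1..rho_p (hypothetical)
p = len(rho)

# Toeplitz autocorrelation matrix R, with rho_0 = 1 on the diagonal.
full = np.concatenate(([1.0], rho))
R = np.array([[full[abs(i - j)] for j in range(p)] for i in range(p)])

a = np.linalg.solve(R, rho)                 # AR coefficients
sigma2 = 1.0 - a @ rho                      # innovation variance (unit-variance errors)
print(np.round(a, 3), round(sigma2, 3))
```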

The monthly employment and unemployment estimates published by the BLS are obtained under the model (7) and the relationship $Y_t = y_t - e_t$ as

$$\hat{Y}_t = y_t - \hat{e}_t = \hat{L}_t + \hat{S}_t + \hat{I}_t, \qquad (8)$$

where $\hat{L}_t$, $\hat{S}_t$, and $\hat{I}_t$ denote the estimated components at time $t$, as obtained by application of the Kalman filter (Harvey 1989).

3. FILTERING OF STATE–SPACE MODELS WITH AUTOCORRELATED MEASUREMENT ERRORS

In this section we develop a recursive filtering algorithm for state–space models with autocorrelated measurement errors. By this we mean an algorithm that updates the most recent predictor of the state vector every time that a new observation becomes available. This filter is required for implementing the benchmarking procedure discussed in Section 1 (see Sec. 4), but, as already mentioned, it is general and can be used for other applications of state–space models with autocorrelated measurement errors. In Section 3.2 we discuss the properties of the proposed filter.

3.1 Recursive Filtering Algorithm

Consider the following linear state–space model for a (possibly vector) time series $y_t$:

Measurement equation:
$$y_t = Z_t\alpha_t + e_t, \quad E(e_t) = 0, \quad E(e_te'_t) = \Sigma_{tt}, \quad E(e_\tau e'_t) = \Sigma_{\tau t}; \qquad (9a)$$

Transition equation:
$$\alpha_t = T\alpha_{t-1} + \eta_t, \quad E(\eta_t) = 0, \quad E(\eta_t\eta'_t) = Q, \quad E(\eta_t\eta'_{t-k}) = 0,\ k > 0. \qquad (9b)$$

It is also assumed that $E(\eta_te'_\tau) = 0$ for all $t$ and $\tau$. (The restriction to time-invariant matrices $T$ and $Q$ is for convenience; extension of the filter to models that contain fixed effects in the observation equation is straightforward.) Clearly, what distinguishes this model from the standard linear state–space model is that the measurement errors $e_t$ (the sampling errors in the BLS model) are correlated over time. Note in particular that, unlike the BLS model representation in (7), where the sampling errors and the irregular term are part of the state vector so that there are no measurement errors in the observation equation, the measurement (sampling) errors now are featured in the observation equation. The recursive filtering algorithm developed herein takes into account the autocovariance matrices $\Sigma_{\tau t}$.

At Time 1. Let $\hat\alpha_1 = (I - K_1Z_1)T\hat\alpha_0 + K_1y_1$ be the filtered state estimator at time 1, where $\hat\alpha_0$ is an initial estimator with covariance matrix $P_0 = E[(\hat\alpha_0 - \alpha_0)(\hat\alpha_0 - \alpha_0)']$, and $K_1 = P_{1|0}Z'_1F^{-1}_1$ is the "Kalman gain." We assume for convenience that $\hat\alpha_0$ is independent of the observations. The matrix $P_{1|0} = TP_0T' + Q$ is the covariance matrix of the prediction errors $\hat\alpha_{1|0} - \alpha_1 = T\hat\alpha_0 - \alpha_1$, and $F_1 = Z_1P_{1|0}Z'_1 + \Sigma_{11}$ is the covariance matrix of the innovations (one-step-ahead prediction errors) $\nu_1 = y_1 - \hat{y}_{1|0} = y_1 - Z_1\hat\alpha_{1|0}$. Because $y_1 = Z_1\alpha_1 + e_1$,

$$\hat\alpha_1 = (I - K_1Z_1)T\hat\alpha_0 + K_1Z_1\alpha_1 + K_1e_1. \qquad (10)$$

At Time 2. Let $\hat\alpha_{2|1} = T\hat\alpha_1$ define the predictor of $\alpha_2$ at time 1, with covariance matrix $P_{2|1} = E[(\hat\alpha_{2|1} - \alpha_2)(\hat\alpha_{2|1} - \alpha_2)']$. An unbiased predictor $\hat\alpha_2$ of $\alpha_2$ [i.e., $E(\hat\alpha_2 - \alpha_2) = 0$] based on $\hat\alpha_{2|1}$ and the observation $y_2$ is the generalized least squares (GLS) predictor in the regression model

$$\begin{pmatrix} T\hat\alpha_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} I \\ Z_2 \end{pmatrix}\alpha_2 + \begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix}, \qquad u_{2|1} = T\hat\alpha_1 - \alpha_2, \qquad (11)$$

that is,

$$\hat\alpha_2 = \left[(I, Z'_2)V_2^{-1}\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1}(I, Z'_2)V_2^{-1}\begin{pmatrix} T\hat\alpha_1 \\ y_2 \end{pmatrix}, \qquad (12)$$

where

$$V_2 = \mathrm{var}\begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix} = \begin{bmatrix} P_{2|1} & C_2 \\ C'_2 & \Sigma_{22} \end{bmatrix} \qquad (13)$$

and $C_2 = \mathrm{cov}[T\hat\alpha_1 - \alpha_2, e_2] = TK_1\Sigma_{12}$ [follows from (9a) and (10)]. Note that $V_2$ is the covariance matrix of the errors $u_{2|1}$ and $e_2$, and not of the predictors $T\hat\alpha_1$ and $y_2$. As discussed later and proved in Appendix A, the GLS predictor $\hat\alpha_2$ is the best linear unbiased predictor (BLUP) of $\alpha_2$ based on $T\hat\alpha_1$ and $y_2$, with covariance matrix

$$E[(\hat\alpha_2 - \alpha_2)(\hat\alpha_2 - \alpha_2)'] = \left[(I, Z'_2)V_2^{-1}\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1} = P_2. \qquad (14)$$


At Time t. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with covariance matrix $E[(\hat\alpha_{t|t-1} - \alpha_t)(\hat\alpha_{t|t-1} - \alpha_t)'] = TP_{t-1}T' + Q = P_{t|t-1}$, where $P_{t-1} = E[(\hat\alpha_{t-1} - \alpha_{t-1})(\hat\alpha_{t-1} - \alpha_{t-1})']$. Set the random coefficients regression model

$$\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}, \qquad u_{t|t-1} = T\hat\alpha_{t-1} - \alpha_t, \qquad (15)$$

and define

$$V_t = \mathrm{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_{tt} \end{bmatrix}. \qquad (16)$$

The covariance matrix $C_t = \mathrm{cov}[T\hat\alpha_{t-1} - \alpha_t, e_t]$ is computed as follows. Let $[I, Z'_j]V_j^{-1} = [B_{j1}, B_{j2}]$, where $B_{j1}$ contains the first $q$ columns of $[I, Z'_j]V_j^{-1}$ and $B_{j2}$ contains the remaining columns, with $q = \dim(\alpha_j)$. Define $A_j = TP_jB_{j1}$, $\bar{A}_j = TP_jB_{j2}$, $j = 2, \dots, t-1$; $\bar{A}_1 = TK_1$. Then

$$C_t = \mathrm{cov}[T\hat\alpha_{t-1} - \alpha_t, e_t] = A_{t-1}A_{t-2}\cdots A_2\bar{A}_1\Sigma_{1t} + A_{t-1}A_{t-2}\cdots A_3\bar{A}_2\Sigma_{2t} + \cdots + A_{t-1}\bar{A}_{t-2}\Sigma_{t-2,t} + \bar{A}_{t-1}\Sigma_{t-1,t}. \qquad (17)$$

The GLS predictor of $\alpha_t$ based on $T\hat\alpha_{t-1}$ and $y_t$, and the covariance matrix of the prediction errors, are obtained from (15) and (16) as

$$\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}, \qquad (18)$$

$$P_t = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}.$$

Alternatively, the predictor $\hat\alpha_t$ can be written (App. C) as

$$\hat\alpha_t = T\hat\alpha_{t-1} + (P_{t|t-1}Z'_t - C_t)[Z_tP_{t|t-1}Z'_t - Z_tC_t - C'_tZ'_t + \Sigma_{tt}]^{-1}(y_t - Z_tT\hat\alpha_{t-1}). \qquad (19)$$

Written in this way, the predictor of $\alpha_t$ at time $t$ is seen to equal the predictor of $\alpha_t$ at time $t-1$, plus a correction factor that depends on the magnitude of the innovation (one-step-ahead prediction error) when predicting $y_t$ at time $t-1$.
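The recursion can be sketched for a scalar local-level model with MA(1) measurement errors, a case in which the chain (17) collapses to the single term $C_t = \bar{A}_{t-1}\Sigma_{t-1,t}$ because all older autocovariances vanish. All parameters below are made up for illustration, not the BLS model:

```python
import numpy as np

# Recursive GLS filter (18)-(19) for a toy model: alpha_t = alpha_{t-1} + eta_t,
# y_t = alpha_t + e_t, with e_t = eps_t + theta*eps_{t-1} (MA(1) errors).
# All quantities are scalars here, so Z_t = T = 1.
rng = np.random.default_rng(1)
n, q, theta, s_eps = 200, 0.1, 0.6, 1.0
Sig_00 = (1 + theta ** 2) * s_eps ** 2      # var(e_t)
Sig_01 = theta * s_eps ** 2                 # cov(e_{t-1}, e_t)

eps = rng.normal(0, s_eps, n + 1)
e = eps[1:] + theta * eps[:-1]
alpha = np.cumsum(rng.normal(0, np.sqrt(q), n))
y = alpha + e

a_hat, P, Atilde = 0.0, 10.0, 0.0           # diffuse-ish start; C_1 = 0
est = np.empty(n)
for t in range(n):
    C = Atilde * Sig_01                     # eq. (17), MA(1) special case
    P_pred = P + q                          # P_{t|t-1} = T P T' + Q with T = 1
    denom = P_pred - 2 * C + Sig_00         # innovation variance, eq. (19)
    a_hat = a_hat + (P_pred - C) / denom * (y[t] - a_hat)   # eq. (19)
    det = P_pred * Sig_00 - C ** 2
    P = det / denom                         # P_t from eq. (18)
    Atilde = P * (P_pred - C) / det         # Abar_t = T * P_t * B_t2
    est[t] = a_hat

# The filtered estimates should track the latent level much more closely
# than the raw observations do:
print(np.mean((est - alpha) ** 2) < np.mean((y - alpha) ** 2))
```

When `C = 0` the update reduces to the usual Kalman gain `P_pred / (P_pred + Sig_00)`, consistent with property 2 of Section 3.2.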

3.2 Properties of the Filtering Algorithm

Assuming known model parameters, the recursive GLS filter defined by (18) or (19) has the following properties:

1. At every time point $t$, the filter produces the BLUP of $\alpha_t$ based on the predictor $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ from time $t-1$ and the new observation $y_t$. The BLUP property means that $E(\hat\alpha_t - \alpha_t) = 0$ and $\mathrm{var}[d'(\hat\alpha_t - \alpha_t)] \le \mathrm{var}[d'(\hat\alpha^L_t - \alpha_t)]$ for every coefficient vector $d'$ and any other linear unbiased predictor of the form $\hat\alpha^L_t = L_1\hat\alpha_{t|t-1} + L_2y_t + l$, with general fixed matrices $L_1$ and $L_2$ and vector $l$. See Appendix A for proof of this property.

2. When the measurement errors are independent, the GLS filter coincides with the familiar Kalman filter (Harvey 1989); see Appendix B for the proof. Thus the recursive GLS filter can be viewed as an extension of the Kalman filter to the case of correlated measurement errors.

3. Unlike the Kalman filter, which assumes independent measurement errors and therefore yields the BLUP of $\alpha_t$ based on all the individual observations $y_{(t)} = (y'_1, \dots, y'_t)'$ (i.e., the BLUP out of all of the linear unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$ for general fixed matrices $J_{it}$ and vector $j_t$), the filter (18) yields the BLUP of $\alpha_t$ based on $\hat\alpha_{t|t-1}$ and the new observation $y_t$.

(The predictor $\hat\alpha_{t|t-1}$ is itself a linear combination of all the observations until time $t-1$, but the values of the matrix coefficients are fixed by the previous steps of the filter; see also Remark 4.)

Computing the BLUP of $\alpha_t$ under correlated measurement errors out of all possible unbiased predictors that are linear combinations of all the individual observations in $y_{(t)}$ with arbitrary coefficients [or the minimum mean squared error predictor $E(\alpha_t|y_t, y_{t-1}, \dots, y_1; \lambda)$ under normality of the model error terms, where $\lambda$ defines the model parameters] requires joint modeling of $y_{(t)}$ for every time $t$. For long series the computations become very heavy, and generally are not practical in a production environment that requires routine runs of many series with high-dimensional state vectors in a short time. Empirical evidence so far suggests that the loss in efficiency from using the GLS algorithm instead of the BLUP based on all of the individual observations is mild. Section 5 provides an empirical comparison of the two filters.

Remark 4. For general covariance matrices $\Sigma_{\tau t}$ between the measurement errors, it is impossible to construct a recursive filtering algorithm that is a linear combination of the predictor from the previous time point and the new observation and is BLUP out of all unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$. To see this, consider a very simple example of three observations $y_1$, $y_2$, and $y_3$ with common mean $\mu$ and variance $\sigma^2$. If the three observations are independent, then the BLUP of $\mu$ based on the first two observations is $\bar{y}_{(2)} = (y_1 + y_2)/2$, and the BLUP based on the three observations is $\bar{y}_{(3)} = (y_1 + y_2 + y_3)/3 = (2/3)\bar{y}_{(2)} + (1/3)y_3$. The BLUP $\bar{y}_{(3)}$ is the Kalman filter predictor for time 3. Suppose, however, that $\mathrm{cov}(y_1, y_2) = \mathrm{cov}(y_2, y_3) = \sigma^2\rho_1$ and $\mathrm{cov}(y_1, y_3) = \sigma^2\rho_2 \ne \sigma^2\rho_1$. The BLUP of $\mu$ based on the first two observations is again $\bar{y}_{(2)} = (y_1 + y_2)/2$, but the BLUP of $\mu$ based on the three observations in this case is $\bar{y}^{opt}_{(3)} = ay_1 + by_2 + ay_3$, where $a = (1 - \rho_1)/(3 - 4\rho_1 + \rho_2)$ and $b = (1 - 2\rho_1 + \rho_2)/(3 - 4\rho_1 + \rho_2)$. Clearly, because $a \ne b$, the predictor $\bar{y}^{opt}_{(3)}$ cannot be written as a linear combination of $\bar{y}_{(2)}$ and $y_3$. For example, if $\rho_1 = .5$ and $\rho_2 = .25$, then $\bar{y}^{opt}_{(3)} = .4y_1 + .2y_2 + .4y_3$.
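The weights in this example are easy to verify numerically: the BLUP of a common mean is the GLS estimator, with weights proportional to $V^{-1}\mathbf{1}$:

```python
import numpy as np

# Numerical check of the Remark 4 example: BLUP of a common mean mu from
# three observations with cov(y1,y2) = cov(y2,y3) = .5 and cov(y1,y3) = .25.
rho1, rho2 = 0.5, 0.25
V = np.array([[1.0, rho1, rho2],
              [rho1, 1.0, rho1],
              [rho2, rho1, 1.0]])          # covariance matrix (sigma^2 = 1)

# GLS/BLUP weights for the mean: V^{-1} 1 / (1' V^{-1} 1).
w = np.linalg.solve(V, np.ones(3))
w = w / w.sum()
print(np.round(w, 3))                      # -> [0.4 0.2 0.4]
```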

4. INCORPORATING THE BENCHMARK CONSTRAINTS

4.1 Joint Modeling of Several Concurrent Estimates and Their Weighted Mean

In this section we jointly model the concurrent CD estimates and their weighted mean. In Section 4.2 we show how to incorporate the benchmark constraints and compute the variances of the benchmarked predictors. We follow the BLS modeling practice and assume that the state vectors and the measurement errors are independent between the CDs. Section 6 considers an extension of the joint model that allows for cross-sectional correlations between corresponding components of the state vectors operating in different CDs.

Suppose that the models underlying the CD estimates are as in (9), with the correlated measurement (sampling) errors included in the observation equation. In what follows we add the subscript $d$ to all of the model components to distinguish between the CDs. The (direct) estimates $y_{dt}$ and the measurement errors $e_{dt}$ are now scalars, and $Z_{dt}$ is a row vector (hereinafter denoted as $z'_{dt}$). Let $y_t = (y_{1t}, \dots, y_{Dt}, \sum_{d=1}^{D} w_{dt}y_{dt})'$ define the concurrent estimates for the $D$ CDs and their weighted mean [the right side of the benchmark equation (1)]. The corresponding vector of measurement errors is $e_t = (e_{1t}, \dots, e_{Dt}, \sum_{d=1}^{D} w_{dt}e_{dt})'$. Let $Z^*_t = I_D \otimes z'_{dt}$ (a block-diagonal matrix with $z'_{dt}$ as the $d$th block), $\mathbf{T} = I_D \otimes T$, $Z_t$ the matrix obtained by appending to $Z^*_t$ the row $(w_{1t}z'_{1t}, \dots, w_{Dt}z'_{Dt})$, $\alpha_t = (\alpha'_{1t}, \dots, \alpha'_{Dt})'$, and $\eta_t = (\eta'_{1t}, \dots, \eta'_{Dt})'$. By (9) and the independence of the state vectors and measurement errors between CDs, the joint model holding for $y_t$ is

$$y_t = Z_t\alpha_t + e_t, \quad E(e_t) = 0, \quad E(e_\tau e'_t) = \Sigma_{\tau t} = \begin{bmatrix} \Delta_{\tau t} & h_{\tau t} \\ h'_{t\tau} & \nu_{\tau t} \end{bmatrix}, \qquad (20a)$$

$$\alpha_t = \mathbf{T}\alpha_{t-1} + \eta_t, \quad E(\eta_t) = 0, \quad E(\eta_t\eta'_t) = I_D \otimes Q_d = \mathbf{Q}, \quad E(\eta_t\eta'_{t-k}) = 0,\ k > 0, \qquad (20b)$$

where

$$\Delta_{\tau t} = \mathrm{diag}[\sigma_{1,\tau t}, \dots, \sigma_{D,\tau t}], \qquad \sigma_{d,\tau t} = \mathrm{cov}(e_{d\tau}, e_{dt}),$$

$$\nu_{\tau t} = \sum_{d=1}^{D} w_{d\tau}w_{dt}\sigma_{d,\tau t} = \mathrm{cov}\Biggl[\sum_{d=1}^{D} w_{d\tau}e_{d\tau},\ \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr],$$

$$h_{\tau t} = (h_{1,\tau t}, \dots, h_{D,\tau t})', \qquad h_{d,\tau t} = w_{dt}\sigma_{d,\tau t} = \mathrm{cov}\Biggl[e_{d\tau},\ \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr].$$

Remark 5. The model (20a)–(20b) is the same as the separate models defined by (9a)–(9b) for the different CDs. Adding the model for $\sum_{d=1}^{D} w_{dt}y_{dt}$ to the observation equation provides no new information.

4.2 Recursive Filtering With Correlated Measurements and Benchmark Constraints

To compute the benchmarked predictors, we apply the recursive GLS filter (19) to the joint model defined by (20a)–(20b), setting the variance of the benchmark errors $\sum_{d=1}^{D} w_{dt}e_{dt}$ to 0. The idea behind this procedure is as follows. By (8) and (20), the model-dependent predictor of the signal (true population value) in CD $d$ at time $t$ takes the form $\hat{Y}^{model}_{dt} = z'_{dt}\hat\alpha_{dt}$. Thus the benchmark constraints (1) can be written as

$$\sum_{d=1}^{D} w_{dt}z'_{dt}\hat\alpha_{dt} = \sum_{d=1}^{D} w_{dt}y_{dt}, \qquad t = 1, 2, \dots. \qquad (21)$$

In contrast, under the model (20),

$$\sum_{d=1}^{D} w_{dt}y_{dt} = \sum_{d=1}^{D} w_{dt}z'_{dt}\alpha_{dt} + \sum_{d=1}^{D} w_{dt}e_{dt}, \qquad (22)$$

and so a simple way of satisfying the benchmark constraints is by imposing $\sum_{d=1}^{D} w_{dt}y_{dt} = \sum_{d=1}^{D} w_{dt}z'_{dt}\alpha_{dt}$, or, equivalently, by setting

$$\mathrm{var}\Biggl[\sum_{d=1}^{D} w_{dt}e_{dt}\Biggr] = \mathrm{cov}\Biggl[e_{dt},\ \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr] = 0, \qquad d = 1, \dots, D;\ t = 1, 2, \dots. \qquad (23)$$

It is important to emphasize that using (23) is just a convenient technical way of forcing the constraints, and hence computing the benchmarked predictors. In Appendix D we show how to compute the true variances of the benchmarked predictors, accounting in particular for the errors $\sum_{d=1}^{D} w_{dt}e_{dt}$ in the right side of (22) [and no longer imposing (23)]. The imposition of (23) in the GLS filter (19) is implemented by replacing the covariance matrix $\Sigma_{tt}$ of the measurement equation (9a) by the matrix $\Sigma^*_{tt} = \begin{bmatrix} \Delta_{tt} & 0_{(D)} \\ 0'_{(D)} & 0 \end{bmatrix}$, where $\Delta_{tt}$ is the upper-left $D \times D$ block of $\Sigma_{tt}$ and $0_{(D)}$ is the null vector of order $D$, and setting the elements of the last column of the matrix $C^{bmk}_t = \mathrm{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ to 0.
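The covariance modification can be sketched directly. The numbers below are hypothetical; only the construction of the modified matrix from the blocks of (20a), with the last row and column zeroed, is the point:

```python
import numpy as np

# Sketch of the modification that imposes (23): the joint (D+1)x(D+1)
# measurement covariance of (e_1t, ..., e_Dt, sum_d w_dt e_dt) has its
# last row and column set to 0 before running the GLS filter.
D = 3
w = np.array([0.5, 0.2, 0.3])                 # benchmark weights (hypothetical)
var_e = np.array([0.4, 0.9, 0.6])             # CD sampling-error variances

Sigma = np.zeros((D + 1, D + 1))
Sigma[:D, :D] = np.diag(var_e)                # independent across CDs
Sigma[:D, D] = Sigma[D, :D] = w * var_e       # h_t: cov(e_dt, benchmark error)
Sigma[D, D] = w ** 2 @ var_e                  # nu_t: var of the benchmark error

Sigma_star = Sigma.copy()
Sigma_star[D, :] = Sigma_star[:, D] = 0.0     # force the benchmark variance to 0

print(Sigma_star)
```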

Application of the GLS filter (19) to the model (9), with the benchmark constraints imposed by application of (23), yields the vector of CD benchmarked estimators

$$Z^*_t\hat\alpha^{bmk}_t = Z^*_t\Bigl\{T\hat\alpha^{bmk}_{t-1} + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})\bigl[Z_tP^{bmk}_{t|t-1}Z'_t - Z_tC^{bmk}_{t0} - C^{bmk\prime}_{t0}Z'_t + \Sigma^*_{tt}\bigr]^{-1}(y_t - Z_tT\hat\alpha^{bmk}_{t-1})\Bigr\}, \qquad (24)$$

where $Z^*_t = I_D \otimes z'_{dt}$, $P^{bmk}_{t|t-1} = TP^{bmk}_{t-1}T' + Q$, $P^{bmk}_{t-1} = E[(\alpha_{t-1} - \hat\alpha^{bmk}_{t-1})(\alpha_{t-1} - \hat\alpha^{bmk}_{t-1})']$, and, defining $e_{t0} = (e_{1t}, \dots, e_{Dt}, 0)'$, $C^{bmk}_{t0} = \mathrm{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_{t0}]$.

Remark 6. The matrix $P^{bmk}_{t|t-1} = E[(T\hat\alpha^{bmk}_{t-1} - \alpha_t)(T\hat\alpha^{bmk}_{t-1} - \alpha_t)']$ is the true prediction error covariance matrix. See Appendix D for the computation of $P^{bmk}_t = E[(\alpha_t - \hat\alpha^{bmk}_t)(\alpha_t - \hat\alpha^{bmk}_t)']$ and $C^{bmk}_t = \mathrm{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ under the model (9) without the constraints. The matrix $P^{bmk}_t$ accounts for the variability of the state vector components and the variances and covariances of the sampling errors (defining the matrices $\Sigma_{tt}$ and $C^{bmk}_t$). Note again that the benchmarking filter developed in this article, and particularly the covariance matrix $P^{bmk}_t$, is different from the state–space benchmarking filters developed in other works, where the series observations are benchmarked to external (independent) data sources. For example, Doran (1992) considered benchmark constraints (possibly a set) of the form $R_t\alpha_t = r_t$, where the $r_t$'s are constants. In our case the benchmarks $\sum_{d=1}^{D} w_{dt}y_{dt}$ are random and depend heavily on the sum $\sum_{d=1}^{D} w_{dt}z'_{dt}\alpha_{dt}$. Another difference between the present filter and the other filters developed in the context of state–space modeling is the accounting for correlated measurement errors in the present filter.

4.3 Simulation Results

To test whether the proposed benchmarking algorithm with correlated measurement errors, as developed in Section 4.2, performs properly, we designed a small simulation study. The study consists of simulating 10,000 time series of length 45 for each of 3 models and computing the simulation variances of the benchmark prediction errors $(\hat\alpha^{bmk}_t - \alpha_t)$, which we compare with the true variances under the model, $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$, as defined by (D.3) in Appendix D. We also compare the simulation covariances $\mathrm{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ with the true covariances $C^{bmk}_t$ under the model, as defined by (D.4). The models used for generating the three sets of time series have the general form

$$y_{dt} = \alpha_{dt} + e_{dt},$$
$$\alpha_{dt} = \alpha_{d,t-1} + \eta_{dt}, \qquad (25)$$
$$e_{dt} = \varepsilon_{dt} + .55\varepsilon_{d,t-1} + .30\varepsilon_{d,t-2} + .10\varepsilon_{d,t-3}, \qquad d = 1, 2, 3,$$

where $\eta_{dt}$ and $\varepsilon_{dt}$ are independent white noise series. The random-walk variances for the three models are $\mathrm{var}(\eta_{dt}) = (.01, .88, 12)$. The corresponding measurement error variances are $\mathrm{var}(e_{dt}) = (.30, .08, 121)$, with autocorrelations $\mathrm{corr}(e_t, e_{t-1}) = .53$, $\mathrm{corr}(e_t, e_{t-2}) = .25$, and $\mathrm{corr}(e_t, e_{t-3}) = .07$ (the same autocorrelations for the three models). The benchmark constraints are

$$\alpha_{1t} + \alpha_{2t} + \alpha_{3t} = y_{1t} + y_{2t} + y_{3t}, \qquad t = 1, \dots, 45. \qquad (26)$$
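The stated lag correlations of the measurement errors in (25) follow from the MA(3) coefficients and can be checked directly:

```python
import numpy as np

# Lag-1..3 autocorrelations of the MA(3) measurement errors in (25):
# e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}.
theta = np.array([1.0, 0.55, 0.30, 0.10])
gamma0 = theta @ theta                        # variance of the MA(3) process
acf = [theta[:-k] @ theta[k:] / gamma0 for k in (1, 2, 3)]
print(np.round(acf, 2))                       # -> [0.53 0.25 0.07]
```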

Table 1 shows, for each of the three series, the values of the true variance $p_{d,45} = E(\hat\alpha^{bmk}_{d,45} - \alpha_{d,45})^2$ and covariance $c_{d,45} = \mathrm{cov}[\hat\alpha^{bmk}_{d,44} - \alpha_{d,45}, e_{d,45}]$ for the last time point ($t = 45$), along with the corresponding simulation variance $p^{Sim}_{d,45} = \sum_{k=1}^{10000}(\hat\alpha^{bmk(k)}_{d,45} - \alpha^{(k)}_{d,45})^2/10^4$ and covariance $c^{Sim}_{d,45} = \sum_{k=1}^{10000}(\hat\alpha^{bmk(k)}_{d,44} - \alpha^{(k)}_{d,45})e^{(k)}_{d,45}/10^4$, where the superscript $k$ indicates the $k$th simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.

4.4 Benchmarking of State Estimates

We have considered so far (and consider again in Sec. 5) the benchmarking of the CD estimates, but as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices $\Sigma_{\tau t} = E(e_\tau e'_t)$ [see (20a)], and hence the computation of the covariances $C^{bmk}_t = \mathrm{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ [see (D.4)] and the variance matrices $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$ [see (D.3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45); 10,000 Simulations

                Variances                      Covariances
Series   Theoretical   Simulation       Theoretical   Simulation
         p_{d,45}      p^{Sim}_{d,45}   c_{d,45}      c^{Sim}_{d,45}
1        .274          .276             .039          .041
2        1.122         1.119            .615          .614
3        33.7          34.4             0.63          0.68

The reason why the matrices $\Sigma_{\tau t}$ have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints

$$\sum_{s \in d} w_{st}z'_{st}\hat\alpha^{bmk}_{st} = z'_{dt}\hat\alpha^{bmk}_{dt} = z'_{dt}\alpha_{dt} + z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt}), \qquad (27)$$

so that the benchmarking error is now $e^{bmk}_{dt} = z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt})$, which has a different structure than the benchmarking error $\sum_{d=1}^{D} w_{dt}e_{dt}$ underlying (1) [see (22)]. The vectors $e_t$ now have the form $e^d_t = (e^d_{1t}, \dots, e^d_{St}, e^{bmk}_{dt})'$, where $e^d_{st}$ is the sampling error for state $s$ belonging to CD $d$. The computation of the matrices $\Sigma_{\tau t} = E(e^d_\tau e^{d\prime}_t)$ is carried out by expressing the errors $l^{bmk}_{dt} = \hat\alpha^{bmk}_{dt} - \alpha_{dt}$ when benchmarking the CDs as linear combinations of the concurrent and past CD sampling errors $e_j = (e_{1j}, \dots, e_{Dj}, \sum_{d=1}^{D} w_{dj}e_{dj})'$ and the concurrent and past model residuals $\eta_j = \alpha_j - T\alpha_{j-1}$ [see (20b)], $j = 1, 2, \dots, t$. Note that the CD sampling error is a weighted average of the state sampling errors:

$$e_{dj} = y_{dj} - Y_{dj} = \sum_{s \in d} w_{sj}(y_{sj} - Y_{sj}) = \sum_{s \in d} w_{sj}e^d_{sj}. \qquad (28)$$

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows we apply the benchmarking methodology developed in Section 4.2 to the estimation of total unemployment in the nine CDs for the period January 1998–December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test of the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment $Y_t$ and the trend level $L_t$ [see (6)]. As can be seen, the largest loss in efficiency, as measured by the means, is 3%.

Next we combined the individual division models into thejoint model (20) The benchmark constraints are as defined

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998–2003

Division              Prediction of unemployment   Prediction of trend
New England           1.03 (.002)                  1.02 (.002)
Middle Atlantic       1.02 (.002)                  1.02 (.002)
East North Central    1.00 (.001)                  1.00 (.001)
West North Central    1.02 (.002)                  1.02 (.002)
South Atlantic        1.02 (.001)                  1.02 (.001)
East South Central    1.00 (.001)                  1.00 (.001)
West South Central    1.02 (.001)                  1.01 (.001)
Mountain              1.03 (.002)                  1.03 (.002)
Pacific               1.02 (.001)                  1.02 (.001)

1394 Journal of the American Statistical Association December 2006

Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000).

in (1), with $w_{dt} = 1$, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions, without benchmarking, with the corresponding CPS national unemployment estimate. In the first part of the observation period the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.
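The screening behind Figure 2 amounts to flagging the months in which the absolute difference exceeds twice the CPS standard error. A minimal numpy sketch, using hypothetical numbers (not BLS data) and a helper name of our own choosing:

```python
import numpy as np

def flag_large_differences(model_sum, cps, cps_se):
    """Return a boolean mask marking months where the absolute difference
    between the summed model predictors and the CPS national estimate
    exceeds twice the CPS standard error."""
    model_sum, cps, cps_se = map(np.asarray, (model_sum, cps, cps_se))
    return np.abs(model_sum - cps) > 2.0 * cps_se

# Hypothetical monthly values (in 100,000s), purely for illustration
model_sum = np.array([58.1, 57.9, 55.0, 54.2])
cps       = np.array([58.0, 58.3, 57.6, 57.1])
cps_se    = np.array([0.6, 0.6, 0.6, 0.6])
print(flag_large_differences(model_sum, cps, cps_se))
# [False False  True  True]
```

Runs of flagged months with a common sign, as in March 2001–April 2002, are the signature of systematic model bias rather than sampling noise.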

Figures 3–5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graphs to the period January 2000–January 2003 to better illuminate how

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS (numbers in 100,000), with +/-2 x SE(CPS) bounds.

Figure 3. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, South Atlantic Division (numbers in 100,000).

the benchmarking corrects in real time the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Another important conclusion that can be reached from Figures 3–5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998–2003 excluding 2001, and also for 2001. As can be seen, in 2001 some of the mean ratios are about 1.04, but in the other years the means never exceed 1.01, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, East South Central Division (numbers in 100,000).


Figure 5. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, Pacific Division (numbers in 100,000).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6–8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3–5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions from using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6–8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division              1998–2000, 2002–2003   2001
New England           1.01 (.016)            1.03 (.017)
Middle Atlantic       1.00 (.014)            1.03 (.019)
East North Central    1.00 (.013)            1.03 (.013)
West North Central    1.01 (.014)            1.03 (.011)
South Atlantic        1.00 (.016)            1.04 (.016)
East South Central    1.01 (.016)            1.03 (.014)
West South Central    1.00 (.014)            1.04 (.022)
Mountain              1.00 (.011)            1.02 (.011)
Pacific               1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, South Atlantic Division (numbers in 10,000).

small areas is a common requirement for statistical agencies producing official statistics. (See, e.g., Fay and Herriot 1979; Battese et al. 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement.) The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time $t$ out of all of the predictors that are linear combinations of the state predictor from time $t-1$ and the new observation at time $t$. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, East South Central Division (numbers in 10,000).


Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat\alpha_t$ [EQ. (18)]

The model holding for $\mathbf{Y}_t = (\hat\alpha_{t|t-1}', y_t')'$ is

\[
\mathbf{Y}_t = \begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}
= \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix},
\qquad u_{t|t-1} = \hat\alpha_{t|t-1} - \alpha_t,
\]

where $\hat\alpha_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u_{t|t-1}', e_t')'$ and $X_t = [I, Z_t']'$. In what follows, all of the expectations are over the joint distribution of the observations $\mathbf{Y}_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_t u_t') = V_t$ [see (16)]. A predictor $\alpha^*_t$ is unbiased for $\alpha_t$ if $E(\alpha^*_t - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division              Means (STDs)
New England           .85 (.013)
Middle Atlantic       .94 (.005)
East North Central    .96 (.004)
West North Central    .89 (.008)
South Atlantic        .96 (.004)
East South Central    .86 (.007)
West South Central    .92 (.006)
Mountain              .88 (.009)
Pacific               .96 (.004)

Theorem. The predictor $\hat\alpha_t = (X_t'V_t^{-1}X_t)^{-1}X_t'V_t^{-1}\mathbf{Y}_t = P_tX_t'V_t^{-1}\mathbf{Y}_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\hat\alpha_{t|t-1}$ and $y_t$.

Proof. $\hat\alpha_t - \alpha_t = P_tX_t'V_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_tX_t'V_t^{-1}u_t$, so that $E(\hat\alpha_t - \alpha_t) = 0$ and $\mathrm{var}(\hat\alpha_t - \alpha_t) = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = P_t$. Clearly, $\hat\alpha_t$ is linear in $\mathbf{Y}_t$.

Let $\hat\alpha^L_t = L_1\hat\alpha_{t|t-1} + L_2y_t + l = L\mathbf{Y}_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_tX_t'V_t^{-1}$, such that $L = D_t + P_tX_t'V_t^{-1}$. Because $\hat\alpha^L_t$ is unbiased for $\alpha_t$,

\[
\begin{aligned}
E(\hat\alpha^L_t - \alpha_t) &= E[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t]\\
&= E[D_tX_t\alpha_t + D_tu_t + P_tX_t'V_t^{-1}u_t + l]\\
&= E[D_tX_t\alpha_t + l] = 0.
\end{aligned}
\]

Thus, for $\hat\alpha^L_t$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_tX_t = 0$ and $l = 0$, which implies that $\mathrm{var}(\hat\alpha^L_t - \alpha_t) = \mathrm{var}[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = \mathrm{var}[D_tu_t + P_tX_t'V_t^{-1}u_t]$.

Hence,

\[
\begin{aligned}
\mathrm{var}(\hat\alpha^L_t - \alpha_t) &= E[(D_tu_t + P_tX_t'V_t^{-1}u_t)(u_t'D_t' + u_t'V_t^{-1}X_tP_t)]\\
&= D_tV_tD_t' + P_tX_t'V_t^{-1}V_tD_t' + D_tV_tV_t^{-1}X_tP_t + P_t\\
&= P_t + D_tV_tD_t'\\
&= \mathrm{var}(\hat\alpha_t - \alpha_t) + D_tV_tD_t', \qquad \text{because } D_tX_t = 0.
\end{aligned}
\]
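The argument can be illustrated numerically: any unbiased linear competitor corresponds to $L = P_tX_t'V_t^{-1} + D_t$ with $D_tX_t = 0$, and its error variance exceeds $P_t$ by the positive semidefinite term $D_tV_tD_t'$. A small numpy sketch with random (illustrative) matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
q, m = 2, 2                         # dim(alpha_t), dim(y_t)
Z = rng.normal(size=(m, q))
X = np.vstack([np.eye(q), Z])       # X_t = [I, Z_t']'
A = rng.normal(size=(q + m, q + m))
V = A @ A.T + (q + m) * np.eye(q + m)   # V_t: positive definite

Vinv = np.linalg.inv(V)
P = np.linalg.inv(X.T @ Vinv @ X)       # error variance of the GLS/BLUP

# Build D with D X = 0 from the left null space of X
U, s, Vt = np.linalg.svd(X)
null_basis = U[:, q:]                   # columns n with n'X = 0
D = rng.normal(size=(q, m)) @ null_basis.T

var_L = P + D @ V @ D.T                 # competitor's error variance
# D V D' is positive semidefinite, so var_L dominates P:
print(np.linalg.eigvalsh(var_L - P).min() >= -1e-10)
```

Every eigenvalue of the excess variance is (numerically) nonnegative, matching the inequality proved above.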

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\mathrm{var}(e_t) = \Sigma_t$ and $\mathrm{var}(\eta_t) = Q_t$. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

The GLS setup at time $t$ is

\[
\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
\]

[see (15)], where now

\[
V_t = \mathrm{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
= \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}
\]

[the same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is

\[
\begin{aligned}
\hat\alpha_t &= \left[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z_t')V_t^{-1}\begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}\\
&= [P_{t|t-1}^{-1} + Z_t'\Sigma_t^{-1}Z_t]^{-1}[P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z_t'\Sigma_t^{-1}y_t]\\
&= [P_{t|t-1} - P_{t|t-1}Z_t'F_t^{-1}Z_tP_{t|t-1}][P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z_t'\Sigma_t^{-1}y_t]\\
&= [I - P_{t|t-1}Z_t'F_t^{-1}Z_t]\hat\alpha_{t|t-1} + P_{t|t-1}Z_t'\Sigma_t^{-1}y_t - P_{t|t-1}Z_t'\Sigma_t^{-1}y_t + P_{t|t-1}Z_t'F_t^{-1}y_t\\
&= \hat\alpha_{t|t-1} + P_{t|t-1}Z_t'F_t^{-1}(y_t - Z_t\hat\alpha_{t|t-1}),
\qquad F_t = Z_tP_{t|t-1}Z_t' + \Sigma_t.
\end{aligned}
\]

The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
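The equality can also be checked numerically on random inputs. The sketch below (plain numpy, illustrative only) computes the GLS predictor with block-diagonal $V_t$ and the usual Kalman update, and compares them:

```python
import numpy as np

rng = np.random.default_rng(1)
q, m = 3, 2
Z = rng.normal(size=(m, q))
A = rng.normal(size=(q, q)); P_pred = A @ A.T + q * np.eye(q)   # P_{t|t-1}
B = rng.normal(size=(m, m)); Sig = B @ B.T + m * np.eye(m)      # Sigma_t
a_pred = rng.normal(size=q)                                     # T alpha_{t-1}
y = rng.normal(size=m)

# GLS predictor with V_t = blockdiag(P_{t|t-1}, Sigma_t), i.e., C_t = 0
X = np.vstack([np.eye(q), Z])
Vinv = np.linalg.inv(np.block([[P_pred, np.zeros((q, m))],
                               [np.zeros((m, q)), Sig]]))
gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ np.concatenate([a_pred, y]))

# Kalman update: a + P Z' F^{-1} (y - Z a), with F = Z P Z' + Sigma
F = Z @ P_pred @ Z.T + Sig
kalman = a_pred + P_pred @ Z.T @ np.linalg.solve(F, y - Z @ a_pred)

print(np.allclose(gls, kalman))   # True
```

The two predictors agree to machine precision, as the algebra above guarantees.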


APPENDIX C: COMPUTATION OF $\hat\alpha_t$ [EQ. (19)]

Consider the GLS predictor

\[
\hat\alpha_t = \left[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z_t')V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
\]

[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C_t' & \Sigma_{tt} \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions

\[
(I, Z_t')V_t^{-1} = \big[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC_t'P_{t|t-1}^{-1} - Z_t'H_tC_t'P_{t|t-1}^{-1}),\ (-P_{t|t-1}^{-1}C_tH_t + Z_t'H_t)\big] \tag{C.1}
\]

and

\[
P_t = \left[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
= \big[P_{t|t-1}^{-1} + (Z_t' - P_{t|t-1}^{-1}C_t)H_t(Z_t - C_t'P_{t|t-1}^{-1})\big]^{-1}, \tag{C.2}
\]

where $H_t = [\Sigma_{tt} - C_t'P_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C.1) and (C.2) that

\[
\hat\alpha_t = P_t(I, Z_t')V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= T\hat\alpha_{t-1} + P_t(Z_t' - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat\alpha_{t-1}). \tag{C.3}
\]

Computing the matrix in the second row of (C.2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C.3) yields, after some algebra, equation (19).
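As a numerical sanity check (illustrative random inputs, not a derivation), the innovation form (19) can be compared with the direct GLS form (18) for a case with $C_t \neq 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
q, m = 3, 2
Z = rng.normal(size=(m, q))
# A valid joint covariance V_t of (u_{t|t-1}, e_t), with nonzero C_t
A = rng.normal(size=(q + m, q + m))
V = A @ A.T + (q + m) * np.eye(q + m)
P_pred, C, Sig = V[:q, :q], V[:q, q:], V[q:, q:]

a_pred = rng.normal(size=q)       # T alpha_{t-1}
y = rng.normal(size=m)

# Direct GLS form (18)
X = np.vstack([np.eye(q), Z])
Vinv = np.linalg.inv(V)
direct = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ np.concatenate([a_pred, y]))

# Innovation form (19): gain (P Z' - C) R^{-1}, R = Z P Z' - Z C - C'Z' + Sigma
R = Z @ P_pred @ Z.T - Z @ C - C.T @ Z.T + Sig
gain = (P_pred @ Z.T - C) @ np.linalg.inv(R)
innov = a_pred + gain @ (y - Z @ a_pred)

print(np.allclose(direct, innov))   # True
```

The agreement reflects the algebraic identity established in this appendix.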

APPENDIX D: COMPUTATION OF $P^{\mathrm{bmk}}_t = \mathrm{var}(\hat\alpha^{\mathrm{bmk}}_t - \alpha_t)$ AND $C^{\mathrm{bmk}}_t = \mathrm{cov}[T\hat\alpha^{\mathrm{bmk}}_{t-1} - \alpha_t, e_t]$

The benchmarked predictor $\hat\alpha^{\mathrm{bmk}}_t$ defined by (24) can be written as

\[
\hat\alpha^{\mathrm{bmk}}_t = [I - (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}Z_t]T\hat\alpha^{\mathrm{bmk}}_{t-1}
+ (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}y_t, \tag{D.1}
\]

where $R_t = Z_tP^{\mathrm{bmk}}_{t|t-1}Z_t' - Z_tC^{\mathrm{bmk}}_{t0} - C^{\mathrm{bmk}\prime}_{t0}Z_t' + \Sigma^*_{tt}$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = [I - (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}Z_t]\alpha_t + (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}Z_t\alpha_t$ yields

\[
\hat\alpha^{\mathrm{bmk}}_t - \alpha_t = [I - (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}Z_t](T\hat\alpha^{\mathrm{bmk}}_{t-1} - \alpha_t)
+ (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}e_t. \tag{D.2}
\]

Denote $G_t = I - (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}Z_t$ and $K_t = (P^{\mathrm{bmk}}_{t|t-1}Z_t' - C^{\mathrm{bmk}}_{t0})R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat\alpha^{\mathrm{bmk}}_t - \alpha_t)$ under the model (20) is

\[
P^{\mathrm{bmk}}_t = E[(\hat\alpha^{\mathrm{bmk}}_t - \alpha_t)(\hat\alpha^{\mathrm{bmk}}_t - \alpha_t)']
= G_tP^{\mathrm{bmk}}_{t|t-1}G_t' + K_t\Sigma_{tt}K_t' + G_tC^{\mathrm{bmk}}_tK_t' + K_tC^{\mathrm{bmk}\prime}_tG_t', \tag{D.3}
\]

where $P^{\mathrm{bmk}}_{t|t-1} = E[(T\hat\alpha^{\mathrm{bmk}}_{t-1} - \alpha_t)(T\hat\alpha^{\mathrm{bmk}}_{t-1} - \alpha_t)'] = TP^{\mathrm{bmk}}_{t-1}T' + Q$, $\Sigma_{tt} = E(e_te_t')$ [eq. (20a)], and $C^{\mathrm{bmk}}_t = \mathrm{cov}[T\hat\alpha^{\mathrm{bmk}}_{t-1} - \alpha_t, e_t]$. The matrix $C^{\mathrm{bmk}}_t$ is computed similarly to (17), using the chain

\[
C^{\mathrm{bmk}}_t = A^{\mathrm{bmk}}_{t-1}A^{\mathrm{bmk}}_{t-2}\cdots A^{\mathrm{bmk}}_2\tilde A^{\mathrm{bmk}}_1\Sigma_{1t}
+ A^{\mathrm{bmk}}_{t-1}A^{\mathrm{bmk}}_{t-2}\cdots A^{\mathrm{bmk}}_3\tilde A^{\mathrm{bmk}}_2\Sigma_{2t}
+ \cdots + A^{\mathrm{bmk}}_{t-1}\tilde A^{\mathrm{bmk}}_{t-2}\Sigma_{t-2,t} + \tilde A^{\mathrm{bmk}}_{t-1}\Sigma_{t-1,t}, \tag{D.4}
\]

where $A^{\mathrm{bmk}}_j = TG_j$ and $\tilde A^{\mathrm{bmk}}_j = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

(2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.



The use of (4) satisfies $\sum_{d=1}^D w_{dt}\hat Y^{\mathrm{pro}}_{dt} = \sum_{d=1}^D w_{dt}\hat Y^{\mathrm{cps}}_{dt}$ and does not require changing the current BLS modeling practice. This benchmarking method is often used in small-area estimation applications that use cross-sectional models (see, e.g., Fay and Herriot 1979; Battese, Harter, and Fuller 1988; Rao 2003). However, the properties of using (4) are unclear, and it does not lend itself to simple parametric estimation of the variances of the pro-rated estimators. Variance estimation is an essential requirement for any benchmarking procedure. For the case of a single state–space model (with no benchmarking), Pfeffermann and Tiller (2005) developed bootstrap variance estimators that account for estimation of the model parameters, with bias of correct order, but extension of this resampling method to the pro-rated predictors defined by (4) is not straightforward and requires a double-bootstrap procedure that is not practical in the present context because of the very intensive computations involved. Another simple benchmarking procedure is a difference adjustment of the form

\[
\hat Y^{\mathrm{dif}}_{dt} = \hat Y^{\mathrm{model}}_{dt} + \left(\sum_{d=1}^D w_{dt}\hat Y^{\mathrm{cps}}_{dt} - \sum_{d=1}^D w_{dt}\hat Y^{\mathrm{model}}_{dt}\right). \tag{5}
\]

But this procedure inflates the variances of the benchmarked estimators and has the further drawback of adding the same amount to each of the model-dependent estimators, irrespective of their magnitude.
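The contrast between the two simple adjustments is easy to see numerically. The sketch below is illustrative only: equation (4) is not reproduced in this excerpt, so the ratio (pro-rata) form used here is an assumption, the figures are hypothetical, and the weights sum to 1 so that both adjustments reproduce the weighted CPS aggregate:

```python
import numpy as np

def prorate(model, cps, w):
    """Ratio (pro-rata) adjustment: scale every model estimate by a common
    factor so that the weighted sums agree (assumed form of (4))."""
    return model * (np.dot(w, cps) / np.dot(w, model))

def difference_adjust(model, cps, w):
    """Difference adjustment (5): add the same aggregate discrepancy to
    every model estimate, irrespective of its magnitude."""
    return model + (np.dot(w, cps) - np.dot(w, model))

w = np.array([0.5, 0.3, 0.2])              # hypothetical weights, sum to 1
model = np.array([100.0, 60.0, 40.0])      # hypothetical model predictors
cps = np.array([110.0, 58.0, 42.0])        # hypothetical direct estimates

pro = prorate(model, cps, w)
dif = difference_adjust(model, cps, w)
print(np.dot(w, pro), np.dot(w, dif), np.dot(w, cps))  # all 80.8 (up to rounding)
```

The difference adjustment adds the same absolute amount (here 4.8) to the smallest and the largest area alike, which is precisely the drawback noted in the text.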

The benchmarking procedure developed in this article overcomes the problems mentioned with respect to procedures (4) and (5) and is not more complicated once an appropriate computer program is written. Note in this respect that any other "built-in" benchmarking procedure would require the development of a new filtering algorithm, even if the sampling errors were left in the state vector. This is so because the benchmark errors that need to be accounted for when computing the variance of the benchmarked estimator are correlated, concurrently and over time, with the sampling errors. As mentioned earlier, available algorithms for benchmarking state–space models are not suitable for handling benchmarks of the form (1) or (2).

In what follows we focus on benchmarking the CD estimates (the first step in the benchmarking process), but we comment on the adjustments required when benchmarking the state estimates in the second step. Section 2 presents the BLS models in current use. Section 3 develops the recursive filtering algorithm for state–space models with correlated measurement errors and discusses its properties. Section 4 shows how to incorporate the benchmark constraints and compute the variances of the benchmarked estimators. Section 5 describes the application of the proposed procedure using the unemployment series, with special attention to the year 2001, when the World Trade Center was attacked. Section 6 concludes by discussing some remaining problems in the application of this procedure.

2. BUREAU OF LABOR STATISTICS MODEL IN CURRENT USE

In this section we consider a single CD, and hence we drop the subscript $d$ from the notation. The model used by the BLS combines a model for the true CD values (total unemployment or employment) with a model for the sampling errors.

The model is discussed in detail, including parameter estimation and model diagnostics, in Tiller (1992). Next we provide a brief description of the model to aid understanding of the developments and applications in subsequent sections.

2.1 Model Assumed for Population Values

Let $y_t$ denote the direct CPS estimate at time $t$, and let $Y_t$ denote the corresponding population value, so that $e_t = y_t - Y_t$ is the sampling error. The model is

\[
\begin{aligned}
Y_t &= L_t + S_t + I_t, & I_t &\sim N(0, \sigma^2_I),\\
L_t &= L_{t-1} + R_{t-1} + \eta_{Lt}, & \eta_{Lt} &\sim N(0, \sigma^2_L),\\
R_t &= R_{t-1} + \eta_{Rt}, & \eta_{Rt} &\sim N(0, \sigma^2_R),\\
S_t &= \sum_{j=1}^{6} S_{jt}, & & \tag{6}\\
S_{jt} &= \cos\omega_j\, S_{j,t-1} + \sin\omega_j\, S^*_{j,t-1} + \nu_{jt}, & \nu_{jt} &\sim N(0, \sigma^2_S),\\
S^*_{jt} &= -\sin\omega_j\, S_{j,t-1} + \cos\omega_j\, S^*_{j,t-1} + \nu^*_{jt}, & \nu^*_{jt} &\sim N(0, \sigma^2_S),\\
\omega_j &= 2\pi j/12, \qquad j = 1, \ldots, 6.
\end{aligned}
\]

The model defined by (6) is known in the time series literature as the basic structural model (BSM), with $L_t$, $R_t$, $S_t$, and $I_t$ defining the trend level, slope, seasonal effect, and "irregular" term operating at time $t$. The error terms $I_t$, $\eta_{Lt}$, $\eta_{Rt}$, $\nu_{jt}$, and $\nu^*_{jt}$ are independent white noise series. The model for the trend approximates a local linear trend, whereas the model for the seasonal effect uses the classical decomposition of the seasonal component into six subcomponents $S_{jt}$ that represent the contributions of the cyclical functions corresponding to the six frequencies (harmonics) of a monthly seasonal series. The added noise permits the seasonal effects to evolve stochastically over time, but in such a way that the expectation of the sum of 12 successive seasonal effects is 0. (See Harvey 1989 for discussion of the BSM.)

Remark 3. The future state models (but not the CD models) will be bivariate, with the other dependent variable in each state being "the number of persons in the state receiving unemployment insurance benefits" when modeling the unemployment estimates and "the number of payroll jobs in business establishments" when modeling the employment estimates. These two series are census figures with no sampling errors. The benchmarking will be applied only to the unemployment and employment estimates.

2.2 Model Assumed for the Sampling Errors

The CPS variance of the sampling error varies with the level of the series. Denoting $v^2_t = \mathrm{var}(e_t)$, the model assumed for the standardized residuals $e^*_t = e_t/v_t$ is AR(15), which is used as an approximation to the sum of an MA(15) process and an AR(2) process. The MA(15) process accounts for the autocorrelations implied by the sample overlap underlying the CPS sampling design. By this design, households selected for the sample are surveyed for 4 successive months, are left out of the sample for the next 8 months, and then are surveyed again for 4 more months. The AR(2) model accounts for the autocorrelations


arising from the fact that households dropped from the survey are replaced by households from the same "census tract." The reduced ARMA representation of the sum of the two processes is ARMA(2,17), which is approximated by an AR(15) model.

The separate models holding for the population values and the sampling errors are cast into a single state–space model for the observations $y_t$ (the CPS estimates), of the form

\[
y_t = Z_t\alpha^*_t, \qquad \alpha^*_t = T\alpha^*_{t-1} + \eta^*_t, \qquad E(\eta^*_t) = 0, \quad E(\eta^*_t\eta^{*\prime}_t) = Q^*, \tag{7}
\]

with the first equation defining the measurement (observation) equation and the second equation defining the state (transition) equation. The state vector $\alpha^*_t$ consists of the trend level $L_t$, the slope $R_t$, the 11 seasonal coefficients $S_{jt}$, $S^*_{kt}$, $j = 1, \ldots, 6$, $k = 1, \ldots, 5$, the irregular term $I_t$, and the concurrent and 14 lags of the sampling errors, for a total of 29 elements. The sampling errors and the irregular term are included in the state vector so that there is no residual error term in the observation equation. Note in this respect that including the sampling errors in the measurement equation, as described in Section 1 and pursued later, instead of the current practice of including them in the state equation, does not change the model holding for the observations, as long as the model fitted for the sampling errors reproduces their autocovariances.

The parameters indexing the model are estimated separately for each CD in two steps. In the first step, the AR(15) coefficients and residual variance are estimated by solving the corresponding Yule–Walker equations. (The sampling error variances and autocorrelations are estimated externally by the BLS.) In the second step, the remaining model variances are estimated by maximizing the model likelihood, with the AR(15) model parameters held fixed at their estimated values (see Tiller 1992 for details).
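Tiller (1992) gives the details of the first estimation step; the following is only a generic sketch of a Yule–Walker solver, not the BLS code (which works with an AR(15) and externally supplied autocovariances):

```python
import numpy as np

def yule_walker(gamma, p):
    """Solve the Yule-Walker equations for AR(p) coefficients from the
    autocovariances gamma[0..p]: Gamma phi = (gamma_1, ..., gamma_p)'."""
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    G = np.asarray(gamma)[idx]                 # p x p Toeplitz autocovariance matrix
    phi = np.linalg.solve(G, gamma[1:p + 1])
    sigma2 = gamma[0] - phi @ gamma[1:p + 1]   # innovation variance
    return phi, sigma2

# Sanity check on an AR(1) with coefficient .6 and unit innovation variance:
# gamma_k = .6**k * gamma_0, with gamma_0 = 1/(1 - .36)
gamma = np.array([0.6 ** k / (1 - 0.36) for k in range(4)])
phi, s2 = yule_walker(gamma, 1)
print(np.allclose(phi, [0.6]), np.allclose(s2, 1.0))   # True True
```

The same call with a longer autocovariance sequence and p = 15 would mimic the first step described above.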

The monthly employment and unemployment estimates published by the BLS are obtained under the model (7) and the relationship $Y_t = y_t - e_t$ as

\[
\hat Y_t = y_t - \hat e_t = \hat L_t + \hat S_t + \hat I_t, \tag{8}
\]

where $\hat L_t$, $\hat S_t$, and $\hat I_t$ denote the estimated components at time $t$, as obtained by application of the Kalman filter (Harvey 1989).

3. FILTERING OF STATE–SPACE MODELS WITH AUTOCORRELATED MEASUREMENT ERRORS

In this section we develop a recursive filtering algorithm for state–space models with autocorrelated measurement errors. By this we mean an algorithm that updates the most recent predictor of the state vector every time a new observation becomes available. This filter is required for implementing the benchmarking procedure discussed in Section 1 (see Sec. 4), but, as already mentioned, it is general and can be used for other applications of state–space models with autocorrelated measurement errors. In Section 3.2 we discuss the properties of the proposed filter.

3.1 Recursive Filtering Algorithm

Consider the following linear state–space model for a (possibly vector) time series $y_t$:

Measurement equation:
\[
y_t = Z_t\alpha_t + e_t, \qquad E(e_t) = 0, \quad E(e_te_t') = \Sigma_{tt}, \quad E(e_\tau e_t') = \Sigma_{\tau t}; \tag{9a}
\]

Transition equation:
\[
\alpha_t = T\alpha_{t-1} + \eta_t, \qquad E(\eta_t) = 0, \quad E(\eta_t\eta_t') = Q, \quad E(\eta_t\eta_{t-k}') = 0, \ k > 0. \tag{9b}
\]

It is also assumed that $E(\eta_te_\tau') = 0$ for all $t$ and $\tau$. (The restriction to time-invariant matrices $T$ and $Q$ is for convenience; extension of the filter to models that contain fixed effects in the observation equation is straightforward.) Clearly, what distinguishes this model from the standard linear state–space model is that the measurement errors $e_t$ (the sampling errors in the BLS model) are correlated over time. Note in particular that, unlike the BLS model representation in (7), where the sampling errors and the irregular term are part of the state vector so that there are no measurement errors in the observation equation, the measurement (sampling) errors now are featured in the observation equation. The recursive filtering algorithm developed herein takes into account the autocovariance matrices $\Sigma_{\tau t}$.

At Time 1. Let $\hat\alpha_1 = (I - K_1Z_1)T\hat\alpha_0 + K_1y_1$ be the filtered state estimator at time 1, where $\hat\alpha_0$ is an initial estimator with covariance matrix $P_0 = E[(\hat\alpha_0 - \alpha_0)(\hat\alpha_0 - \alpha_0)']$ and $K_1 = P_{1|0}Z_1'F_1^{-1}$ is the "Kalman gain." We assume for convenience that $\hat\alpha_0$ is independent of the observations. The matrix $P_{1|0} = TP_0T' + Q$ is the covariance matrix of the prediction errors $\hat\alpha_{1|0} - \alpha_1 = T\hat\alpha_0 - \alpha_1$, and $F_1 = Z_1P_{1|0}Z_1' + \Sigma_{11}$ is the covariance matrix of the innovations (one-step-ahead prediction errors) $\nu_1 = y_1 - \hat y_{1|0} = y_1 - Z_1\hat\alpha_{1|0}$. Because $y_1 = Z_1\alpha_1 + e_1$,

\[
\hat\alpha_1 = (I - K_1Z_1)T\hat\alpha_0 + K_1Z_1\alpha_1 + K_1e_1. \tag{10}
\]

At Time 2. Let $\hat\alpha_{2|1} = T\hat\alpha_1$ define the predictor of $\alpha_2$ at time 1, with covariance matrix $P_{2|1} = E[(\hat\alpha_{2|1} - \alpha_2)(\hat\alpha_{2|1} - \alpha_2)']$. An unbiased predictor $\hat\alpha_2$ of $\alpha_2$ [i.e., $E(\hat\alpha_2 - \alpha_2) = 0$], based on $\hat\alpha_{2|1}$ and the observation $y_2$, is the generalized least squares (GLS) predictor in the regression model

\[
\begin{pmatrix} T\hat\alpha_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} I \\ Z_2 \end{pmatrix}\alpha_2 + \begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix},
\qquad u_{2|1} = T\hat\alpha_1 - \alpha_2; \tag{11}
\]

that is,

\[
\hat\alpha_2 = \left[(I, Z_2')V_2^{-1}\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1}(I, Z_2')V_2^{-1}\begin{pmatrix} T\hat\alpha_1 \\ y_2 \end{pmatrix}, \tag{12}
\]

where

\[
V_2 = \mathrm{var}\begin{pmatrix} u_{2|1} \\ e_2 \end{pmatrix}
= \begin{bmatrix} P_{2|1} & C_2 \\ C_2' & \Sigma_{22} \end{bmatrix} \tag{13}
\]

and $C_2 = \mathrm{cov}[T\hat\alpha_1 - \alpha_2, e_2] = TK_1\Sigma_{12}$ [which follows from (9a) and (10)]. Note that $V_2$ is the covariance matrix of the errors $u_{2|1}$ and $e_2$, and not of the predictors $T\hat\alpha_1$ and $y_2$. As discussed later and proved in Appendix A, the GLS predictor $\hat\alpha_2$ is the best linear unbiased predictor (BLUP) of $\alpha_2$ based on $T\hat\alpha_1$ and $y_2$, with covariance matrix

\[
E[(\hat\alpha_2 - \alpha_2)(\hat\alpha_2 - \alpha_2)'] = \left[(I, Z_2')V_2^{-1}\begin{pmatrix} I \\ Z_2 \end{pmatrix}\right]^{-1} = P_2. \tag{14}
\]


At Time t. Let $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with covariance matrix $E[(\hat{\alpha}_{t|t-1} - \alpha_t)(\hat{\alpha}_{t|t-1} - \alpha_t)'] = TP_{t-1}T' + Q = P_{t|t-1}$, where $P_{t-1} = E[(\hat{\alpha}_{t-1} - \alpha_{t-1})(\hat{\alpha}_{t-1} - \alpha_{t-1})']$. Set the random coefficients regression model

$$\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}, \qquad u_{t|t-1} = T\hat{\alpha}_{t-1} - \alpha_t, \qquad (15)$$

and define

$$V_t = \mathrm{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & C_t \\ C_t' & \Sigma_{tt} \end{bmatrix}. \qquad (16)$$

The covariance matrix $C_t = \mathrm{cov}[T\hat{\alpha}_{t-1} - \alpha_t, e_t]$ is computed as follows. Let $[I, Z_j']V_j^{-1} = [B_{j1}, B_{j2}]$, where $B_{j1}$ contains the first $q$ columns of $[I, Z_j']V_j^{-1}$ and $B_{j2}$ contains the remaining columns, with $q = \dim(\alpha_j)$. Define $A_j = TP_jB_{j1}$ and $\tilde{A}_j = TP_jB_{j2}$, $j = 2, \ldots, t-1$; $\tilde{A}_1 = TK_1$. Then

$$C_t = \mathrm{cov}[T\hat{\alpha}_{t-1} - \alpha_t, e_t] = A_{t-1}A_{t-2}\cdots A_2\tilde{A}_1\Sigma_{1t} + A_{t-1}A_{t-2}\cdots A_3\tilde{A}_2\Sigma_{2t} + \cdots + A_{t-1}\tilde{A}_{t-2}\Sigma_{t-2,t} + \tilde{A}_{t-1}\Sigma_{t-1,t}. \qquad (17)$$

The GLS predictor of $\alpha_t$ based on $T\hat{\alpha}_{t-1}$ and $y_t$ and the covariance matrix of the prediction errors are obtained from (15) and (16) as

$$\hat{\alpha}_t = \Bigl[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\Bigr]^{-1}(I, Z_t')V_t^{-1}\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix}, \qquad (18)$$

$$P_t = E[(\hat{\alpha}_t - \alpha_t)(\hat{\alpha}_t - \alpha_t)'] = \Bigl[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\Bigr]^{-1}.$$

Alternatively, the predictor $\hat{\alpha}_t$ can be written (App. C) as

$$\hat{\alpha}_t = T\hat{\alpha}_{t-1} + (P_{t|t-1}Z_t' - C_t)\bigl[Z_tP_{t|t-1}Z_t' - Z_tC_t - C_t'Z_t' + \Sigma_{tt}\bigr]^{-1}(y_t - Z_tT\hat{\alpha}_{t-1}). \qquad (19)$$

Written in this way, the predictor of $\alpha_t$ at time $t$ is seen to equal the predictor of $\alpha_t$ at time $t-1$ plus a correction factor that depends on the magnitude of the innovation (one-step-ahead prediction error) when predicting $y_t$ at time $t-1$.
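The correction-factor form (19) maps directly to code. The following is an illustrative sketch of one filter step (our code, not BLS production software); it assumes the cross-covariance $C_t$ of (17) has already been computed, and it recovers $P_t$ from the GLS normal equations (18):

```python
import numpy as np

def gls_filter_step(alpha_prev, P_prev, y_t, T, Q, Z_t, Sigma_tt, C_t):
    """One step of the recursive GLS filter, eq. (19).

    alpha_prev : filtered state at time t-1
    P_prev     : its error covariance P_{t-1}
    C_t        : cov(T alpha_prev - alpha_t, e_t), computed as in eq. (17)
    Returns the filtered state alpha_t and its covariance P_t of eq. (18).
    """
    alpha_pred = T @ alpha_prev                       # alpha_{t|t-1}
    P_pred = T @ P_prev @ T.T + Q                     # P_{t|t-1}
    F = Z_t @ P_pred @ Z_t.T - Z_t @ C_t - C_t.T @ Z_t.T + Sigma_tt
    K = (P_pred @ Z_t.T - C_t) @ np.linalg.inv(F)     # modified gain
    alpha_t = alpha_pred + K @ (y_t - Z_t @ alpha_pred)
    # P_t from the GLS normal equations, eq. (18)
    V = np.block([[P_pred, C_t], [C_t.T, Sigma_tt]])
    X = np.vstack([np.eye(alpha_prev.shape[0]), Z_t])
    P_t = np.linalg.inv(X.T @ np.linalg.inv(V) @ X)
    return alpha_t, P_t
```

When `C_t` is a zero matrix, the step reduces to the standard Kalman filter update, which is the equality proved in Appendix B.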

3.2 Properties of the Filtering Algorithm

Assuming known model parameters, the recursive GLS filter defined by (18) or (19) has the following properties:

1. At every time point $t$, the filter produces the BLUP of $\alpha_t$ based on the predictor $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ from time $t-1$ and the new observation $y_t$. The BLUP property means that $E(\hat{\alpha}_t - \alpha_t) = 0$ and $\mathrm{var}[d'(\hat{\alpha}_t - \alpha_t)] \le \mathrm{var}[d'(\hat{\alpha}_t^L - \alpha_t)]$ for every coefficient vector $d$ and any other linear unbiased predictor of the form $\hat{\alpha}_t^L = L_1\hat{\alpha}_{t|t-1} + L_2y_t + l$, with general fixed matrices $L_1$ and $L_2$ and vector $l$. See Appendix A for a proof of this property.

2. When the measurement errors are independent, the GLS filter coincides with the familiar Kalman filter (Harvey 1989); see Appendix B for the proof. Thus the recursive GLS filter can be viewed as an extension of the Kalman filter to the case of correlated measurement errors.

3. Unlike the Kalman filter, which assumes independent measurement errors and therefore yields the BLUP of $\alpha_t$ based on all the individual observations $y_{(t)} = (y_1', \ldots, y_t')'$ (i.e., the BLUP out of all of the linear unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$ for general fixed matrices $J_{it}$ and vector $j_t$), the filter (18) yields the BLUP of $\alpha_t$ based on $\hat{\alpha}_{t|t-1}$ and the new observation $y_t$.

(The predictor $\hat{\alpha}_{t|t-1}$ is itself a linear combination of all the observations until time $t-1$, but the values of the matrix coefficients are fixed by the previous steps of the filter; see also Remark 4.)

Computing the BLUP of $\alpha_t$ under correlated measurement errors out of all possible unbiased predictors that are linear combinations of all the individual observations in $y_{(t)}$ with arbitrary coefficients [or the minimum mean squared error predictor $E(\alpha_t|y_t, y_{t-1}, \ldots, y_1; \lambda)$ under normality of the model error terms, where $\lambda$ defines the model parameters] requires joint modeling of $y_{(t)}$ for every time $t$. For long series the computations become very heavy and generally are not practical in a production environment that requires routine runs of many series with high-dimensional state vectors in a short time. Empirical evidence so far suggests that the loss in efficiency from using the GLS algorithm instead of the BLUP based on all of the individual observations is mild; Section 5 provides an empirical comparison of the two filters.

Remark 4. For general covariance matrices $\Sigma_{\tau t}$ between the measurement errors, it is impossible to construct a recursive filtering algorithm that is a linear combination of the predictor from the previous time point and the new observation and is BLUP out of all unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$. To see this, consider a very simple example of three observations, $y_1$, $y_2$, and $y_3$, with common mean $\mu$ and variance $\sigma^2$. If the three observations are independent, then the BLUP of $\mu$ based on the first two observations is $\bar{y}_{(2)} = (y_1 + y_2)/2$, and the BLUP based on the three observations is $\bar{y}_{(3)} = (y_1 + y_2 + y_3)/3 = (2/3)\bar{y}_{(2)} + (1/3)y_3$. The BLUP $\bar{y}_{(3)}$ is the Kalman filter predictor for time 3. Suppose, however, that $\mathrm{cov}(y_1, y_2) = \mathrm{cov}(y_2, y_3) = \sigma^2\rho_1$ and $\mathrm{cov}(y_1, y_3) = \sigma^2\rho_2 \ne \sigma^2\rho_1$. The BLUP of $\mu$ based on the first two observations is again $\bar{y}_{(2)} = (y_1 + y_2)/2$, but the BLUP of $\mu$ based on the three observations in this case is $\bar{y}^{opt}_{(3)} = ay_1 + by_2 + ay_3$, where $a = (1 - \rho_1)/(3 - 4\rho_1 + \rho_2)$ and $b = (1 - 2\rho_1 + \rho_2)/(3 - 4\rho_1 + \rho_2)$. Clearly, because $a \ne b$, the predictor $\bar{y}^{opt}_{(3)}$ cannot be written as a linear combination of $\bar{y}_{(2)}$ and $y_3$. For example, if $\rho_1 = .5$ and $\rho_2 = .25$, then $\bar{y}^{opt}_{(3)} = .4y_1 + .2y_2 + .4y_3$.
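The weights in this example are easy to check numerically: the BLUP of the common mean $\mu$ is the GLS estimator, whose weights are proportional to $\Sigma^{-1}\mathbf{1}$. A minimal sketch (our code, not from the article):

```python
import numpy as np

# Covariance of (y1, y2, y3) in Remark 4: cov(y1,y2) = cov(y2,y3) = rho1,
# cov(y1,y3) = rho2, with unit variances.
rho1, rho2 = 0.5, 0.25
Sigma = np.array([[1.0, rho1, rho2],
                  [rho1, 1.0, rho1],
                  [rho2, rho1, 1.0]])
w = np.linalg.solve(Sigma, np.ones(3))
w /= w.sum()                  # BLUP (GLS) weights for the common mean mu
print(np.round(w, 3))         # [0.4 0.2 0.4]
```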

4. INCORPORATING THE BENCHMARK CONSTRAINTS

4.1 Joint Modeling of Several Concurrent Estimates and Their Weighted Mean

In this section we jointly model the concurrent CD estimates and their weighted mean. In Section 4.2 we show how to incorporate the benchmark constraints and compute the variances of the benchmarked predictors. We follow the BLS modeling practice and assume that the state vectors and the measurement errors are independent between the CDs. Section 6 considers an

1392 Journal of the American Statistical Association December 2006

extension of the joint model that allows for cross-sectional correlations between corresponding components of the state vectors operating in different CDs.

Suppose that the models underlying the CD estimates are as in (9), with the correlated measurement (sampling) errors included in the observation equation. In what follows we add the subscript $d$ to all of the model components to distinguish between the CDs. The (direct) estimates $y_{dt}$ and the measurement errors $e_{dt}$ are now scalars, and $Z_{dt}$ is a row vector (hereinafter denoted as $z_{dt}'$). Let $y_t = (y_{1t}, \ldots, y_{Dt}, \sum_{d=1}^{D} w_{dt}y_{dt})'$ define the concurrent estimates for the $D$ CDs and their weighted mean [the right side of the benchmark equation (1)]. The corresponding vector of measurement errors is $e_t = (e_{1t}, \ldots, e_{Dt}, \sum_{d=1}^{D} w_{dt}e_{dt})'$. Let $Z_t^* = \mathrm{diag}(z_{1t}', \ldots, z_{Dt}')$ (a block-diagonal matrix with $z_{dt}'$ as the $d$th block), $T_t = I_D \otimes T$, $Z_t = \begin{bmatrix} Z_t^* \\ w_{1t}z_{1t}', \ldots, w_{Dt}z_{Dt}' \end{bmatrix}$, $\alpha_t = (\alpha_{1t}', \ldots, \alpha_{Dt}')'$, and $\eta_t = (\eta_{1t}', \ldots, \eta_{Dt}')'$. By (9) and the independence of the state vectors and measurement errors between CDs, the joint model holding for $y_t$ is

$$y_t = Z_t\alpha_t + e_t, \qquad E(e_t) = 0, \qquad E(e_\tau e_t') = \Sigma_{\tau t} = \begin{bmatrix} \tilde{\Sigma}_{\tau t} & h_{\tau t} \\ h_{t\tau}' & \nu_{\tau t} \end{bmatrix}, \qquad (20a)$$

$$\alpha_t = T\alpha_{t-1} + \eta_t, \qquad E(\eta_t) = 0, \qquad E(\eta_t\eta_t') = I_D \otimes Q_d = Q, \qquad E(\eta_t\eta_{t-k}') = 0, \quad k > 0, \qquad (20b)$$

where

$$\tilde{\Sigma}_{\tau t} = \mathrm{diag}[\sigma_{1\tau t}, \ldots, \sigma_{D\tau t}], \qquad \sigma_{d\tau t} = \mathrm{cov}(e_{d\tau}, e_{dt}),$$

$$\nu_{\tau t} = \sum_{d=1}^{D} w_{d\tau}w_{dt}\sigma_{d\tau t} = \mathrm{cov}\Biggl[\sum_{d=1}^{D} w_{d\tau}e_{d\tau}, \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr],$$

$$h_{\tau t} = (h_{1\tau t}, \ldots, h_{D\tau t})', \qquad h_{d\tau t} = w_{dt}\sigma_{d\tau t} = \mathrm{cov}\Biggl[e_{d\tau}, \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr].$$

Remark 5. The model (20a)–(20b) is the same as the separate models defined by (9a)–(9b) for the different CDs. Adding the model for $\sum_{d=1}^{D} w_{dt}y_{dt}$ to the observation equation provides no new information.
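For concreteness, the contemporaneous joint matrices $Z_t$ and $\Sigma_{tt}$ of (20a) can be assembled from the per-CD ingredients as sketched below (an illustrative helper of our own; `joint_measurement` and its arguments are hypothetical names, and lag-$\tau$ blocks would be built analogously):

```python
import numpy as np

def joint_measurement(z_rows, w, Sigma_cd):
    """Build the joint design matrix Z_t and measurement covariance Sigma_tt
    of (20a) from per-CD design rows z_dt, weights w_dt, and the D x D
    diagonal matrix of CD sampling-error variances (illustrative, not BLS code)."""
    D = len(w)
    Zstar = np.zeros((D, sum(len(z) for z in z_rows)))
    col = 0
    for d, z in enumerate(z_rows):              # block-diagonal Z*_t
        Zstar[d, col:col + len(z)] = z
        col += len(z)
    bench_row = np.concatenate([w[d] * z_rows[d] for d in range(D)])
    Z = np.vstack([Zstar, bench_row])           # last row: weighted mean of CDs
    h = Sigma_cd @ w                            # h_tt, cov(e_dt, benchmark error)
    nu = w @ Sigma_cd @ w                       # variance of the benchmark error
    Sigma = np.block([[Sigma_cd, h[:, None]], [h[None, :], np.array([[nu]])]])
    return Z, Sigma
```

Setting the last row and column of the returned `Sigma` to zero is exactly the device (23) used in Section 4.2 to force the benchmark constraints.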

4.2 Recursive Filtering With Correlated Measurements and Benchmark Constraints

To compute the benchmarked predictors, we apply the recursive GLS filter (19) to the joint model defined by (20a)–(20b), setting the variance of the benchmark errors $\sum_{d=1}^{D} w_{dt}e_{dt}$ to 0. The idea behind this procedure is as follows. By (8) and (20), the model-dependent predictor of the signal (true population value) in CD $d$ at time $t$ takes the form $\hat{Y}^{model}_{dt} = z_{dt}'\hat{\alpha}_{dt}$. Thus the benchmark constraints (1) can be written as

$$\sum_{d=1}^{D} w_{dt}z_{dt}'\hat{\alpha}_{dt} = \sum_{d=1}^{D} w_{dt}y_{dt}, \qquad t = 1, 2, \ldots. \qquad (21)$$

In contrast, under the model (20),

$$\sum_{d=1}^{D} w_{dt}y_{dt} = \sum_{d=1}^{D} w_{dt}z_{dt}'\alpha_{dt} + \sum_{d=1}^{D} w_{dt}e_{dt}, \qquad (22)$$

and so a simple way of satisfying the benchmark constraints is by imposing $\sum_{d=1}^{D} w_{dt}y_{dt} = \sum_{d=1}^{D} w_{dt}z_{dt}'\alpha_{dt}$, or, equivalently, by setting

$$\mathrm{var}\Biggl[\sum_{d=1}^{D} w_{dt}e_{dt}\Biggr] = \mathrm{cov}\Biggl[e_{dt}, \sum_{d=1}^{D} w_{dt}e_{dt}\Biggr] = 0, \qquad d = 1, \ldots, D; \ t = 1, 2, \ldots. \qquad (23)$$

It is important to emphasize that using (23) is just a convenient technical way of forcing the constraints and hence computing the benchmarked predictors. In Appendix D we show how to compute the true variances of the benchmarked predictors, accounting in particular for the errors $\sum_{d=1}^{D} w_{dt}e_{dt}$ in the right side of (22) [and no longer imposing (23)]. The imposition of (23) in the GLS filter (19) is implemented by replacing the covariance matrix $\Sigma_{tt}$ of the measurement equation (9a) by the matrix $\Sigma_{tt}^* = \begin{bmatrix} \tilde{\Sigma}_{tt} & 0_{(D)} \\ 0_{(D)}' & 0 \end{bmatrix}$, where $0_{(D)}$ is the null vector of order $D$, and by setting the elements of the last column of the matrix $C_t^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$ to 0.

Application of the GLS filter (19) to the model (9), with the benchmark constraints imposed by application of (23), yields the vector of CD benchmarked estimators

$$Z_t^*\hat{\alpha}_t^{bmk} = Z_t^*\Bigl\{T\hat{\alpha}_{t-1}^{bmk} + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})\bigl[Z_tP_{t|t-1}^{bmk}Z_t' - Z_tC_{t0}^{bmk} - C_{t0}^{bmk\prime}Z_t' + \Sigma_{tt}^*\bigr]^{-1}(y_t - Z_tT\hat{\alpha}_{t-1}^{bmk})\Bigr\}, \qquad (24)$$

where $Z_t^* = \mathrm{diag}(z_{1t}', \ldots, z_{Dt}')$, $P_{t|t-1}^{bmk} = TP_{t-1}^{bmk}T' + Q$, $P_{t-1}^{bmk} = E[(\alpha_{t-1} - \hat{\alpha}_{t-1}^{bmk})(\alpha_{t-1} - \hat{\alpha}_{t-1}^{bmk})']$, and, defining $e_{t0} = (e_{1t}, \ldots, e_{Dt}, 0)'$, $C_{t0}^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_{t0}]$.

Remark 6. The matrix $P_{t|t-1}^{bmk} = E[(T\hat{\alpha}_{t-1}^{bmk} - \alpha_t)(T\hat{\alpha}_{t-1}^{bmk} - \alpha_t)']$ is the true prediction error covariance matrix. See Appendix D for the computation of $P_t^{bmk} = E[(\alpha_t - \hat{\alpha}_t^{bmk})(\alpha_t - \hat{\alpha}_t^{bmk})']$ and $C_t^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$ under the model (9) without the constraints. The matrix $P_t^{bmk}$ accounts for the variability of the state vector components and the variances and covariances of the sampling errors (defining the matrices $\Sigma_{tt}$ and $C_t^{bmk}$).

Note again that the benchmarking filter developed in this article, and particularly the covariance matrix $P_t^{bmk}$, is different from the state–space benchmarking filters developed in other works, where the series observations are benchmarked to external (independent) data sources. For example, Doran (1992) considered benchmark constraints (possibly a set) of the form $R_t\alpha_t = r_t$, where the $r_t$'s are constants. In our case the benchmarks $\sum_{d=1}^{D} w_{dt}y_{dt}$ are random and depend heavily on the sum $\sum_{d=1}^{D} w_{dt}z_{dt}'\alpha_{dt}$. Another difference between the present filter and the other filters developed in the context of state–space modeling is the accounting for correlated measurement errors in the present filter.

4.3 Simulation Results

To test whether the proposed benchmarking algorithm with correlated measurement errors, as developed in Section 4.2, performs properly, we designed a small simulation study. The study consists of simulating 10,000 time series of length 45 for each of 3 models and computing the simulation variances of the benchmark prediction errors $(\hat{\alpha}_t^{bmk} - \alpha_t)$, which we compare with the true variances under the model, $P_t^{bmk} = E[(\hat{\alpha}_t^{bmk} - \alpha_t)(\hat{\alpha}_t^{bmk} - \alpha_t)']$, as defined by (D.3) in Appendix D. We also compare the simulation covariances $\mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$ with the true covariances $C_t^{bmk}$ under the model, as defined by (D.4). The models used for generating the three sets of time series have the general form

$$y_{dt} = \alpha_{dt} + e_{dt}, \qquad \alpha_{dt} = \alpha_{d,t-1} + \eta_{dt},$$
$$e_{dt} = \varepsilon_{dt} + .55\varepsilon_{d,t-1} + .30\varepsilon_{d,t-2} + .10\varepsilon_{d,t-3}, \qquad d = 1, 2, 3, \qquad (25)$$

where $\eta_{dt}$ and $\varepsilon_{dt}$ are independent white noise series. The random-walk variances for the three models are $\mathrm{var}(\eta_{dt}) = (.01, .88, 1.2)$. The corresponding measurement error variances are $\mathrm{var}(e_{dt}) = (.30, .08, 1.21)$, with autocorrelations $\mathrm{corr}(e_t, e_{t-1}) = .53$, $\mathrm{corr}(e_t, e_{t-2}) = .25$, and $\mathrm{corr}(e_t, e_{t-3}) = .07$ (the same autocorrelations for the three models). The benchmark constraints are

$$\hat{\alpha}_{1t} + \hat{\alpha}_{2t} + \hat{\alpha}_{3t} = y_{1t} + y_{2t} + y_{3t}, \qquad t = 1, \ldots, 45. \qquad (26)$$
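The data-generating process (25) can be sketched as follows (our illustrative code; the variance arguments are placeholders). As a sanity check, the MA(3) coefficients imply exactly the lag-1 to lag-3 autocorrelations quoted above:

```python
import numpy as np

def simulate_series(n, var_eta, var_eps, theta=(0.55, 0.30, 0.10), seed=None):
    """Simulate one series from model (25): a random-walk signal alpha_t plus an
    MA(3) measurement error e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}."""
    rng = np.random.default_rng(seed)
    alpha = np.cumsum(rng.normal(0.0, np.sqrt(var_eta), n))   # alpha_t = alpha_{t-1} + eta_t
    eps = rng.normal(0.0, np.sqrt(var_eps), n + 3)            # 3 presample innovations
    e = eps[3:] + theta[0] * eps[2:-1] + theta[1] * eps[1:-2] + theta[2] * eps[:-3]
    return alpha + e, alpha

# Implied measurement-error autocorrelations at lags 1-3
th = np.array([1.0, 0.55, 0.30, 0.10])
rho = np.array([th[:-k] @ th[k:] for k in (1, 2, 3)]) / (th @ th)
print(np.round(rho, 2))       # [0.53 0.25 0.07]
```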

Table 1 shows, for each of the three series, the values of the true variance $p_{d,45} = E(\hat{\alpha}_{d,45}^{bmk} - \alpha_{d,45})^2$ and covariance $c_{d,45} = \mathrm{cov}[\hat{\alpha}_{d,44}^{bmk} - \alpha_{d,45}, e_{d,45}]$ for the last time point ($t = 45$), along with the corresponding simulation variance $p_{d,45}^{Sim} = \sum_{k=1}^{10{,}000}(\hat{\alpha}_{d,45}^{bmk(k)} - \alpha_{d,45}^{(k)})^2/10^4$ and covariance $c_{d,45}^{Sim} = \sum_{k=1}^{10{,}000}(\hat{\alpha}_{d,44}^{bmk(k)} - \alpha_{d,45}^{(k)})e_{d,45}^{(k)}/10^4$, where the superscript $k$ indicates the $k$th simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.

4.4 Benchmarking of State Estimates

We have considered so far (and again in Sec. 5) the benchmarking of the CD estimates, but, as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices $\Sigma_{\tau t} = E(e_\tau e_t')$ [see (20a)], and hence the computation of the covariances $C_t^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$ [see (D.4)] and the variance matrices $P_t^{bmk} = E[(\hat{\alpha}_t^{bmk} - \alpha_t)(\hat{\alpha}_t^{bmk} - \alpha_t)']$ [see (D.3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45), 10,000 Simulations

                 Variances                       Covariances
Series   Theoretical   Simulation        Theoretical   Simulation
         p_{d,45}      p^{Sim}_{d,45}    c_{d,45}      c^{Sim}_{d,45}
1         .274          .276              .039          .041
2        1.122         1.119              .615          .614
3         .337          .344              .063          .068

The reason why the matrices $\Sigma_{\tau t}$ have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints

$$\sum_{s \in d} w_{st}z_{st}'\hat{\alpha}_{st} = z_{dt}'\hat{\alpha}_{dt}^{bmk} = z_{dt}'\alpha_{dt} + z_{dt}'(\hat{\alpha}_{dt}^{bmk} - \alpha_{dt}), \qquad (27)$$

so that the benchmarking error is now $e_{dt}^{bmk} = z_{dt}'(\hat{\alpha}_{dt}^{bmk} - \alpha_{dt})$, which has a different structure than the benchmarking error $\sum_{d=1}^{D} w_{dt}e_{dt}$ underlying (1) [see (22)]. The vectors $e_t$ now have the form $e_t^d = (e_{1t}^d, \ldots, e_{St}^d, e_{dt}^{bmk})'$, where $e_{st}^d$ is the sampling error for state $s$ belonging to CD $d$. The computation of the matrices $\Sigma_{\tau t} = E(e_\tau^d e_t^{d\prime})$ is carried out by expressing the errors $l_{dt}^{bmk} = \hat{\alpha}_{dt}^{bmk} - \alpha_{dt}$, obtained when benchmarking the CDs, as linear combinations of the concurrent and past CD sampling errors $e_j = (e_{1j}, \ldots, e_{Dj}, \sum_{d=1}^{D} w_{dj}e_{dj})'$ and the concurrent and past model residuals $\eta_j = \alpha_j - T\alpha_{j-1}$ [see (20b)], $j = 1, 2, \ldots, t$. Note that the CD sampling error is a weighted average of the state sampling errors:

$$e_{dj} = y_{dj} - Y_{dj} = \sum_{s \in d} w_{sj}(y_{sj} - Y_{sj}) = \sum_{s \in d} w_{sj}e_{sj}^d. \qquad (28)$$

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows, we apply the benchmarking methodology developed in Section 4.2 for estimating total unemployment in the nine CDs for the period January 1998–December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest, because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test for the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment $Y_t$ and the trend levels $L_t$ [see (6)]. As can be seen, the largest loss in efficiency, as measured by the means, is 3%.

Next, we combined the individual division models into the joint model (20). The benchmark constraints are as defined

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998–2003

Division              Prediction of unemployment   Prediction of trend
New England           1.03 (.002)                  1.02 (.002)
Middle Atlantic       1.02 (.002)                  1.02 (.002)
East North Central    1.00 (.001)                  1.00 (.001)
West North Central    1.02 (.002)                  1.02 (.002)
South Atlantic        1.02 (.001)                  1.02 (.001)
East South Central    1.00 (.001)                  1.00 (.001)
West South Central    1.02 (.001)                  1.01 (.001)
Mountain              1.03 (.002)                  1.03 (.002)
Pacific               1.02 (.001)                  1.02 (.001)


Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000).

in (1), with $w_{dt} = 1$, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions, without benchmarking, with the corresponding CPS national unemployment estimate. In the first part of the observation period, the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.

Figures 3–5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graphs to the period January 2000–January 2003 to better illuminate how

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS, With ±2×SE(CPS) Bounds (numbers in 100,000).

Figure 3. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: South Atlantic Division (numbers in 100,000).

the benchmarking corrects in real time the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Another important conclusion that can be reached from Figures 3–5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998–2003 excluding 2001, and for 2001. As can be seen, in 2001 some of the mean ratios exceed 1 by about 4%, but in the other years the means never exceed 1 by more than 1%, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: East South Central Division (numbers in 100,000).


Figure 5. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: Pacific Division (numbers in 100,000).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6–8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3–5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions from using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6–8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division              1998–2000, 2002–2003   2001
New England           1.01 (.016)            1.03 (.017)
Middle Atlantic       1.00 (.014)            1.03 (.019)
East North Central    1.00 (.013)            1.03 (.013)
West North Central    1.01 (.014)            1.03 (.011)
South Atlantic        1.00 (.016)            1.04 (.016)
East South Central    1.01 (.016)            1.03 (.014)
West South Central    1.00 (.014)            1.04 (.022)
Mountain              1.00 (.011)            1.02 (.011)
Pacific               1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: South Atlantic Division (numbers in 10,000).

small areas is a common requirement for statistical agencies producing official statistics (see, e.g., Fay and Herriot 1979; Battese, Harter, and Fuller 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement). The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time $t$ out of all of the predictors that are linear combinations of the state predictor from time $t-1$ and the new observation at time $t$. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: East South Central Division (numbers in 10,000).


Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat{\alpha}_t$ [EQ. (18)]

The model holding for $Y_t = (\hat{\alpha}_{t|t-1}', y_t')'$ is

$$Y_t = \begin{pmatrix} \hat{\alpha}_{t|t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}, \qquad u_{t|t-1} = \hat{\alpha}_{t|t-1} - \alpha_t,$$

where $\hat{\alpha}_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u_{t|t-1}', e_t')'$ and $X_t = [I, Z_t']'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_tu_t') = V_t$ [see (16)]. A predictor $\alpha_t^*$ is unbiased for $\alpha_t$ if $E(\alpha_t^* - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division              Mean (STD)
New England           .85 (.013)
Middle Atlantic       .94 (.005)
East North Central    .96 (.004)
West North Central    .89 (.008)
South Atlantic        .96 (.004)
East South Central    .86 (.007)
West South Central    .92 (.006)
Mountain              .88 (.009)
Pacific               .96 (.004)

Theorem. The predictor $\hat{\alpha}_t = (X_t'V_t^{-1}X_t)^{-1}X_t'V_t^{-1}Y_t = P_tX_t'V_t^{-1}Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\hat{\alpha}_{t|t-1}$ and $y_t$.

Proof. $\hat{\alpha}_t - \alpha_t = P_tX_t'V_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_tX_t'V_t^{-1}u_t$, so that $E(\hat{\alpha}_t - \alpha_t) = 0$ and $\mathrm{var}(\hat{\alpha}_t - \alpha_t) = E[(\hat{\alpha}_t - \alpha_t)(\hat{\alpha}_t - \alpha_t)'] = P_t$. Clearly, $\hat{\alpha}_t$ is linear in $Y_t$.

Let $\hat{\alpha}_t^L = L_1\hat{\alpha}_{t|t-1} + L_2y_t + l = LY_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_tX_t'V_t^{-1}$, such that $L = D_t + P_tX_t'V_t^{-1}$. Because $\hat{\alpha}_t^L$ is unbiased for $\alpha_t$,

$$E(\hat{\alpha}_t^L - \alpha_t) = E[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t] = E[D_tX_t\alpha_t + D_tu_t + P_tX_t'V_t^{-1}u_t + l] = E[D_tX_t\alpha_t + l] = 0.$$

Thus, for $\hat{\alpha}_t^L$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_tX_t = 0$ and $l = 0$, which implies that $\mathrm{var}(\hat{\alpha}_t^L - \alpha_t) = \mathrm{var}[(D_t + P_tX_t'V_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = \mathrm{var}[D_tu_t + P_tX_t'V_t^{-1}u_t]$.

Hence,

$$\mathrm{var}(\hat{\alpha}_t^L - \alpha_t) = E[(D_tu_t + P_tX_t'V_t^{-1}u_t)(u_t'D_t' + u_t'V_t^{-1}X_tP_t)] = D_tV_tD_t' + P_tX_t'V_t^{-1}V_tD_t' + D_tV_tV_t^{-1}X_tP_t + P_t = P_t + D_tV_tD_t' = \mathrm{var}(\hat{\alpha}_t - \alpha_t) + D_tV_tD_t',$$

because $D_tX_t = 0$.

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\mathrm{var}(e_t) = \Sigma_t$ and $\mathrm{var}(\eta_t) = Q_t$. Let $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

The GLS setup at time $t$ is

$$\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}$$

[see (15)], where now

$$V_t = \mathrm{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}$$

[the same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is

$$\begin{aligned}
\hat{\alpha}_t &= \Bigl[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\Bigr]^{-1}(I, Z_t')V_t^{-1}\begin{pmatrix} \hat{\alpha}_{t|t-1} \\ y_t \end{pmatrix} \\
&= [P_{t|t-1}^{-1} + Z_t'\Sigma_t^{-1}Z_t]^{-1}[P_{t|t-1}^{-1}\hat{\alpha}_{t|t-1} + Z_t'\Sigma_t^{-1}y_t] \\
&= [P_{t|t-1} - P_{t|t-1}Z_t'F_t^{-1}Z_tP_{t|t-1}][P_{t|t-1}^{-1}\hat{\alpha}_{t|t-1} + Z_t'\Sigma_t^{-1}y_t] \\
&= [I - P_{t|t-1}Z_t'F_t^{-1}Z_t]\hat{\alpha}_{t|t-1} + P_{t|t-1}Z_t'\Sigma_t^{-1}y_t - P_{t|t-1}Z_t'\Sigma_t^{-1}y_t + P_{t|t-1}Z_t'F_t^{-1}y_t \\
&= \hat{\alpha}_{t|t-1} + P_{t|t-1}Z_t'F_t^{-1}(y_t - Z_t\hat{\alpha}_{t|t-1}), \qquad F_t = Z_tP_{t|t-1}Z_t' + \Sigma_t.
\end{aligned}$$

(The third equality uses the matrix inversion lemma; the fourth follows because $F_t^{-1}Z_tP_{t|t-1}Z_t'\Sigma_t^{-1} = \Sigma_t^{-1} - F_t^{-1}$, which holds since $Z_tP_{t|t-1}Z_t' = F_t - \Sigma_t$.) The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
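The chain of equalities above can be verified numerically on a random instance (our check, with hypothetical variable names): the information-form GLS update and the Kalman gain form must agree whenever $C_t = 0$.

```python
import numpy as np

# Numerical check of the Appendix B identity on a random instance.
rng = np.random.default_rng(1)
q, p = 3, 2
A = rng.standard_normal((q, q)); P = A @ A.T + q * np.eye(q)    # P_{t|t-1}, pos. def.
B = rng.standard_normal((p, p)); Sig = B @ B.T + p * np.eye(p)  # Sigma_t, pos. def.
Z = rng.standard_normal((p, q))
a = rng.standard_normal(q)        # alpha_{t|t-1}
y = rng.standard_normal(p)        # new observation

# GLS (information-form) update: [P^-1 + Z' Sig^-1 Z]^-1 [P^-1 a + Z' Sig^-1 y]
gls = np.linalg.solve(np.linalg.inv(P) + Z.T @ np.linalg.solve(Sig, Z),
                      np.linalg.solve(P, a) + Z.T @ np.linalg.solve(Sig, y))
# Kalman gain form: a + P Z' F^-1 (y - Z a), with F = Z P Z' + Sig
kalman = a + P @ Z.T @ np.linalg.solve(Z @ P @ Z.T + Sig, y - Z @ a)
print(np.allclose(gls, kalman))   # True
```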


APPENDIX C: COMPUTATION OF $\hat{\alpha}_t$ [EQ. (19)]

Consider the GLS predictor

$$\hat{\alpha}_t = \Bigl[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\Bigr]^{-1}(I, Z_t')V_t^{-1}\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix}$$

[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C_t' & \Sigma_t \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions

$$(I, Z_t')V_t^{-1} = \bigl[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC_t'P_{t|t-1}^{-1} - Z_t'H_tC_t'P_{t|t-1}^{-1}), \; (-P_{t|t-1}^{-1}C_tH_t + Z_t'H_t)\bigr] \qquad (C.1)$$

and

$$P_t = \Bigl[(I, Z_t')V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\Bigr]^{-1} = \bigl[P_{t|t-1}^{-1} + (Z_t' - P_{t|t-1}^{-1}C_t)H_t(Z_t - C_t'P_{t|t-1}^{-1})\bigr]^{-1}, \qquad (C.2)$$

where $H_t = [\Sigma_t - C_t'P_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C.1) and (C.2) that

$$\hat{\alpha}_t = P_t(I, Z_t')V_t^{-1}\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = T\hat{\alpha}_{t-1} + P_t(Z_t' - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat{\alpha}_{t-1}). \qquad (C.3)$$

Computing the matrix in the second expression of (C.2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second expression of (C.3) yields, after some algebra, equation (19).

APPENDIX D: COMPUTATION OF $P_t^{bmk} = \mathrm{var}(\hat{\alpha}_t^{bmk} - \alpha_t)$ AND $C_t^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$

The benchmarked predictor $\hat{\alpha}_t^{bmk}$ defined by (24) can be written as

$$\hat{\alpha}_t^{bmk} = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t]T\hat{\alpha}_{t-1}^{bmk} + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}y_t, \qquad (D.1)$$

where $R_t = Z_tP_{t|t-1}^{bmk}Z_t' - Z_tC_{t0}^{bmk} - C_{t0}^{bmk\prime}Z_t' + \Sigma_{tt}^*$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t]\alpha_t + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t\alpha_t$ yields

$$\hat{\alpha}_t^{bmk} - \alpha_t = [I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t](T\hat{\alpha}_{t-1}^{bmk} - \alpha_t) + (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}e_t. \qquad (D.2)$$

Denote $G_t = I - (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}Z_t$ and $K_t = (P_{t|t-1}^{bmk}Z_t' - C_{t0}^{bmk})R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat{\alpha}_t^{bmk} - \alpha_t)$ under the model (20) is

$$P_t^{bmk} = E[(\hat{\alpha}_t^{bmk} - \alpha_t)(\hat{\alpha}_t^{bmk} - \alpha_t)'] = G_tP_{t|t-1}^{bmk}G_t' + K_t\Sigma_{tt}K_t' + G_tC_t^{bmk}K_t' + K_tC_t^{bmk\prime}G_t', \qquad (D.3)$$

where $P_{t|t-1}^{bmk} = E[(T\hat{\alpha}_{t-1}^{bmk} - \alpha_t)(T\hat{\alpha}_{t-1}^{bmk} - \alpha_t)'] = TP_{t-1}^{bmk}T' + Q$, $\Sigma_{tt} = E(e_te_t')$ [eq. (20a)], and $C_t^{bmk} = \mathrm{cov}[T\hat{\alpha}_{t-1}^{bmk} - \alpha_t, e_t]$. The matrix $C_t^{bmk}$ is computed similarly to (17), using the chain

$$C_t^{bmk} = A_{t-1}^{bmk}A_{t-2}^{bmk}\cdots A_2^{bmk}\tilde{A}_1^{bmk}\Sigma_{1t} + A_{t-1}^{bmk}A_{t-2}^{bmk}\cdots A_3^{bmk}\tilde{A}_2^{bmk}\Sigma_{2t} + \cdots + A_{t-1}^{bmk}\tilde{A}_{t-2}^{bmk}\Sigma_{t-2,t} + \tilde{A}_{t-1}^{bmk}\Sigma_{t-1,t}, \qquad (D.4)$$

where $A_j^{bmk} = TG_j$ and $\tilde{A}_j^{bmk} = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

——— (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.


1390 Journal of the American Statistical Association December 2006

arising from the fact that households dropped from the survey are replaced by households from the same "census tract." The reduced ARMA representation of the sum of the two processes is ARMA(2,17), which is approximated by an AR(15) model.

The separate models holding for the population values and the sampling errors are cast into a single state–space model for the observations y_t (the CPS estimates) of the form

y_t = Z_t α*_t;  α*_t = Tα*_{t−1} + η*_t,  E(η*_t) = 0,  E(η*_t η*_t′) = Q*,  (7)

with the first equation to the left defining the measurement (observation) equation and the second equation defining the state (transition) equation. The state vector α*_t consists of the trend level L_t, the slope R_t, the 11 seasonal coefficients S_jt, S*_kt, j = 1,..., 6, k = 1,..., 5, the irregular term I_t, and the concurrent and 14 lags of the sampling errors, a total of 29 elements. The sampling errors and the irregular term are included in the state vector so that there is no residual error term in the observation equation. Note in this respect that including the sampling errors in the measurement equation, as described in Section 1 and pursued later, instead of the current practice of including them in the state equation, does not change the model holding for the observations, as long as the model fitted for the sampling errors reproduces their autocovariances.

The parameters indexing the model are estimated separately for each CD in two steps. In the first step, the AR(15) coefficients and residual variance are estimated by solving the corresponding Yule–Walker equations. (The sampling error variance and autocorrelations are estimated externally by the BLS.) In the second step, the remaining model variances are estimated by maximizing the model likelihood, with the AR(15) model parameters held fixed at their estimated values (see Tiller 1992 for details).
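The first step can be sketched directly: the Yule–Walker equations relate the AR coefficients to the autocovariances of the process, so, given externally estimated autocovariances, the coefficients solve a Toeplitz linear system. The helper below is our illustrative sketch (not BLS production code), checked on an AR(2) whose exact autocorrelations are known in closed form:

```python
import numpy as np

def yule_walker(acov):
    """Solve the Yule-Walker equations for AR coefficients.

    acov: autocovariances gamma_0, ..., gamma_p of the process.
    Returns (phi, sigma2): the AR(p) coefficients and the innovation
    variance implied by those autocovariances.
    """
    p = len(acov) - 1
    # Toeplitz matrix with entries gamma_{|i-j|}, i, j = 0, ..., p-1
    R = np.array([[acov[abs(i - j)] for j in range(p)] for i in range(p)])
    r = np.asarray(acov[1:], dtype=float)
    phi = np.linalg.solve(R, r)
    sigma2 = acov[0] - phi @ r
    return phi, sigma2

# Check on an AR(2) with phi = (.5, .2): the exact autocorrelations are
# rho1 = phi1/(1 - phi2) and rho2 = phi1*rho1 + phi2.
phi1, phi2 = 0.5, 0.2
rho1 = phi1 / (1 - phi2)       # 0.625
rho2 = phi1 * rho1 + phi2      # 0.5125
phi, sigma2 = yule_walker([1.0, rho1, rho2])
print(np.round(phi, 6))        # recovers [0.5 0.2]
```

Solving the system with the exact autocovariances returns the generating coefficients, which is the sense in which the first estimation step is method-of-moments.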

The monthly employment and unemployment estimates published by the BLS are obtained under the model (7) and the relationship Y_t = y_t − e_t as

Ŷ_t = y_t − ê_t = L̂_t + Ŝ_t + Î_t,  (8)

where L̂_t, Ŝ_t, and Î_t denote the estimated components at time t, as obtained by application of the Kalman filter (Harvey 1989).

3. FILTERING OF STATE–SPACE MODELS WITH AUTOCORRELATED MEASUREMENT ERRORS

In this section we develop a recursive filtering algorithm for state–space models with autocorrelated measurement errors. By this we mean an algorithm that updates the most recent predictor of the state vector every time that a new observation becomes available. This filter is required for implementing the benchmarking procedure discussed in Section 1 (see Sec. 4), but, as already mentioned, it is general and can be used for other applications of state–space models with autocorrelated measurement errors. In Section 3.2 we discuss the properties of the proposed filter.

3.1 Recursive Filtering Algorithm

Consider the following linear state–space model for a (possibly vector) time series y_t:

Measurement equation:  y_t = Z_t α_t + e_t,  E(e_t) = 0, E(e_t e_t′) = Σ_tt, E(e_τ e_t′) = Σ_τt;  (9a)

Transition equation:  α_t = Tα_{t−1} + η_t,  E(η_t) = 0, E(η_t η_t′) = Q, E(η_t η_{t−k}′) = 0, k > 0.  (9b)

It is also assumed that E(η_t e_τ′) = 0 for all t and τ. (The restriction to time-invariant matrices T and Q is for convenience; extension of the filter to models that contain fixed effects in the observation equation is straightforward.) Clearly, what distinguishes this model from the standard linear state–space model is that the measurement errors e_t (the sampling errors in the BLS model) are correlated over time. Note in particular that, unlike the BLS model representation in (7), where the sampling errors and the irregular term are part of the state vector so that there are no measurement errors in the observation equation, the measurement (sampling) errors now are featured in the observation equation. The recursive filtering algorithm developed herein takes into account the autocovariance matrices Σ_τt.

At Time 1. Let α̂_1 = (I − K_1 Z_1)Tα̂_0 + K_1 y_1 be the filtered state estimator at time 1, where α̂_0 is an initial estimator with covariance matrix P_0 = E[(α̂_0 − α_0)(α̂_0 − α_0)′] and K_1 = P_{1|0} Z_1′ F_1^{−1} is the "Kalman gain." We assume for convenience that α̂_0 is independent of the observations. The matrix P_{1|0} = TP_0T′ + Q is the covariance matrix of the prediction errors α̂_{1|0} − α_1 = Tα̂_0 − α_1, and F_1 = Z_1 P_{1|0} Z_1′ + Σ_11 is the covariance matrix of the innovations (one-step-ahead prediction errors) ν_1 = y_1 − ŷ_{1|0} = y_1 − Z_1 α̂_{1|0}. Because y_1 = Z_1 α_1 + e_1,

α̂_1 = (I − K_1 Z_1)Tα̂_0 + K_1 Z_1 α_1 + K_1 e_1.  (10)

At Time 2. Let α̂_{2|1} = Tα̂_1 define the predictor of α_2 at time 1, with covariance matrix P_{2|1} = E[(α̂_{2|1} − α_2)(α̂_{2|1} − α_2)′]. An unbiased predictor α̂_2 of α_2 [i.e., E(α̂_2 − α_2) = 0] based on α̂_{2|1} and the observation y_2 is the generalized least squares (GLS) predictor in the regression model

((Tα̂_1)′, y_2′)′ = (I, Z_2′)′ α_2 + (u_{2|1}′, e_2′)′,  u_{2|1} = Tα̂_1 − α_2,  (11)

that is,

α̂_2 = [(I, Z_2′) V_2^{−1} (I, Z_2′)′]^{−1} (I, Z_2′) V_2^{−1} ((Tα̂_1)′, y_2′)′,  (12)

where

V_2 = var(u_{2|1}′, e_2′)′ = [P_{2|1}  C_2; C_2′  Σ_22]  (13)

and C_2 = cov[Tα̂_1 − α_2, e_2] = TK_1Σ_12 [follows from (9a) and (10)]. Note that V_2 is the covariance matrix of the errors u_{2|1} and e_2, and not of the predictors Tα̂_1 and y_2. As discussed later and proved in Appendix A, the GLS predictor α̂_2 is the best linear unbiased predictor (BLUP) of α_2 based on Tα̂_1 and y_2, with covariance matrix

E[(α̂_2 − α_2)(α̂_2 − α_2)′] = [(I, Z_2′) V_2^{−1} (I, Z_2′)′]^{−1} = P_2.  (14)

Pfeffermann and Tiller Small-Area Estimation 1391

At Time t. Let α̂_{t|t−1} = Tα̂_{t−1} define the predictor of α_t at time t − 1, with covariance matrix E[(α̂_{t|t−1} − α_t)(α̂_{t|t−1} − α_t)′] = TP_{t−1}T′ + Q = P_{t|t−1}, where P_{t−1} = E[(α̂_{t−1} − α_{t−1})(α̂_{t−1} − α_{t−1})′]. Set the random coefficients regression model

((Tα̂_{t−1})′, y_t′)′ = (I, Z_t′)′ α_t + (u_{t|t−1}′, e_t′)′,  u_{t|t−1} = Tα̂_{t−1} − α_t,  (15)

and define

V_t = var(u_{t|t−1}′, e_t′)′ = [P_{t|t−1}  C_t; C_t′  Σ_tt].  (16)

The covariance matrix C_t = cov[Tα̂_{t−1} − α_t, e_t] is computed as follows. Let [I, Z_j′]V_j^{−1} = [B_j1, B_j2], where B_j1 contains the first q columns of [I, Z_j′]V_j^{−1} and B_j2 contains the remaining columns, with q = dim(α_j). Define A_j = TP_jB_j1 and Ã_j = TP_jB_j2, j = 2,..., t − 1; Ã_1 = TK_1. Then

C_t = cov[Tα̂_{t−1} − α_t, e_t]
    = A_{t−1}A_{t−2} ··· A_2Ã_1Σ_1t + A_{t−1}A_{t−2} ··· A_3Ã_2Σ_2t + ··· + A_{t−1}Ã_{t−2}Σ_{t−2,t} + Ã_{t−1}Σ_{t−1,t}.  (17)

The GLS predictor of α_t based on Tα̂_{t−1} and y_t and the covariance matrix of the prediction errors are obtained from (15) and (16) as

α̂_t = [(I, Z_t′) V_t^{−1} (I, Z_t′)′]^{−1} (I, Z_t′) V_t^{−1} ((Tα̂_{t−1})′, y_t′)′,  (18)

P_t = E[(α̂_t − α_t)(α̂_t − α_t)′] = [(I, Z_t′) V_t^{−1} (I, Z_t′)′]^{−1}.

Alternatively, the predictor α̂_t can be written (App. C) as

α̂_t = Tα̂_{t−1} + (P_{t|t−1}Z_t′ − C_t) [Z_tP_{t|t−1}Z_t′ − Z_tC_t − C_t′Z_t′ + Σ_tt]^{−1} (y_t − Z_tTα̂_{t−1}).  (19)

Written in this way, the predictor of α_t at time t is seen to equal the predictor of α_t at time t − 1, plus a correction factor that depends on the magnitude of the innovation (one-step-ahead prediction error) when predicting y_t at time t − 1.
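To make the recursion concrete, the following sketch (ours, not BLS production code) implements one step of the filter directly from the GLS form (18), building V_t from P_{t|t−1}, C_t, and Σ_tt. With C_t = 0 it reproduces the standard Kalman update, which is property 2 of Section 3.2:

```python
import numpy as np

def gls_filter_step(a_prev, P_prev, y, Z, T, Q, Sigma_tt, C):
    """One step of the recursive GLS filter, via the GLS form (18):
    regress (T a_{t-1}, y_t) on alpha_t with joint error covariance
    V_t = [[P_{t|t-1}, C_t], [C_t', Sigma_tt]]. C_t is the zero matrix
    when the measurement errors are serially uncorrelated."""
    q = a_prev.shape[0]
    a_pred = T @ a_prev                                 # T alpha-hat_{t-1}
    P_pred = T @ P_prev @ T.T + Q                       # P_{t|t-1}
    X = np.vstack([np.eye(q), Z])                       # (I, Z_t')'
    V = np.block([[P_pred, C], [C.T, Sigma_tt]])
    Vinv = np.linalg.inv(V)
    P = np.linalg.inv(X.T @ Vinv @ X)                   # P_t, eq. (14)
    a = P @ (X.T @ Vinv @ np.concatenate([a_pred, y]))  # alpha-hat_t
    return a, P

# With C_t = 0 the step coincides with the familiar Kalman update.
T = np.array([[1.0, 1.0], [0.0, 1.0]])   # local linear trend transition
Q = np.diag([0.1, 0.01])
Z = np.array([[1.0, 0.0]])
Sigma = np.array([[0.5]])
a0, P0 = np.zeros(2), np.eye(2)
y1 = np.array([1.3])

a_gls, P_gls = gls_filter_step(a0, P0, y1, Z, T, Q, Sigma, np.zeros((2, 1)))

a_pred, P_pred = T @ a0, T @ P0 @ T.T + Q
K = P_pred @ Z.T @ np.linalg.inv(Z @ P_pred @ Z.T + Sigma)
a_kf = a_pred + K @ (y1 - Z @ a_pred)
P_kf = P_pred - K @ Z @ P_pred
print(np.allclose(a_gls, a_kf), np.allclose(P_gls, P_kf))
```

In applications with correlated measurement errors, C would be supplied from the chain (17) rather than set to zero.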

3.2 Properties of the Filtering Algorithm

Assuming known model parameters, the recursive GLS filter defined by (18) or (19) has the following properties:

1. At every time point t, the filter produces the BLUP of α_t based on the predictor α̂_{t|t−1} = Tα̂_{t−1} from time t − 1 and the new observation y_t. The BLUP property means that E(α̂_t − α_t) = 0 and var[d′(α̂_t − α_t)] ≤ var[d′(α̂^L_t − α_t)] for every coefficient vector d and any other linear unbiased predictor of the form α̂^L_t = L_1α̂_{t|t−1} + L_2y_t + l, with general fixed matrices L_1 and L_2 and vector l. See Appendix A for proof of this property.

2. When the measurement errors are independent, the GLS filter coincides with the familiar Kalman filter (Harvey 1989); see Appendix B for the proof. Thus the recursive GLS filter can be viewed as an extension of the Kalman filter for the case of correlated measurement errors.

3. Unlike the Kalman filter, which assumes independent measurement errors and therefore yields the BLUP of α_t based on all the individual observations y_(t) = (y_1′,..., y_t′)′ (i.e., the BLUP out of all of the linear unbiased predictors of the form ∑_{i=1}^t J_it y_i + j_t for general fixed matrices J_it and vector j_t), the filter (18) yields the BLUP of α_t based on α̂_{t|t−1} and the new observation y_t.

(The predictor α̂_{t|t−1} is itself a linear combination of all the observations until time t − 1, but the values of the matrix coefficients are fixed by the previous steps of the filter; see also Remark 4.)

Computing the BLUP of α_t under correlated measurement errors out of all possible unbiased predictors that are linear combinations of all the individual observations in y_(t) with arbitrary coefficients [or the minimum mean squared error predictor E(α_t | y_t, y_{t−1},..., y_1; λ) under normality of the model error terms, where λ defines the model parameters] requires joint modeling of y_(t) for every time t. For long series, the computations become very heavy and generally are not practical in a production environment that requires routine runs of many series with high-dimensional state vectors in a short time. Empirical evidence so far suggests that the loss in efficiency from using the GLS algorithm instead of the BLUP based on all of the individual observations is mild. Section 5 provides an empirical comparison of the two filters.
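The algebraic equivalence of forms (18) and (19) is proved in Appendix C; it can also be checked numerically for a nonzero C_t. The matrices below are arbitrary test values chosen only to make V_t positive definite (a sketch, not an application):

```python
import numpy as np

rng = np.random.default_rng(1)
q, m = 2, 1
T = np.array([[1.0, 0.3], [0.0, 0.9]])
Q = np.diag([0.2, 0.05])
Z = rng.normal(size=(m, q))
Sigma = np.array([[0.7]])                # Sigma_tt
C = np.array([[0.1], [0.05]])            # cov[T a_{t-1} - alpha_t, e_t]
a_prev = rng.normal(size=q)
P_prev = np.eye(q)
y = rng.normal(size=m)

a_pred = T @ a_prev
P_pred = T @ P_prev @ T.T + Q

# Form (18): GLS regression of (T a_{t-1}, y_t) on alpha_t.
X = np.vstack([np.eye(q), Z])
V = np.block([[P_pred, C], [C.T, Sigma]])
Vinv = np.linalg.inv(V)
a18 = np.linalg.inv(X.T @ Vinv @ X) @ (X.T @ Vinv @ np.concatenate([a_pred, y]))

# Form (19): prediction plus gain times the innovation.
F = Z @ P_pred @ Z.T - Z @ C - C.T @ Z.T + Sigma
a19 = a_pred + (P_pred @ Z.T - C) @ np.linalg.inv(F) @ (y - Z @ a_pred)

print(np.allclose(a18, a19))
```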

Remark 4. For general covariance matrices Σ_τt between the measurement errors, it is impossible to construct a recursive filtering algorithm that is a linear combination of the predictor from the previous time point and the new observation and is BLUP out of all unbiased predictors of the form ∑_{i=1}^t J_it y_i + j_t. To see this, consider a very simple example of three observations y_1, y_2, and y_3 with common mean μ and variance σ². If the three observations are independent, then the BLUP of μ based on the first two observations is ȳ_(2) = (y_1 + y_2)/2, and the BLUP based on the three observations is ȳ_(3) = (y_1 + y_2 + y_3)/3 = (2/3)ȳ_(2) + (1/3)y_3. The BLUP ȳ_(3) is the Kalman filter predictor for time 3. Suppose, however, that cov(y_1, y_2) = cov(y_2, y_3) = σ²ρ_1 and cov(y_1, y_3) = σ²ρ_2 ≠ σ²ρ_1. The BLUP of μ based on the first two observations is again ȳ_(2) = (y_1 + y_2)/2, but the BLUP of μ based on the three observations in this case is ȳ^opt_(3) = ay_1 + by_2 + ay_3, where a = (1 − ρ_1)/(3 − 4ρ_1 + ρ_2) and b = (1 − 2ρ_1 + ρ_2)/(3 − 4ρ_1 + ρ_2). Clearly, because a ≠ b, the predictor ȳ^opt_(3) cannot be written as a linear combination of ȳ_(2) and y_3. For example, if ρ_1 = .5 and ρ_2 = .25, then ȳ^opt_(3) = .4y_1 + .2y_2 + .4y_3.
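The weights in this example are easy to verify numerically: the BLUP of a common mean μ from observations with error covariance matrix V has weight vector V^{−1}1/(1′V^{−1}1). A minimal sketch:

```python
import numpy as np

# BLUP weights for a common mean mu from y = (y1, y2, y3)' with
# cov(y1,y2) = cov(y2,y3) = rho1, cov(y1,y3) = rho2 (sigma^2 = 1).
rho1, rho2 = 0.5, 0.25
V = np.array([[1.0,  rho1, rho2],
              [rho1, 1.0,  rho1],
              [rho2, rho1, 1.0]])
w = np.linalg.solve(V, np.ones(3))
w /= w.sum()                 # normalize so the weights sum to 1
print(np.round(w, 6))        # [0.4 0.2 0.4], i.e., a = .4, b = .2
```

The first and third weights equal a and the middle weight equals b, matching the closed-form expressions in Remark 4.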

4. INCORPORATING THE BENCHMARK CONSTRAINTS

4.1 Joint Modeling of Several Concurrent Estimates and Their Weighted Mean

In this section we jointly model the concurrent CD estimates and their weighted mean. In Section 4.2 we show how to incorporate the benchmark constraints and compute the variances of the benchmarked predictors. We follow the BLS modeling practice and assume that the state vectors and the measurement errors are independent between the CDs. Section 6 considers an extension of the joint model that allows for cross-sectional correlations between corresponding components of the state vectors operating in different CDs.

Suppose that the models underlying the CD estimates are as in (9), with the correlated measurement (sampling) errors included in the observation equation. In what follows we add the subscript d to all of the model components to distinguish between the CDs. The (direct) estimates y_dt and the measurement errors e_dt are now scalars, and Z_dt is a row vector (hereinafter denoted as z_dt′). Let y_t = (y_1t,..., y_Dt, ∑_{d=1}^D w_dt y_dt)′ define the concurrent estimates for the D CDs and their weighted mean [the right side of the benchmark equation (1)]. The corresponding vector of measurement errors is e_t = (e_1t,..., e_Dt, ∑_{d=1}^D w_dt e_dt)′. Let Z*_t = I_D ⊗ z_dt′ (a block-diagonal matrix with z_dt′ as the dth block), T_t = I_D ⊗ T, Z_t = [Z*_t; (w_1t z_1t′, ···, w_Dt z_Dt′)] (i.e., Z*_t with the row of weighted design vectors appended), α_t = (α_1t′,..., α_Dt′)′, and η_t = (η_1t′,..., η_Dt′)′. By (9) and the independence of the state vectors and measurement errors between CDs, the joint model holding for y_t is

y_t = Z_t α_t + e_t,
E(e_t) = 0,  E(e_τ e_t′) = Σ_τt = [Λ_τt  h_τt; h_tτ′  ν_τt],  (20a)

α_t = Tα_{t−1} + η_t,  E(η_t) = 0,  E(η_t η_t′) = I_D ⊗ Q_d = Q,  (20b)
E(η_t η_{t−k}′) = 0, k > 0,

where

Λ_τt = diag[σ_1,τt,..., σ_D,τt],  σ_d,τt = cov[e_dτ, e_dt],

ν_τt = ∑_{d=1}^D w_dτ w_dt σ_d,τt = cov[∑_{d=1}^D w_dτ e_dτ, ∑_{d=1}^D w_dt e_dt],

h_τt = (h_1,τt,..., h_D,τt)′,  h_d,τt = w_dt σ_d,τt = cov[e_dτ, ∑_{d=1}^D w_dt e_dt].

Remark 5. The model (20a)–(20b) is the same as the separate models defined by (9a)–(9b) for the different CDs. Adding the model for ∑_{d=1}^D w_dt y_dt to the observation equation provides no new information.
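The block structure of Σ_τt in (20a) can also be obtained from a linear map: with M_t = [I_D; w_t′] stacking the identity over the weight row, e_t = M_t(e_1t,..., e_Dt)′ and E(e_τ e_t′) = M_τ Λ_τt M_t′, where Λ_τt = diag(σ_1,τt,..., σ_D,τt). The helper below is an illustrative sketch (the function name and test values are ours), cross-checked against that map:

```python
import numpy as np

def joint_sigma(sigma, w_tau, w_t):
    """Covariance E(e_tau e_t') of e_t = (e_1t, ..., e_Dt, sum_d w_dt e_dt)'
    per (20a). sigma[d] = cov(e_{d,tau}, e_{d,t}); sampling errors are
    assumed independent across CDs."""
    sigma = np.asarray(sigma, dtype=float)
    Lam = np.diag(sigma)                      # Lambda_{tau t}
    h = w_t * sigma                           # h_{tau t} = cov[e_{d tau}, sum]
    h_row = w_tau * sigma                     # cov[sum at tau, e_{dt}]
    nu = np.sum(w_tau * w_t * sigma)          # nu_{tau t}
    return np.block([[Lam, h[:, None]],
                     [h_row[None, :], np.array([[nu]])]])

# Cross-check against M_tau Lambda_{tau t} M_t' with M_t = [I_D; w_t'].
sigma = np.array([0.30, 0.08, 1.21])
w_tau = np.array([0.20, 0.50, 0.30])
w_t = np.array([0.25, 0.45, 0.30])
M_tau = np.vstack([np.eye(3), w_tau])
M_t = np.vstack([np.eye(3), w_t])
print(np.allclose(joint_sigma(sigma, w_tau, w_t),
                  M_tau @ np.diag(sigma) @ M_t.T))
```

Note that the matrix is not symmetric when τ ≠ t, since the off-diagonal border blocks carry the weights from different periods.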

4.2 Recursive Filtering With Correlated Measurements and Benchmark Constraints

To compute the benchmarked predictors, we apply the recursive GLS filter (19) to the joint model defined by (20a)–(20b), setting the variance of the benchmark errors ∑_{d=1}^D w_dt e_dt to 0. The idea behind this procedure is as follows. By (8) and (20), the model-dependent predictor of the signal (true population value) in CD d at time t takes the form Ŷ^model_dt = z_dt′ α̂_dt. Thus the benchmark constraints (1) can be written as

∑_{d=1}^D w_dt z_dt′ α̂_dt = ∑_{d=1}^D w_dt y_dt,  t = 1, 2,....  (21)

In contrast, under the model (20),

∑_{d=1}^D w_dt y_dt = ∑_{d=1}^D w_dt z_dt′ α_dt + ∑_{d=1}^D w_dt e_dt,  (22)

and so a simple way of satisfying the benchmark constraints is by imposing ∑_{d=1}^D w_dt y_dt = ∑_{d=1}^D w_dt z_dt′ α_dt, or, equivalently, by setting

var[∑_{d=1}^D w_dt e_dt] = cov[e_dt, ∑_{d=1}^D w_dt e_dt] = 0,  d = 1,..., D; t = 1, 2,....  (23)

It is important to emphasize that using (23) is just a convenient technical way of forcing the constraints and hence computing the benchmarked predictors. In Appendix D we show how to compute the true variances of the benchmarked predictors, accounting in particular for the errors ∑_{d=1}^D w_dt e_dt in the right side of (22) [and no longer imposing (23)]. The imposition of (23) in the GLS filter (19) is implemented by replacing the covariance matrix Σ_tt of the measurement equation by the matrix Σ*_tt = [Λ_tt  0_(D); 0_(D)′  0], where Λ_tt = diag[σ_1,tt,..., σ_D,tt] is the upper-left D × D block of Σ_tt in (20a) and 0_(D) is the null vector of order D, and by setting the elements of the last column of the matrix C^bmk_t = cov[Tα̂^bmk_{t−1} − α_t, e_t] to 0.

Application of the GLS filter (19) to the model (9), with the benchmark constraints imposed by application of (23), yields the vector of CD benchmarked estimators

Z*_t α̂^bmk_t = Z*_t { Tα̂^bmk_{t−1} + (P^bmk_{t|t−1}Z_t′ − C^bmk_{t0}) [Z_tP^bmk_{t|t−1}Z_t′ − Z_tC^bmk_{t0} − C^bmk′_{t0}Z_t′ + Σ*_tt]^{−1} (y_t − Z_tTα̂^bmk_{t−1}) },  (24)

where Z*_t = I_D ⊗ z_dt′, P^bmk_{t|t−1} = TP^bmk_{t−1}T′ + Q, P^bmk_{t−1} = E[(α_{t−1} − α̂^bmk_{t−1})(α_{t−1} − α̂^bmk_{t−1})′], and, defining e_t0 = (e_1t,..., e_Dt, 0)′, C^bmk_{t0} = cov[Tα̂^bmk_{t−1} − α_t, e_t0].

Remark 6. The matrix P^bmk_{t|t−1} = E[(Tα̂^bmk_{t−1} − α_t)(Tα̂^bmk_{t−1} − α_t)′] is the true prediction error covariance matrix. See Appendix D for the computation of P^bmk_t = E[(α_t − α̂^bmk_t)(α_t − α̂^bmk_t)′] and C^bmk_t = cov[Tα̂^bmk_{t−1} − α_t, e_t] under the model (9) without the constraints. The matrix P^bmk_t accounts for the variability of the state vector components and the variances and covariances of the sampling errors (defining the matrices Σ_tt and C^bmk_t).

Note again that the benchmarking filter developed in this article, and particularly the covariance matrix P^bmk_t, is different from the state–space benchmarking filters developed in other works, where the series observations are benchmarked to external (independent) data sources. For example, Doran (1992) considered benchmark constraints (possibly a set) of the form R_tα_t = r_t, where the r_t's are constants. In our case the benchmarks ∑_{d=1}^D w_dt y_dt are random and depend heavily on the sum ∑_{d=1}^D w_dt z_dt′ α_dt. Another difference between the present filter and the other filters developed in the context of state–space modeling is the accounting for correlated measurement errors in the present filter.

4.3 Simulation Results

To test whether the proposed benchmarking algorithm with correlated measurement errors, as developed in Section 4.2, performs properly, we designed a small simulation study. The study consists of simulating 10,000 time series of length 45 for each of 3 models and computing the simulation variances of the benchmark prediction errors (α̂^bmk_t − α_t), which we compare with the true variances under the model, P^bmk_t = E[(α̂^bmk_t − α_t)(α̂^bmk_t − α_t)′], as defined by (D3) in Appendix D. We also compare the simulation covariances cov[Tα̂^bmk_{t−1} − α_t, e_t] with the true covariances C^bmk_t under the model, as defined by (D4). The models used for generating the three sets of time series have the general form

y_dt = α_dt + e_dt,
α_dt = α_d,t−1 + η_dt,
e_dt = ε_dt + .55ε_d,t−1 + .30ε_d,t−2 + .10ε_d,t−3,  d = 1, 2, 3,  (25)

where η_dt and ε_dt are independent white noise series. The random-walk variances for the three models are var(η_dt) = (.01, .88, 1.2). The corresponding measurement error variances are var(e_dt) = (.30, .08, 1.21), with autocorrelations corr(e_t, e_{t−1}) = .53, corr(e_t, e_{t−2}) = .25, and corr(e_t, e_{t−3}) = .07 (the same autocorrelations for the three models). The benchmark constraints are

α_1t + α_2t + α_3t = y_1t + y_2t + y_3t,  t = 1,..., 45.  (26)
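The quoted autocorrelations follow directly from the MA(3) coefficients in (25), via ρ_k = γ_k/γ_0 with γ_k = ∑_j θ_j θ_{j+k}. A two-line check (sketch):

```python
import numpy as np

# Autocorrelations of e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}
theta = np.array([1.0, 0.55, 0.30, 0.10])
gamma = np.array([theta[k:] @ theta[:len(theta) - k] for k in range(4)])
rho = gamma[1:] / gamma[0]
print(np.round(rho, 2))   # [0.53 0.25 0.07], matching the values in the text
```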

Table 1 shows, for each of the three series, the values of the true variance p_d,45 = E(α̂^bmk_d,45 − α_d,45)² and covariance c_d,45 = cov[α̂^bmk_d,44 − α_d,45, e_d,45] for the last time point (t = 45), along with the corresponding simulation variance p^Sim_d,45 = ∑_{k=1}^{10,000} (α̂^bmk(k)_d,45 − α^(k)_d,45)²/10⁴ and covariance c^Sim_d,45 = ∑_{k=1}^{10,000} (α̂^bmk(k)_d,44 − α^(k)_d,45) e^(k)_d,45/10⁴, where the superscript k indicates the kth simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.

4.4 Benchmarking of State Estimates

We considered so far (and again in Sec. 5) the benchmarking of the CD estimates, but, as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices Σ_τt = E(e_τ e_t′) [see (20a)], and hence the computation of the covariances C^bmk_t = cov[Tα̂^bmk_{t−1} − α_t, e_t] [see (D4)] and the variance matrices P^bmk_t = E[(α̂^bmk_t − α_t)(α̂^bmk_t − α_t)′] [see (D3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45), 10,000 Simulations

        Variances                      Covariances
Series  Theoretical  Simulation       Theoretical  Simulation
        p_d,45       p^sim_d,45       c_d,45       c^sim_d,45
1        .274         .276             .039         .041
2       1.122        1.119             .615         .614
3        .337         .344             .063         .068

The reason why the matrices Σ_τt have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints

∑_{s∈d} w_st z_st′ α̂_st = z_dt′ α̂^bmk_dt = z_dt′ α_dt + z_dt′ (α̂^bmk_dt − α_dt),  (27)

so that the benchmarking error is now e^bmk_dt = z_dt′ (α̂^bmk_dt − α_dt), which has a different structure than the benchmarking error ∑_{d=1}^D w_dt e_dt underlying (1) [see (22)]. The vectors e_t now have the form e^d_t = (e^d_1t,..., e^d_St, e^bmk_dt)′, where e^d_st is the sampling error for state s belonging to CD d. The computation of the matrices Σ_τt = E(e^d_τ e^d′_t) is carried out by expressing the errors l^bmk_dt = α̂^bmk_dt − α_dt, arising when benchmarking the CDs, as linear combinations of the concurrent and past CD sampling errors e_j = (e_1j,..., e_Dj, ∑_{d=1}^D w_dj e_dj)′ and the concurrent and past model residuals η_j = α_j − Tα_{j−1} [see (20b)], j = 1, 2,..., t. Note that the CD sampling error is a weighted average of the state sampling errors:

e_dj = y_dj − Y_dj = ∑_{s∈d} w_sj (y_sj − Y_sj) = ∑_{s∈d} w_sj e^d_sj.  (28)

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows we apply the benchmarking methodology developed in Section 4.2 for estimating total unemployment in the nine CDs for the period January 1998–December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest, because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test for the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment Y_t and the trend levels L_t [see (6)]. As can be seen, the largest loss in efficiency, as measured by the means, is 3%.

Next we combined the individual division models into the joint model (20). The benchmark constraints are as defined

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998–2003

Division              Prediction of unemployment   Prediction of trend
New England           1.03 (.002)                  1.02 (.002)
Middle Atlantic       1.02 (.002)                  1.02 (.002)
East North Central    1.00 (.001)                  1.00 (.001)
West North Central    1.02 (.002)                  1.02 (.002)
South Atlantic        1.02 (.001)                  1.02 (.001)
East South Central    1.00 (.001)                  1.00 (.001)
West South Central    1.02 (.001)                  1.01 (.001)
Mountain              1.03 (.002)                  1.03 (.002)
Pacific               1.02 (.001)                  1.02 (.001)


Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000; legend: CPS, Div Models).

in (1), with w_dt = 1, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions, without benchmarking, with the corresponding CPS national unemployment estimate. In the first part of the observation period, the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.

Figures 3–5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graph to the period January 2000–January 2003 to better illuminate how

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS (numbers in 100,000; legend: difference, ±2×SE(CPS)).

Figure 3. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: South Atlantic Division (numbers in 100,000; legend: CPS, BMK, UnBMK).

the benchmarking corrects in real time the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Another important conclusion that can be reached from Figures 3–5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998–2003 excluding 2001 and for 2001. As can be seen, in 2001 some of the mean ratios are about 1.04, but in the other years the means never exceed 1.01, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: East South Central Division (numbers in 100,000; legend: CPS, BMK, UnBMK).


Figure 5. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment: Pacific Division (numbers in 100,000; legend: CPS, BMK, UnBMK).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6–8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3–5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions when using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6–8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model–dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division              1998–2000, 2002–2003   2001
New England           1.01 (.016)            1.03 (.017)
Middle Atlantic       1.00 (.014)            1.03 (.019)
East North Central    1.00 (.013)            1.03 (.013)
West North Central    1.01 (.014)            1.03 (.011)
South Atlantic        1.00 (.016)            1.04 (.016)
East South Central    1.01 (.016)            1.03 (.014)
West South Central    1.00 (.014)            1.04 (.022)
Mountain              1.00 (.011)            1.02 (.011)
Pacific               1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: South Atlantic Division (numbers in 10,000; legend: CPS, BMK, UnBMK).

small areas is a common requirement for statistical agencies producing official statistics. (See, e.g., Fay and Herriot 1979; Battese et al. 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement.) The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time t out of all of the predictors that are linear combinations of the state predictor from time t − 1 and the new observation at time t. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: East South Central Division (numbers in 10,000; legend: CPS, BMK, UnBMK).


Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment: Pacific Division (numbers in 10,000; legend: CPS, BMK, UnBMK).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat\alpha_t$ [EQ. (18)]

The model holding for $Y_t = (\hat\alpha'_{t|t-1}, y'_t)'$ is
$$
Y_t = \begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}
    = \begin{pmatrix} I \\ Z_t \end{pmatrix} \alpha_t
    + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix},
\qquad u_{t|t-1} = \hat\alpha_{t|t-1} - \alpha_t,
$$
where $\hat\alpha_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u'_{t|t-1}, e'_t)'$ and $X_t = [I, Z'_t]'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_t u'_t) = V_t$ [see (16)]. A predictor $\alpha^*_t$ is unbiased for $\alpha_t$ if $E(\alpha^*_t - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998-2003

Division            Mean (STD)
New England         .85 (.013)
Middle Atlantic     .94 (.005)
East North Central  .96 (.004)
West North Central  .89 (.008)
South Atlantic      .96 (.004)
East South Central  .86 (.007)
West South Central  .92 (.006)
Mountain            .88 (.009)
Pacific             .96 (.004)

Theorem. The predictor $\hat\alpha_t = (X'_tV_t^{-1}X_t)^{-1}X'_tV_t^{-1}Y_t = P_tX'_tV_t^{-1}Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance, out of all the unbiased predictors that are linear combinations of $\hat\alpha_{t|t-1}$ and $y_t$.

Proof. $\hat\alpha_t - \alpha_t = P_tX'_tV_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_tX'_tV_t^{-1}u_t$, so that $E(\hat\alpha_t - \alpha_t) = 0$ and $\operatorname{var}(\hat\alpha_t - \alpha_t) = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = P_t$. Clearly, $\hat\alpha_t$ is linear in $Y_t$.

Let $\hat\alpha^L_t = L_1\hat\alpha_{t|t-1} + L_2y_t + l = LY_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_tX'_tV_t^{-1}$, such that $L = D_t + P_tX'_tV_t^{-1}$. Because $\hat\alpha^L_t$ is unbiased for $\alpha_t$,
$$
E(\hat\alpha^L_t - \alpha_t)
= E[(D_t + P_tX'_tV_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t]
= E[D_tX_t\alpha_t + D_tu_t + P_tX'_tV_t^{-1}u_t + l]
= E[D_tX_t\alpha_t + l] = 0.
$$
Thus, for $\hat\alpha^L_t$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_tX_t = 0$ and $l = 0$, which implies that $\operatorname{var}(\hat\alpha^L_t - \alpha_t) = \operatorname{var}[(D_t + P_tX'_tV_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = \operatorname{var}[D_tu_t + P_tX'_tV_t^{-1}u_t]$. Hence,
$$
\operatorname{var}(\hat\alpha^L_t - \alpha_t)
= E[(D_tu_t + P_tX'_tV_t^{-1}u_t)(u'_tD'_t + u'_tV_t^{-1}X_tP_t)]
= D_tV_tD'_t + P_tX'_tV_t^{-1}V_tD'_t + D_tV_tV_t^{-1}X_tP_t + P_t
= P_t + D_tV_tD'_t
= \operatorname{var}(\hat\alpha_t - \alpha_t) + D_tV_tD'_t,
$$
because $D_tX_t = 0$.

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\operatorname{var}(e_t) = \Sigma_{tt}$ and $\operatorname{var}(\eta_t) = Q_t$. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

GLS setup at time t:
$$
\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t
+ \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
$$
[see (15)], where now
$$
V_t = \operatorname{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
= \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_{tt} \end{bmatrix}
$$
[the same as (16), except that $C_t = 0$]. The GLS predictor at time t is
$$
\hat\alpha_t
= \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
(I, Z'_t)V_t^{-1}\begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}
= [P_{t|t-1}^{-1} + Z'_t\Sigma_{tt}^{-1}Z_t]^{-1}
[P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_{tt}^{-1}y_t]
$$
$$
= [P_{t|t-1} - P_{t|t-1}Z'_tF_t^{-1}Z_tP_{t|t-1}]
[P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_{tt}^{-1}y_t]
$$
$$
= [I - P_{t|t-1}Z'_tF_t^{-1}Z_t]\hat\alpha_{t|t-1}
+ P_{t|t-1}Z'_t\Sigma_{tt}^{-1}y_t
- P_{t|t-1}Z'_tF_t^{-1}Z_tP_{t|t-1}Z'_t\Sigma_{tt}^{-1}y_t
$$
$$
= \hat\alpha_{t|t-1} + P_{t|t-1}Z'_tF_t^{-1}(y_t - Z_t\hat\alpha_{t|t-1}),
\qquad F_t = Z_tP_{t|t-1}Z'_t + \Sigma_{tt},
$$
where the second equality uses the matrix inversion lemma and the last equality uses $Z_tP_{t|t-1}Z'_t = F_t - \Sigma_{tt}$, so that $F_t^{-1}Z_tP_{t|t-1}Z'_t\Sigma_{tt}^{-1} = \Sigma_{tt}^{-1} - F_t^{-1}$.
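This equivalence is easy to confirm numerically (a sketch with arbitrary hypothetical matrices; NumPy is assumed):

```python
import numpy as np

# Numerical check that the GLS predictor equals the Kalman predictor
# when C_t = 0 (all matrices below are arbitrary, for illustration only).
rng = np.random.default_rng(0)
q, p = 3, 2
Z = rng.standard_normal((p, q))
A = rng.standard_normal((q, q)); P_pred = A @ A.T + np.eye(q)  # P_{t|t-1} > 0
B = rng.standard_normal((p, p)); Sig = B @ B.T + np.eye(p)     # Sigma_tt > 0
a_pred = rng.standard_normal(q)                                # T a_{t-1}
y = rng.standard_normal(p)

# GLS form: [P^-1 + Z' Sig^-1 Z]^-1 [P^-1 a + Z' Sig^-1 y]
Pi, Si = np.linalg.inv(P_pred), np.linalg.inv(Sig)
gls = np.linalg.solve(Pi + Z.T @ Si @ Z, Pi @ a_pred + Z.T @ Si @ y)

# Kalman form: a + P Z' F^-1 (y - Z a), with F = Z P Z' + Sig
F = Z @ P_pred @ Z.T + Sig
kal = a_pred + P_pred @ Z.T @ np.linalg.solve(F, y - Z @ a_pred)

assert np.allclose(gls, kal)
```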

Pfeffermann and Tiller Small-Area Estimation 1397

APPENDIX C: COMPUTATION OF $\hat\alpha_t$ [EQ. (19)]

Consider the GLS predictor
$$
\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
$$
[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_{tt} \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions
$$
(I, Z'_t)V_t^{-1}
= \bigl[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC'_tP_{t|t-1}^{-1} - Z'_tH_tC'_tP_{t|t-1}^{-1}),\;
(-P_{t|t-1}^{-1}C_tH_t + Z'_tH_t)\bigr]
\tag{C.1}
$$
and
$$
P_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
= \bigl[P_{t|t-1}^{-1} + (Z'_t - P_{t|t-1}^{-1}C_t)H_t(Z_t - C'_tP_{t|t-1}^{-1})\bigr]^{-1},
\tag{C.2}
$$
where $H_t = [\Sigma_{tt} - C'_tP_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C.1) and (C.2) that
$$
\hat\alpha_t = P_t(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= T\hat\alpha_{t-1} + P_t(Z'_t - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat\alpha_{t-1}).
\tag{C.3}
$$
Computing the matrix in the second row of (C.2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C.3) yields, after some algebra, equation (19).
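Equation (C.2) can be checked numerically against the direct computation $[X'_tV_t^{-1}X_t]^{-1}$ (a sketch with arbitrary hypothetical matrices; $C_t$ is scaled down so that $V_t$ stays positive definite):

```python
import numpy as np

# Check eq. (C.2) against the direct computation [X' V^-1 X]^-1.
rng = np.random.default_rng(1)
q, p = 3, 2
Z = rng.standard_normal((p, q))
A = rng.standard_normal((q, q)); P_pred = A @ A.T + np.eye(q)  # P_{t|t-1}
B = rng.standard_normal((p, p)); Sig = B @ B.T + np.eye(p)     # Sigma_tt
C = 0.1 * rng.standard_normal((q, p))                          # C_t (small)
V = np.block([[P_pred, C], [C.T, Sig]])
X = np.vstack([np.eye(q), Z])                                  # X_t = [I, Z_t']'

P_direct = np.linalg.inv(X.T @ np.linalg.solve(V, X))
Pi = np.linalg.inv(P_pred)
H = np.linalg.inv(Sig - C.T @ Pi @ C)                          # H_t
P_c2 = np.linalg.inv(Pi + (Z.T - Pi @ C) @ H @ (Z - C.T @ Pi)) # eq. (C.2)

assert np.allclose(P_direct, P_c2)
```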

APPENDIX D: COMPUTATION OF $P^{bmk}_t = \operatorname{var}(\hat\alpha^{bmk}_t - \alpha_t)$ AND $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$

The benchmarked predictor $\hat\alpha^{bmk}_t$ defined by (24) can be written as
$$
\hat\alpha^{bmk}_t
= [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]\,T\hat\alpha^{bmk}_{t-1}
+ (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}y_t,
\tag{D.1}
$$
where $R_t = Z_tP^{bmk}_{t|t-1}Z'_t - Z_tC^{bmk}_{t0} - C^{bmk\prime}_{t0}Z'_t + \Sigma^*_{tt}$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]\alpha_t + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t\alpha_t$ yields
$$
\hat\alpha^{bmk}_t - \alpha_t
= [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t](T\hat\alpha^{bmk}_{t-1} - \alpha_t)
+ (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}e_t.
\tag{D.2}
$$
Denote $G_t = I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t$ and $K_t = (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat\alpha^{bmk}_t - \alpha_t)$ under the model (20) is
$$
P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']
= G_tP^{bmk}_{t|t-1}G'_t + K_t\Sigma_{tt}K'_t
+ G_tC^{bmk}_tK'_t + K_tC^{bmk\prime}_tG'_t,
\tag{D.3}
$$
where $P^{bmk}_{t|t-1} = E[(T\hat\alpha^{bmk}_{t-1} - \alpha_t)(T\hat\alpha^{bmk}_{t-1} - \alpha_t)'] = TP^{bmk}_{t-1}T' + Q$, $\Sigma_{tt} = E(e_te'_t)$ [eq. (20a)], and $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$. The matrix $C^{bmk}_t$ is computed similarly to (17), using the chain
$$
C^{bmk}_t
= A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_2\tilde A^{bmk}_1\Sigma_{1t}
+ A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_3\tilde A^{bmk}_2\Sigma_{2t}
+ \cdots + A^{bmk}_{t-1}\tilde A^{bmk}_{t-2}\Sigma_{t-2,t}
+ \tilde A^{bmk}_{t-1}\Sigma_{t-1,t},
\tag{D.4}
$$
where $A^{bmk}_j = TG_j$ and $\tilde A^{bmk}_j = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28-36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568-572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State-Space Models," International Statistical Review, 65, 23-48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269-277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064-1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139-148.

Pfeffermann, D. (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125-143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73-83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217-237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State-Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893-916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149-166.



At Time t. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with covariance matrix $E[(\hat\alpha_{t|t-1} - \alpha_t)(\hat\alpha_{t|t-1} - \alpha_t)'] = TP_{t-1}T' + Q = P_{t|t-1}$, where $P_{t-1} = E[(\hat\alpha_{t-1} - \alpha_{t-1})(\hat\alpha_{t-1} - \alpha_{t-1})']$. Set the random coefficients regression model
$$
\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t
+ \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix},
\qquad u_{t|t-1} = T\hat\alpha_{t-1} - \alpha_t,
\tag{15}
$$
and define
$$
V_t = \operatorname{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
= \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_{tt} \end{bmatrix}.
\tag{16}
$$
The covariance matrix $C_t = \operatorname{cov}[T\hat\alpha_{t-1} - \alpha_t, e_t]$ is computed as follows. Let $[I, Z'_j]V_j^{-1} = [B_{j1}, B_{j2}]$, where $B_{j1}$ contains the first $q$ columns of $[I, Z'_j]V_j^{-1}$ and $B_{j2}$ contains the remaining columns, with $q = \dim(\alpha_j)$. Define $A_j = TP_jB_{j1}$ and $\tilde A_j = TP_jB_{j2}$, $j = 2, \ldots, t-1$, with $\tilde A_1 = TK_1$. Then
$$
C_t = \operatorname{cov}[T\hat\alpha_{t-1} - \alpha_t, e_t]
    = A_{t-1}A_{t-2}\cdots A_2\tilde A_1\Sigma_{1t}
    + A_{t-1}A_{t-2}\cdots A_3\tilde A_2\Sigma_{2t}
    + \cdots + A_{t-1}\tilde A_{t-2}\Sigma_{t-2,t} + \tilde A_{t-1}\Sigma_{t-1,t}.
\tag{17}
$$
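The chain in (17) is a backward-accumulated sum of matrix products. A minimal sketch of this accumulation follows (illustrative only; the function name and the packaging of $A_j$, $\tilde A_j$, $\Sigma_{j,t}$ as Python lists are our own, and in practice these matrices are produced by the filter recursions):

```python
import numpy as np

def cov_chain(A, A_til, Sig):
    """Backward accumulation of the chain (17):
    C_t = sum_{j=1}^{t-1} A_{t-1} ... A_{j+1} Atil_j Sigma_{j,t}.
    A[j-1], A_til[j-1], Sig[j-1] hold A_j, Atil_j, Sigma_{j,t} (1-based j);
    A_1 itself never enters the accumulated products."""
    C = np.zeros((A_til[0].shape[0], Sig[0].shape[1]))
    prefix = np.eye(A_til[0].shape[0])   # running product A_{t-1} ... A_{j+1}
    for j in reversed(range(len(A))):    # j = t-1 down to 1
        C += prefix @ A_til[j] @ Sig[j]
        prefix = prefix @ A[j]
    return C
```

Each prefix product is built once and reused, so the cost is linear in t rather than quadratic.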

The GLS predictor of $\alpha_t$ based on $T\hat\alpha_{t-1}$ and $y_t$, and the covariance matrix of the prediction errors, are obtained from (15) and (16) as
$$
\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix},
\qquad
P_t = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)']
    = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}.
\tag{18}
$$
Alternatively, the predictor $\hat\alpha_t$ can be written (see App. C) as
$$
\hat\alpha_t = T\hat\alpha_{t-1}
+ (P_{t|t-1}Z'_t - C_t)\,[Z_tP_{t|t-1}Z'_t - Z_tC_t - C'_tZ'_t + \Sigma_{tt}]^{-1}\,
(y_t - Z_tT\hat\alpha_{t-1}).
\tag{19}
$$
Written in this way, the predictor of $\alpha_t$ at time t is seen to equal the predictor of $\alpha_t$ from time $t-1$ plus a correction factor that depends on the magnitude of the innovation (the one-step-ahead prediction error) when predicting $y_t$ at time $t-1$.
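For concreteness, one step of the recursion (19) can be sketched as follows (a hypothetical illustration, not BLS production code; in the actual application $C_t$ comes from the chain (17), and the function and variable names here are ours):

```python
import numpy as np

def gls_update(T, Q, Z, Sigma_tt, C, a_prev, P_prev, y):
    """One step of the recursive GLS filter, eq. (19):
    a_t = T a_{t-1} + (P_{t|t-1} Z' - C_t) R^{-1} (y_t - Z T a_{t-1}),
    with R = Z P_{t|t-1} Z' - Z C_t - C_t' Z' + Sigma_tt."""
    a_pred = T @ a_prev                   # T a_{t-1}
    P_pred = T @ P_prev @ T.T + Q         # P_{t|t-1} = T P_{t-1} T' + Q
    R = Z @ P_pred @ Z.T - Z @ C - C.T @ Z.T + Sigma_tt
    gain = (P_pred @ Z.T - C) @ np.linalg.inv(R)
    return a_pred + gain @ (y - Z @ a_pred), P_pred
```

When `C` is the zero matrix, the gain reduces to the familiar Kalman gain (see Appendix B).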

3.2 Properties of the Filtering Algorithm

Assuming known model parameters, the recursive GLS filter defined by (18) or (19) has the following properties:

1. At every time point t, the filter produces the BLUP of $\alpha_t$ based on the predictor $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ from time $t-1$ and the new observation $y_t$. The BLUP property means that $E(\hat\alpha_t - \alpha_t) = 0$ and $\operatorname{var}[d'(\hat\alpha_t - \alpha_t)] \le \operatorname{var}[d'(\hat\alpha^L_t - \alpha_t)]$ for every coefficient vector $d$ and any other linear unbiased predictor of the form $\hat\alpha^L_t = L_1\hat\alpha_{t|t-1} + L_2y_t + l$, with general fixed matrices $L_1$ and $L_2$ and vector $l$. See Appendix A for a proof of this property.

2. When the measurement errors are independent, the GLS filter coincides with the familiar Kalman filter (Harvey 1989); see Appendix B for the proof. Thus the recursive GLS filter can be viewed as an extension of the Kalman filter to the case of correlated measurement errors.

3. Unlike the Kalman filter, which assumes independent measurement errors and therefore yields the BLUP of $\alpha_t$ based on all the individual observations $y_{(t)} = (y'_1, \ldots, y'_t)'$ (i.e., the BLUP out of all of the linear unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$ for general fixed matrices $J_{it}$ and vector $j_t$), the filter (18) yields the BLUP of $\alpha_t$ based on $\hat\alpha_{t|t-1}$ and the new observation $y_t$.

(The predictor $\hat\alpha_{t|t-1}$ is itself a linear combination of all the observations up to time $t-1$, but the values of the matrix coefficients are fixed by the previous steps of the filter; see also Remark 4.)

Computing the BLUP of $\alpha_t$ under correlated measurement errors out of all possible unbiased predictors that are linear combinations of all the individual observations in $y_{(t)}$ with arbitrary coefficients [or the minimum mean squared error predictor $E(\alpha_t \mid y_t, y_{t-1}, \ldots, y_1; \lambda)$ under normality of the model error terms, where $\lambda$ defines the model parameters] requires joint modeling of $y_{(t)}$ for every time t. For long series the computations become very heavy and generally are not practical in a production environment that requires routine runs of many series with high-dimensional state vectors in a short time. Empirical evidence so far suggests that the loss in efficiency from using the GLS algorithm instead of the BLUP based on all of the individual observations is mild. Section 5 provides an empirical comparison of the two filters.

Remark 4. For general covariance matrices $\Sigma_{\tau t}$ between the measurement errors, it is impossible to construct a recursive filtering algorithm that is a linear combination of the predictor from the previous time point and the new observation and is BLUP out of all unbiased predictors of the form $\sum_{i=1}^t J_{it}y_i + j_t$. To see this, consider a very simple example of three observations $y_1$, $y_2$, and $y_3$ with common mean $\mu$ and variance $\sigma^2$. If the three observations are independent, then the BLUP of $\mu$ based on the first two observations is $\bar y_{(2)} = (y_1 + y_2)/2$, and the BLUP based on the three observations is $\bar y_{(3)} = (y_1 + y_2 + y_3)/3 = (2/3)\bar y_{(2)} + (1/3)y_3$. The BLUP $\bar y_{(3)}$ is the Kalman filter predictor for time 3. Suppose, however, that $\operatorname{cov}(y_1, y_2) = \operatorname{cov}(y_2, y_3) = \sigma^2\rho_1$ and $\operatorname{cov}(y_1, y_3) = \sigma^2\rho_2 \ne \sigma^2\rho_1$. The BLUP of $\mu$ based on the first two observations is again $\bar y_{(2)} = (y_1 + y_2)/2$, but the BLUP of $\mu$ based on the three observations in this case is $\hat y^{opt}_{(3)} = ay_1 + by_2 + ay_3$, where $a = (1 - \rho_1)/(3 - 4\rho_1 + \rho_2)$ and $b = (1 - 2\rho_1 + \rho_2)/(3 - 4\rho_1 + \rho_2)$. Clearly, because $a \ne b$, the predictor $\hat y^{opt}_{(3)}$ cannot be written as a linear combination of $\bar y_{(2)}$ and $y_3$. For example, if $\rho_1 = .5$ and $\rho_2 = .25$, then $\hat y^{opt}_{(3)} = .4y_1 + .2y_2 + .4y_3$.
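The numerical example in Remark 4 is easy to verify by computing the GLS/BLUP weights of the common mean directly (a short sketch):

```python
import numpy as np

# Verify the BLUP weights in Remark 4 for rho1 = .5, rho2 = .25.
rho1, rho2 = 0.5, 0.25
Sigma = np.array([[1.0, rho1, rho2],
                  [rho1, 1.0, rho1],
                  [rho2, rho1, 1.0]])
w = np.linalg.solve(Sigma, np.ones(3))
w /= w.sum()                        # GLS weights of the common mean mu
a = (1 - rho1) / (3 - 4 * rho1 + rho2)
b = (1 - 2 * rho1 + rho2) / (3 - 4 * rho1 + rho2)
assert np.allclose(w, [a, b, a])    # w = (.4, .2, .4)
```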

4. INCORPORATING THE BENCHMARK CONSTRAINTS

4.1 Joint Modeling of Several Concurrent Estimates and Their Weighted Mean

In this section we jointly model the concurrent CD estimates and their weighted mean. In Section 4.2 we show how to incorporate the benchmark constraints and compute the variances of the benchmarked predictors. We follow the BLS modeling practice and assume that the state vectors and the measurement errors are independent between the CDs. Section 6 considers an extension of the joint model that allows for cross-sectional correlations between corresponding components of the state vectors operating in different CDs.

Suppose that the models underlying the CD estimates are as in (9), with the correlated measurement (sampling) errors included in the observation equation. In what follows we add the subscript d to all of the model components to distinguish between the CDs. The (direct) estimates $y_{dt}$ and the measurement errors $e_{dt}$ are now scalars, and $Z_{dt}$ is a row vector (hereinafter denoted as $z'_{dt}$). Let $y_t = (y_{1t}, \ldots, y_{Dt}, \sum_{d=1}^D w_{dt}y_{dt})'$ define the concurrent estimates for the D CDs and their weighted mean [the right side of the benchmark equation (1)]. The corresponding vector of measurement errors is $e_t = (e_{1t}, \ldots, e_{Dt}, \sum_{d=1}^D w_{dt}e_{dt})'$. Let $Z^*_t = \operatorname{diag}(z'_{1t}, \ldots, z'_{Dt})$ (a block-diagonal matrix with $z'_{dt}$ as the dth block), $T_t = I_D \otimes T$ (denoted simply by T below, because it is time-invariant), $Z_t = [Z^{*\prime}_t, (w_{1t}z'_{1t}, \ldots, w_{Dt}z'_{Dt})']'$ (i.e., $Z^*_t$ with the row $(w_{1t}z'_{1t}, \ldots, w_{Dt}z'_{Dt})$ appended), $\alpha_t = (\alpha'_{1t}, \ldots, \alpha'_{Dt})'$, and $\eta_t = (\eta'_{1t}, \ldots, \eta'_{Dt})'$. By (9) and the independence of the state vectors and measurement errors between CDs, the joint model holding for $y_t$ is

$$
y_t = Z_t\alpha_t + e_t, \qquad
E(e_t) = 0, \qquad
E(e_\tau e'_t) = \Sigma_{\tau t} =
\begin{bmatrix} \tilde\Sigma_{\tau t} & h_{\tau t} \\ h'_{t\tau} & \nu_{\tau t} \end{bmatrix},
\tag{20a}
$$
$$
\alpha_t = T\alpha_{t-1} + \eta_t, \qquad
E(\eta_t) = 0, \qquad
E(\eta_t\eta'_t) = I_D \otimes Q_d = Q, \qquad
E(\eta_t\eta'_{t-k}) = 0, \quad k > 0,
\tag{20b}
$$
where
$$
\tilde\Sigma_{\tau t} = \operatorname{diag}[\sigma_{1\tau t}, \ldots, \sigma_{D\tau t}], \qquad
\sigma_{d\tau t} = \operatorname{cov}(e_{d\tau}, e_{dt}),
$$
$$
\nu_{\tau t} = \sum_{d=1}^D w_{d\tau}w_{dt}\sigma_{d\tau t}
= \operatorname{cov}\Biggl[\sum_{d=1}^D w_{d\tau}e_{d\tau},\; \sum_{d=1}^D w_{dt}e_{dt}\Biggr],
$$
$$
h_{\tau t} = (h_{1\tau t}, \ldots, h_{D\tau t})', \qquad
h_{d\tau t} = w_{dt}\sigma_{d\tau t}
= \operatorname{cov}\Biggl[e_{d\tau},\; \sum_{d=1}^D w_{dt}e_{dt}\Biggr].
$$

Remark 5. The model (20a)-(20b) is the same as the separate models defined by (9a)-(9b) for the different CDs. Adding the model for $\sum_{d=1}^D w_{dt}y_{dt}$ to the observation equation provides no new information.

4.2 Recursive Filtering With Correlated Measurements and Benchmark Constraints

To compute the benchmarked predictors, we apply the recursive GLS filter (19) to the joint model defined by (20a)-(20b), setting the variance of the benchmark errors $\sum_{d=1}^D w_{dt}e_{dt}$ to 0. The idea behind this procedure is as follows. By (8) and (20), the model-dependent predictor of the signal (true population value) in CD d at time t takes the form $\hat Y^{model}_{dt} = z'_{dt}\hat\alpha_{dt}$. Thus the benchmark constraints (1) can be written as
$$
\sum_{d=1}^D w_{dt}z'_{dt}\hat\alpha_{dt} = \sum_{d=1}^D w_{dt}y_{dt},
\qquad t = 1, 2, \ldots.
\tag{21}
$$
In contrast, under the model (20),
$$
\sum_{d=1}^D w_{dt}y_{dt}
= \sum_{d=1}^D w_{dt}z'_{dt}\alpha_{dt} + \sum_{d=1}^D w_{dt}e_{dt},
\tag{22}
$$
and so a simple way of satisfying the benchmark constraints is by imposing $\sum_{d=1}^D w_{dt}y_{dt} = \sum_{d=1}^D w_{dt}z'_{dt}\alpha_{dt}$, or equivalently, by setting
$$
\operatorname{var}\Biggl[\sum_{d=1}^D w_{dt}e_{dt}\Biggr]
= \operatorname{cov}\Biggl[e_{dt},\; \sum_{d=1}^D w_{dt}e_{dt}\Biggr] = 0,
\qquad d = 1, \ldots, D;\; t = 1, 2, \ldots.
\tag{23}
$$
It is important to emphasize that using (23) is just a convenient technical way of forcing the constraints and hence computing the benchmarked predictors. In Appendix D we show how to compute the true variances of the benchmarked predictors, accounting in particular for the errors $\sum_{d=1}^D w_{dt}e_{dt}$ in the right side of (22) [and no longer imposing (23)]. The imposition of (23) in the GLS filter (19) is implemented by replacing the covariance matrix $\Sigma_{tt}$ of the measurement equation (9a) by the matrix
$$
\Sigma^*_{tt} = \begin{bmatrix} \tilde\Sigma_{tt} & 0_{(D)} \\ 0'_{(D)} & 0 \end{bmatrix},
$$
where $0_{(D)}$ is the null vector of order D, and by setting the elements of the last column of the matrix $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ to 0.
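The zero-variance device in (23) can be illustrated on a toy example with two areas and scalar signals (all numbers are hypothetical, and $C_t = 0$ is assumed purely to keep the sketch short). Setting the benchmark row and column of $\Sigma^*_{tt}$ to 0 forces the updated predictors to reproduce the aggregate direct estimate exactly:

```python
import numpy as np

# Toy example, D = 2 areas with scalar signals (z'_dt = 1) and C_t = 0.
w = np.array([0.6, 0.4])                  # benchmark weights w_dt
a_pred = np.array([1.0, 2.0])             # T a_{t-1}
P_pred = np.diag([0.3, 0.5])              # P_{t|t-1}
y_d = np.array([1.4, 1.9])                # direct estimates y_dt
Z = np.vstack([np.eye(2), w])             # areas plus their weighted mean
y = np.append(y_d, w @ y_d)
Sig_star = np.zeros((3, 3))
Sig_star[:2, :2] = np.diag([0.2, 0.25])   # sampling variances; benchmark row = 0
R = Z @ P_pred @ Z.T + Sig_star
a_bmk = a_pred + P_pred @ Z.T @ np.linalg.solve(R, y - Z @ a_pred)

# The benchmarked signal estimates reproduce the aggregate direct estimate:
assert np.isclose(w @ a_bmk, w @ y_d)
```

The constraint holds exactly because the residual of the augmented benchmark "observation" is given zero measurement error variance, so the update absorbs it completely into the state estimates.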

Application of the GLS filter (19) to the model (9), with the benchmark constraints imposed by application of (23), yields the vector of CD benchmarked estimators
$$
Z^*_t\hat\alpha^{bmk}_t
= Z^*_t\Bigl\{T\hat\alpha^{bmk}_{t-1}
+ (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})
\bigl[Z_tP^{bmk}_{t|t-1}Z'_t - Z_tC^{bmk}_{t0} - C^{bmk\prime}_{t0}Z'_t + \Sigma^*_{tt}\bigr]^{-1}
(y_t - Z_tT\hat\alpha^{bmk}_{t-1})\Bigr\},
\tag{24}
$$
where $Z^*_t = \operatorname{diag}(z'_{1t}, \ldots, z'_{Dt})$, $P^{bmk}_{t|t-1} = TP^{bmk}_{t-1}T' + Q$, $P^{bmk}_{t-1} = E[(\alpha_{t-1} - \hat\alpha^{bmk}_{t-1})(\alpha_{t-1} - \hat\alpha^{bmk}_{t-1})']$, and, defining $e_{t0} = (e_{1t}, \ldots, e_{Dt}, 0)'$, $C^{bmk}_{t0} = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_{t0}]$.

Remark 6. The matrix $P^{bmk}_{t|t-1} = E[(T\hat\alpha^{bmk}_{t-1} - \alpha_t)(T\hat\alpha^{bmk}_{t-1} - \alpha_t)']$ is the true prediction error covariance matrix. See Appendix D for the computation of $P^{bmk}_t = E[(\alpha_t - \hat\alpha^{bmk}_t)(\alpha_t - \hat\alpha^{bmk}_t)']$ and $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ under the model (9) without the constraints. The matrix $P^{bmk}_t$ accounts for the variability of the state vector components and the variances and covariances of the sampling errors (defining the matrices $\Sigma_{tt}$ and $C^{bmk}_t$).

Note again that the benchmarking filter developed in this article, and particularly the covariance matrix $P^{bmk}_t$, is different from the state-space benchmarking filters developed in other works, where the series observations are benchmarked to external (independent) data sources. For example, Doran (1992) considered benchmark constraints (possibly a set) of the form $R_t\alpha_t = r_t$, where the $r_t$'s are constants. In our case the benchmarks $\sum_{d=1}^D w_{dt}y_{dt}$ are random and depend heavily on the sum $\sum_{d=1}^D w_{dt}z'_{dt}\alpha_{dt}$. Another difference between the present filter and the other filters developed in the context of state-space modeling is the accounting for correlated measurement errors in the present filter.

4.3 Simulation Results

To test whether the proposed benchmarking algorithm with correlated measurement errors, as developed in Section 4.2, performs properly, we designed a small simulation study. The study consists of simulating 10,000 time series of length 45 for each of 3 models and computing the simulation variances of the benchmark prediction errors $(\hat\alpha^{bmk}_t - \alpha_t)$, which we compare with the true variances under the model, $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$, as defined by (D.3) in Appendix D. We also compare the simulation covariances $\operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ with the true covariances $C^{bmk}_t$ under the model, as defined by (D.4). The models used for generating the three sets of time series have the general form
$$
y_{dt} = \alpha_{dt} + e_{dt}, \qquad
\alpha_{dt} = \alpha_{d,t-1} + \eta_{dt},
$$
$$
e_{dt} = \varepsilon_{dt} + .55\varepsilon_{d,t-1} + .30\varepsilon_{d,t-2} + .10\varepsilon_{d,t-3},
\qquad d = 1, 2, 3,
\tag{25}
$$
where $\eta_{dt}$ and $\varepsilon_{dt}$ are independent white noise series. The random-walk variances for the three models are $\operatorname{var}(\eta_{dt}) = (.01, .88, 1.2)$. The corresponding measurement error variances are $\operatorname{var}(e_{dt}) = (.30, .08, 1.21)$, with autocorrelations $\operatorname{corr}(e_t, e_{t-1}) = .53$, $\operatorname{corr}(e_t, e_{t-2}) = .25$, and $\operatorname{corr}(e_t, e_{t-3}) = .07$ (the same autocorrelations for the three models). The benchmark constraints are
$$
\hat\alpha^{bmk}_{1t} + \hat\alpha^{bmk}_{2t} + \hat\alpha^{bmk}_{3t}
= y_{1t} + y_{2t} + y_{3t}, \qquad t = 1, \ldots, 45.
\tag{26}
$$

Table 1 shows, for each of the three series, the values of the true variance $p_{d,45} = E(\hat\alpha^{bmk}_{d,45} - \alpha_{d,45})^2$ and covariance $c_{d,45} = \operatorname{cov}[\hat\alpha^{bmk}_{d,44} - \alpha_{d,45}, e_{d,45}]$ for the last time point ($t = 45$), along with the corresponding simulation variance $p^{Sim}_{d,45} = \sum_{k=1}^{10000}(\hat\alpha^{bmk(k)}_{d,45} - \alpha^{(k)}_{d,45})^2/10^4$ and covariance $c^{Sim}_{d,45} = \sum_{k=1}^{10000}(\hat\alpha^{bmk(k)}_{d,44} - \alpha^{(k)}_{d,45})e^{(k)}_{d,45}/10^4$, where the superscript k indicates the kth simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.
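The quoted measurement error autocorrelations are implied directly by the MA(3) coefficients in (25) and are easily checked:

```python
# MA(3): e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}
theta = [1.0, 0.55, 0.30, 0.10]
gamma0 = sum(c * c for c in theta)            # var(e_t) / var(eps_t)

def rho(k):
    # lag-k autocorrelation of an MA(q) process from its coefficients
    return sum(theta[i] * theta[i + k] for i in range(len(theta) - k)) / gamma0

print([round(rho(k), 2) for k in (1, 2, 3)])  # -> [0.53, 0.25, 0.07]
```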

4.4 Benchmarking of State Estimates

We have considered so far (and again in Sec. 5) the benchmarking of the CD estimates, but as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices $\Sigma_{\tau t} = E(e_\tau e'_t)$ [see (20a)], and hence the computation of the covariances $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ [see (D.4)] and the variance matrices $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$ [see (D.3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45), 10,000 Simulations

               Variances                 Covariances
Series  Theoretical  Simulation   Theoretical  Simulation
        p_d,45       p^Sim_d,45   c_d,45       c^Sim_d,45
1       .274         .276         .039         .041
2       1.122        1.119        .615         .614
3       .337         .344         .063         .068

The reason why the matrices $\Sigma_{\tau t}$ have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints
$$
\sum_{s \in d} w_{st}z'_{st}\hat\alpha_{st}
= z'_{dt}\hat\alpha^{bmk}_{dt}
= z'_{dt}\alpha_{dt} + z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt}),
\tag{27}
$$
so that the benchmarking error is now $e^{bmk}_{dt} = z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt})$, which has a different structure than the benchmarking error $\sum_{d=1}^D w_{dt}e_{dt}$ underlying (1) [see (22)]. The error vectors now have the form $e^d_t = (e^d_{1t}, \ldots, e^d_{St}, e^{bmk}_{dt})'$, where $e^d_{st}$ is the sampling error for state s belonging to CD d. The computation of the matrices $\Sigma^d_{\tau t} = E(e^d_\tau e^{d\prime}_t)$ is carried out by expressing the errors $l^{bmk}_{dt} = \hat\alpha^{bmk}_{dt} - \alpha_{dt}$ when benchmarking the CDs as linear combinations of the concurrent and past CD sampling errors $e_j = (e_{1j}, \ldots, e_{Dj}, \sum_{d=1}^D w_{dj}e_{dj})'$ and the concurrent and past model residuals $\eta_j = \alpha_j - T\alpha_{j-1}$ [see (20b)], $j = 1, 2, \ldots, t$. Note that the CD sampling error is a weighted average of the state sampling errors:
$$
e_{dj} = y_{dj} - Y_{dj}
= \sum_{s \in d} w_{sj}(y_{sj} - Y_{sj})
= \sum_{s \in d} w_{sj}e^d_{sj}.
\tag{28}
$$

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows we apply the benchmarking methodology developed in Section 4.2 for estimating total unemployment in the nine CDs for the period January 1998-December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test for the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment $Y_t$ and the trend levels $L_t$ [see (6)]. As can be seen, the largest loss in efficiency, measured by the means, is 3%.

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998-2003

Division            Prediction of unemployment   Prediction of trend
New England         1.03 (.002)                  1.02 (.002)
Middle Atlantic     1.02 (.002)                  1.02 (.002)
East North Central  1.00 (.001)                  1.00 (.001)
West North Central  1.02 (.002)                  1.02 (.002)
South Atlantic      1.02 (.001)                  1.02 (.001)
East South Central  1.00 (.001)                  1.00 (.001)
West South Central  1.02 (.001)                  1.01 (.001)
Mountain            1.03 (.002)                  1.03 (.002)
Pacific             1.02 (.001)                  1.02 (.001)

Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000s).

Next, we combined the individual division models into the joint model (20). The benchmark constraints are as defined in (1), with $w_{dt} = 1$, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions without benchmarking with the corresponding CPS national unemployment estimate. In the first part of the observation period, the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS (numbers in 100,000s).

Figures 3-5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graphs to the period January 2000-January 2003 to better illuminate how the benchmarking corrects in real time the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Figure 3. CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Monthly Estimates of Total Unemployment, South Atlantic Division (numbers in 100,000s).

Another important conclusion that can be reached from Figures 3-5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998-2003 excluding 2001 and for 2001. As can be seen, in 2001 some of the mean ratios are about 4%, but in the other years the means never exceed 1%, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Monthly Estimates of Total Unemployment, East South Central Division (numbers in 100,000s).

Figure 5. CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Monthly Estimates of Total Unemployment, Pacific Division (numbers in 100,000s).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6-8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3-5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions from using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6-8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division            1998-2000, 2002-2003   2001
New England         1.01 (.016)            1.03 (.017)
Middle Atlantic     1.00 (.014)            1.03 (.019)
East North Central  1.00 (.013)            1.03 (.013)
West North Central  1.01 (.014)            1.03 (.011)
South Atlantic      1.00 (.016)            1.04 (.016)
East South Central  1.01 (.016)            1.03 (.014)
West South Central  1.00 (.014)            1.04 (.022)
Mountain            1.00 (.011)            1.02 (.011)
Pacific             1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Estimates of Total Monthly Unemployment, South Atlantic Division (numbers in 10,000s).

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the small areas is a common requirement for statistical agencies producing official statistics (see, e.g., Fay and Herriot 1979; Battese, Harter, and Fuller 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement). The present article has shown how this requirement can be implemented using state-space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time t out of all of the predictors that are linear combinations of the state predictor from time t - 1 and the new observation at time t. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, East South Central Division (numbers in 10,000).

1396 Journal of the American Statistical Association December 2006

Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat{\alpha}_t$ [EQ. (18)]

The model holding for $Y_t = (\hat{\alpha}'_{t|t-1}, y'_t)'$ is

$$Y_t = \begin{pmatrix} \hat{\alpha}_{t|t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix} \alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}, \qquad u_{t|t-1} = \hat{\alpha}_{t|t-1} - \alpha_t,$$

where $\hat{\alpha}_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u'_{t|t-1}, e'_t)'$ and $X_t = [I, Z'_t]'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_t u'_t) = V_t$ [see (16)]. A predictor $\alpha^*_t$ is unbiased for $\alpha_t$ if $E(\alpha^*_t - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division              Means (STDs)
New England           .85 (.013)
Middle Atlantic       .94 (.005)
East North Central    .96 (.004)
West North Central    .89 (.008)
South Atlantic        .96 (.004)
East South Central    .86 (.007)
West South Central    .92 (.006)
Mountain              .88 (.009)
Pacific               .96 (.004)

Theorem. The predictor $\hat{\alpha}_t = (X'_t V_t^{-1} X_t)^{-1} X'_t V_t^{-1} Y_t = P_t X'_t V_t^{-1} Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\hat{\alpha}_{t|t-1}$ and $y_t$.

Proof. $\hat{\alpha}_t - \alpha_t = P_t X'_t V_t^{-1}(X_t \alpha_t + u_t) - \alpha_t = P_t X'_t V_t^{-1} u_t$, so that $E(\hat{\alpha}_t - \alpha_t) = 0$ and $\mathrm{var}(\hat{\alpha}_t - \alpha_t) = E[(\hat{\alpha}_t - \alpha_t)(\hat{\alpha}_t - \alpha_t)'] = P_t$. Clearly, $\hat{\alpha}_t$ is linear in $Y_t$.

Let $\hat{\alpha}^L_t = L_1 \hat{\alpha}_{t|t-1} + L_2 y_t + l = L Y_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_t X'_t V_t^{-1}$, such that $L = D_t + P_t X'_t V_t^{-1}$. Because $\hat{\alpha}^L_t$ is unbiased for $\alpha_t$,

$$E(\hat{\alpha}^L_t - \alpha_t) = E[(D_t + P_t X'_t V_t^{-1})(X_t \alpha_t + u_t) + l - \alpha_t] = E[D_t X_t \alpha_t + D_t u_t + P_t X'_t V_t^{-1} u_t + l] = E[D_t X_t \alpha_t + l] = 0.$$

Thus, for $\hat{\alpha}^L_t$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_t X_t = 0$ and $l = 0$, which implies that $\mathrm{var}(\hat{\alpha}^L_t - \alpha_t) = \mathrm{var}[(D_t + P_t X'_t V_t^{-1})(X_t \alpha_t + u_t) - \alpha_t] = \mathrm{var}[D_t u_t + P_t X'_t V_t^{-1} u_t]$. Hence,

$$\mathrm{var}(\hat{\alpha}^L_t - \alpha_t) = E[(D_t u_t + P_t X'_t V_t^{-1} u_t)(u'_t D'_t + u'_t V_t^{-1} X_t P_t)] = D_t V_t D'_t + P_t X'_t V_t^{-1} V_t D'_t + D_t V_t V_t^{-1} X_t P_t + P_t = P_t + D_t V_t D'_t = \mathrm{var}(\hat{\alpha}_t - \alpha_t) + D_t V_t D'_t,$$

because $D_t X_t = 0$.
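The BLUP argument above can be checked numerically. The following sketch (illustrative random matrices and ad hoc names such as `Z`, `V`, and `M`, not the paper's unemployment model) builds the GLS coefficient matrix, perturbs it by a matrix $D$ with $DX = 0$ (so the competitor stays unbiased), and confirms that the competitor's error variance exceeds $P_t$ by exactly $D V_t D'$, a positive semidefinite matrix.

```python
import numpy as np

# Numerical check of the BLUP proof: any linear unbiased predictor
# L Y_t satisfies L X_t = I and can be written L = L_gls + D with
# D X_t = 0; its error variance then exceeds P_t by D V_t D'.
# All matrices below are illustrative random stand-ins.

rng = np.random.default_rng(0)
m, n = 3, 4                                      # state and observation dims
Z = rng.normal(size=(n, m))
X = np.vstack([np.eye(m), Z])                    # X_t = [I, Z_t']'
A = rng.normal(size=(m + n, m + n))
V = A @ A.T + np.eye(m + n)                      # joint error covariance V_t
Vinv = np.linalg.inv(V)
P = np.linalg.inv(X.T @ Vinv @ X)                # P_t

L_gls = P @ X.T @ Vinv                           # GLS coefficient matrix
M = rng.normal(size=(m, m + n))
D = M @ (np.eye(m + n) - X @ np.linalg.pinv(X))  # projection => D X = 0
L = L_gls + D                                    # competing unbiased predictor

extra = L @ V @ L.T - P                          # equals D V D' (PSD)
```

With `D = 0` the competitor reduces to the GLS predictor and `extra` vanishes, which is the minimum-variance property claimed by the theorem.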

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t \alpha_t + e_t$, $\alpha_t = T \alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\mathrm{var}(e_t) = \Sigma_t$ and $\mathrm{var}(\eta_t) = Q_t$. Let $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

GLS setup at time $t$:

$$\begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix} \alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}$$

[see (15)], where now

$$V_t = \mathrm{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}$$

[same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is

$$\hat{\alpha}_t = \left[(I, Z'_t) V_t^{-1} \begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1} (I, Z'_t) V_t^{-1} \begin{pmatrix} \hat{\alpha}_{t|t-1} \\ y_t \end{pmatrix}$$
$$= [P_{t|t-1}^{-1} + Z'_t \Sigma_t^{-1} Z_t]^{-1} [P_{t|t-1}^{-1} \hat{\alpha}_{t|t-1} + Z'_t \Sigma_t^{-1} y_t]$$
$$= [P_{t|t-1} - P_{t|t-1} Z'_t F_t^{-1} Z_t P_{t|t-1}] [P_{t|t-1}^{-1} \hat{\alpha}_{t|t-1} + Z'_t \Sigma_t^{-1} y_t]$$
$$= [I - P_{t|t-1} Z'_t F_t^{-1} Z_t] \hat{\alpha}_{t|t-1} + P_{t|t-1} Z'_t \Sigma_t^{-1} y_t - P_{t|t-1} Z'_t \Sigma_t^{-1} y_t + P_{t|t-1} Z'_t F_t^{-1} y_t$$
$$= \hat{\alpha}_{t|t-1} + P_{t|t-1} Z'_t F_t^{-1} (y_t - Z_t \hat{\alpha}_{t|t-1}), \qquad F_t = Z_t P_{t|t-1} Z'_t + \Sigma_t,$$

where the fourth equality uses $F_t^{-1} Z_t P_{t|t-1} Z'_t \Sigma_t^{-1} = \Sigma_t^{-1} - F_t^{-1}$ (because $Z_t P_{t|t-1} Z'_t = F_t - \Sigma_t$). The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
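The chain of equalities above can be verified numerically. A minimal sketch, using generic random matrices as stand-ins (the dimensions and names are illustrative, not taken from the BLS model): compute the GLS form and the Kalman form of the one-step update and compare.

```python
import numpy as np

# Check that, with C_t = 0, the GLS predictor equals the Kalman
# filter update.  All matrices are illustrative random stand-ins.

rng = np.random.default_rng(1)
m, n = 3, 2                                   # state and observation dims
Z = rng.normal(size=(n, m))                   # measurement matrix Z_t
a_pred = rng.normal(size=m)                   # prior predictor T a_{t-1}
y = rng.normal(size=n)                        # new observation y_t
B = rng.normal(size=(m, m))
P_pred = B @ B.T + np.eye(m)                  # P_{t|t-1}, positive definite
S = rng.normal(size=(n, n))
Sigma = S @ S.T + np.eye(n)                   # var(e_t)

# GLS form (second line of the derivation)
Pinv = np.linalg.inv(P_pred)
Sinv = np.linalg.inv(Sigma)
a_gls = np.linalg.solve(Pinv + Z.T @ Sinv @ Z,
                        Pinv @ a_pred + Z.T @ Sinv @ y)

# Kalman form (last line), with F_t = Z P_{t|t-1} Z' + Sigma
F = Z @ P_pred @ Z.T + Sigma
a_kf = a_pred + P_pred @ Z.T @ np.linalg.solve(F, y - Z @ a_pred)
```

The two vectors agree to machine precision, which is exactly the equality established by the matrix inversion lemma in the derivation.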

Pfeffermann and Tiller Small-Area Estimation 1397

APPENDIX C: COMPUTATION OF $\hat{\alpha}_t$ [EQ. (19)]

Consider the GLS predictor

$$\hat{\alpha}_t = \left[(I, Z'_t) V_t^{-1} \begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1} (I, Z'_t) V_t^{-1} \begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix}$$

[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_t \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions

$$(I, Z'_t) V_t^{-1} = \big[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1} C_t H_t C'_t P_{t|t-1}^{-1} - Z'_t H_t C'_t P_{t|t-1}^{-1}), \; (-P_{t|t-1}^{-1} C_t H_t + Z'_t H_t)\big] \tag{C.1}$$

and

$$P_t = \left[(I, Z'_t) V_t^{-1} \begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1} = \big[P_{t|t-1}^{-1} + (Z'_t - P_{t|t-1}^{-1} C_t) H_t (Z_t - C'_t P_{t|t-1}^{-1})\big]^{-1}, \tag{C.2}$$

where $H_t = [\Sigma_t - C'_t P_{t|t-1}^{-1} C_t]^{-1}$. It follows from (C.1) and (C.2) that

$$\hat{\alpha}_t = P_t (I, Z'_t) V_t^{-1} \begin{pmatrix} T\hat{\alpha}_{t-1} \\ y_t \end{pmatrix} = T\hat{\alpha}_{t-1} + P_t (Z'_t - P_{t|t-1}^{-1} C_t) H_t (y_t - Z_t T\hat{\alpha}_{t-1}). \tag{C.3}$$

Computing the matrix in the second row of (C.2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C.3) yields, after some algebra, equation (19).
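The closed form (C.2)–(C.3) can be checked against the direct GLS expression with correlated errors ($C_t \neq 0$). A sketch with generic random matrices (all names and dimensions are illustrative stand-ins, not the paper's model):

```python
import numpy as np

# Check of the partitioned-inverse computation: the closed form
# (C.2)-(C.3) must reproduce the direct GLS expression (18) when the
# measurement errors are correlated with the state prediction errors.

rng = np.random.default_rng(2)
m, n = 3, 2
Z = rng.normal(size=(n, m))
a_pred = rng.normal(size=m)                    # T a_{t-1}
y = rng.normal(size=n)
A = rng.normal(size=(m + n, m + n))
V = A @ A.T + np.eye(m + n)                    # joint covariance, eq. (16)
P_pred, C, Sigma = V[:m, :m], V[:m, m:], V[m:, m:]

# Direct GLS computation, eq. (18)
X = np.vstack([np.eye(m), Z])
Vinv = np.linalg.inv(V)
P_direct = np.linalg.inv(X.T @ Vinv @ X)
a_direct = P_direct @ X.T @ Vinv @ np.concatenate([a_pred, y])

# Closed form via (C.2) and (C.3)
Ppi = np.linalg.inv(P_pred)
H = np.linalg.inv(Sigma - C.T @ Ppi @ C)       # H_t
P_closed = np.linalg.inv(Ppi + (Z.T - Ppi @ C) @ H @ (Z - C.T @ Ppi))
a_closed = a_pred + P_closed @ (Z.T - Ppi @ C) @ H @ (y - Z @ a_pred)
```

Both the predictor and its error variance agree, confirming the partitioned-inverse algebra.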

APPENDIX D: COMPUTATION OF $P^{bmk}_t = \mathrm{var}(\hat{\alpha}^{bmk}_t - \alpha_t)$ AND $C^{bmk}_t = \mathrm{cov}[T\hat{\alpha}^{bmk}_{t-1} - \alpha_t, e_t]$

The benchmarked predictor $\hat{\alpha}^{bmk}_t$ defined by (24) can be written as

$$\hat{\alpha}^{bmk}_t = [I - (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} Z_t] T\hat{\alpha}^{bmk}_{t-1} + (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} y_t, \tag{D.1}$$

where $R_t = Z_t P^{bmk}_{t|t-1} Z'_t - Z_t C^{bmk}_{t0} - C^{bmk\prime}_{t0} Z'_t + \Sigma^*_{tt}$. Substituting $y_t = Z_t \alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = [I - (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} Z_t] \alpha_t + (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} Z_t \alpha_t$ yields

$$\hat{\alpha}^{bmk}_t - \alpha_t = [I - (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} Z_t](T\hat{\alpha}^{bmk}_{t-1} - \alpha_t) + (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} e_t. \tag{D.2}$$

Denote $G_t = I - (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1} Z_t$ and $K_t = (P^{bmk}_{t|t-1} Z'_t - C^{bmk}_{t0}) R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat{\alpha}^{bmk}_t - \alpha_t)$ under the model (20) is

$$P^{bmk}_t = E[(\hat{\alpha}^{bmk}_t - \alpha_t)(\hat{\alpha}^{bmk}_t - \alpha_t)'] = G_t P^{bmk}_{t|t-1} G'_t + K_t \Sigma_{tt} K'_t + G_t C^{bmk}_t K'_t + K_t C^{bmk\prime}_t G'_t, \tag{D.3}$$

where $P^{bmk}_{t|t-1} = E[(T\hat{\alpha}^{bmk}_{t-1} - \alpha_t)(T\hat{\alpha}^{bmk}_{t-1} - \alpha_t)'] = T P^{bmk}_{t-1} T' + Q$, $\Sigma_{tt} = E(e_t e'_t)$ [eq. (20a)], and $C^{bmk}_t = \mathrm{cov}[T\hat{\alpha}^{bmk}_{t-1} - \alpha_t, e_t]$. The matrix $C^{bmk}_t$ is computed similarly to (17), using the chain

$$C^{bmk}_t = A^{bmk}_{t-1} A^{bmk}_{t-2} \cdots A^{bmk}_2 \bar{A}^{bmk}_1 \Sigma_{1t} + A^{bmk}_{t-1} A^{bmk}_{t-2} \cdots A^{bmk}_3 \bar{A}^{bmk}_2 \Sigma_{2t} + \cdots + A^{bmk}_{t-1} \bar{A}^{bmk}_{t-2} \Sigma_{t-2,t} + \bar{A}^{bmk}_{t-1} \Sigma_{t-1,t}, \tag{D.4}$$

where $A^{bmk}_j = T G_j$ and $\bar{A}^{bmk}_j = T K_j$, with $G_j$ and $K_j$ defined as earlier.
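The decomposition (D.2) underlying the variance formula (D.3) can be checked numerically. A minimal sketch, assuming illustrative stand-in matrices (the values of `Z`, `P_pred`, `C0`, etc., are ad hoc, not the BLS model): write the update (D.1) as $G_t T\hat{\alpha}^{bmk}_{t-1} + K_t y_t$ and verify that the prediction error splits into a $G_t$ part driven by the previous error and a $K_t$ part driven by the current measurement error.

```python
import numpy as np

# Sketch of the error decomposition (D.2):
#   a_bmk - alpha = G (T a_prev - alpha) + K e,
# with G = I - K Z and K = (P_pred Z' - C0) R^{-1}.
# All inputs are illustrative stand-ins.

rng = np.random.default_rng(3)
m, n = 3, 2
Z = rng.normal(size=(n, m))                  # measurement matrix
P_pred = 2.0 * np.eye(m)                     # P^bmk_{t|t-1} (illustrative)
C0 = 0.1 * rng.normal(size=(m, n))           # C^bmk_{t0} (illustrative)
Sigma_star = np.eye(n)                       # Sigma*_tt of eq. (24)
R = Z @ P_pred @ Z.T - Z @ C0 - C0.T @ Z.T + Sigma_star
K = (P_pred @ Z.T - C0) @ np.linalg.inv(R)   # K_t
G = np.eye(m) - K @ Z                        # G_t

a_prev = rng.normal(size=m)                  # T a^bmk_{t-1}
alpha = rng.normal(size=m)                   # true state alpha_t
e = rng.normal(size=n)                       # measurement error e_t
y = Z @ alpha + e                            # observation, eq. (20a)

a_bmk = G @ a_prev + K @ y                   # update (D.1)
err = a_bmk - alpha
err_decomp = G @ (a_prev - alpha) + K @ e    # decomposition (D.2)
```

Taking second moments of `err_decomp` term by term is exactly what produces the four terms of (D.3).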

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

——— (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.




t = var(αbmkt minus αt ) AND Cbmk

t = cov[Tαbmktminus1 minus αt et ]

The benchmarked predictor αbmkt defined by (24) can be written as

αbmkt = [I minus (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Zt]Tαbmk

tminus1

+ (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t yt (D1)

where Rt = ZtPbmkt|tminus1Zprime

t minus ZtCbmkt0 minus Cbmkprime

t0 Zprimet +

lowasttt Substituting yt =

Ztαt + et [see (20a)] in (D1) and decomposing αt = [I minus (Pbmkt|tminus1Zprime

t minusCbmk

t0 )Rminus1t Zt]αt + (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Ztαt yields

αbmkt minus αt = [I minus (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Zt](Tαbmk

tminus1 minus αt)

+ (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t et (D2)

Denote Gt = I minus (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t Zt and Kt = (Pbmkt|tminus1Zprime

t minusCbmk

t0 )Rminus1t Then by (D2) the variance matrix of the prediction error

(αbmkt minus αt) under the model (20) is

Pbmkt = E[(αbmk

t minus αt)(αbmkt minus αt)

prime]= GtPbmk

t|tminus1Gprimet + KtttKprime

t + GtCbmkt Kprime

t + KtCbmkprimet Gprime

t (D3)

where Pbmkt|tminus1 = E[(Tαbmk

tminus1 minusαt)(Tαbmktminus1 minusαt)

prime] = TPbmktminus1 Tprime+Q tt =

E(et eprimet) [eq (20a)] and Cbmk

t = cov[Tαbmktminus1 minus αt et] The matrix Cbmk

tis computed similarly to (17) using the chain

Cbmkt = Abmk

tminus1 Abmktminus2 middot middot middotAbmk

2 Abmk1 1t + Abmk

tminus1 Abmktminus2 middot middot middotAbmk

3 Abmk2 2t

+ middot middot middot + Abmktminus1 Abmk

tminus2 tminus2t + Abmktminus1 tminus1t (D4)

where Abmkj = TGj and Abmk

j = TKj with Gj and Kj defined as ear-lier

[Received August 2004 Revised February 2006]

REFERENCES

Battese G E Harter R M and Fuller W A (1988) ldquoAn Error ComponentModel for Prediction of County Crop Areas Using Survey and Satellite DatardquoJournal of the American Statistical Association 83 28ndash36

Doran H E (1992) ldquoConstraining Kalman Filter and Smoothing Estimates toSatisfy Time-Varying Restrictionsrdquo Review of Economics and Statistics 74568ndash572

Durbin J and Quenneville B (1997) ldquoBenchmarking by StatendashSpace Mod-elsrdquo International Statistical Review 65 23ndash48

Fay R E and Herriot R (1979) ldquoEstimates of Income for Small Places AnApplication of JamesndashStein Procedures to Census Datardquo Journal of the Amer-ican Statistical Association 74 269ndash277

Harvey A C (1989) Forecasting Structural Time Series With the Kalman Fil-ter Cambridge UK Cambridge University Press

Hillmer S C and Trabelsi A (1987) ldquoBenchmarking of Economic Time Se-riesrdquo Journal of the American Statistical Association 82 1064ndash1071

Pfeffermann D (1984) ldquoOn Extensions of the GaussndashMarkov Theorem to theCase of Stochastic Regression Coefficientsrdquo Journal of the Royal StatisticalSociety Ser B 46 139ndash148

(2002) ldquoSmall Area Estimation New Developments and DirectionsrdquoInternational Statistical Review 70 125ndash143

Pfeffermann D and Barnard C H (1991) ldquoSome New Estimators for SmallArea Means With Application to the Assessment of Farmland Valuesrdquo Jour-nal of Business amp Economic Statistics 9 73ndash83

Pfeffermann D and Burck L (1990) ldquoRobust Small-Area Estimation Com-bining Time Series and Cross-Sectional Datardquo Survey Methodology 16217ndash237

Pfeffermann D and Tiller R B (2005) ldquoBootstrap Approximation to Predic-tion MSE for StatendashSpace Models With Estimated Parametersrdquo Journal ofTime Series Analysis 26 893ndash916

Rao J N K (2003) Small Area Estimation New York WileyTiller R B (1992) ldquoTime Series Modeling of Sample Survey Data From the

US Current Population Surveyrdquo Journal of Official Statistics 8 149ndash166

tiller_r
Oval
tiller_r
Oval
tiller_r
Oval

Pfeffermann and Tiller Small-Area Estimation 1393

the benchmark prediction errors $(\hat\alpha^{bmk}_t - \alpha_t)$, which we compare with the true variances under the model, $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$, as defined by (D.3) in Appendix D. We also compare the simulation covariances $cov[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ with the true covariances $C^{bmk}_t$ under the model, as defined by (D.4). The models used for generating the three sets of time series have the general form

$y_{dt} = \alpha_{dt} + e_{dt}$,
$\alpha_{dt} = \alpha_{d,t-1} + \eta_{dt}$,
$e_{dt} = \varepsilon_{dt} + .55\,\varepsilon_{d,t-1} + .30\,\varepsilon_{d,t-2} + .10\,\varepsilon_{d,t-3}$,  $d = 1, 2, 3$,  (25)

where $\eta_{dt}$ and $\varepsilon_{dt}$ are independent white noise series. The random-walk variances for the three models are $var(\eta_{dt}) = (.01, .88, .12)$. The corresponding measurement error variances are $var(e_{dt}) = (.30, .08, 1.21)$, with autocorrelations $corr(e_t, e_{t-1}) = .53$, $corr(e_t, e_{t-2}) = .25$, and $corr(e_t, e_{t-3}) = .07$ (the same autocorrelations for the three models). The benchmark constraints are

$\alpha_{1t} + \alpha_{2t} + \alpha_{3t} = y_{1t} + y_{2t} + y_{3t}$,  $t = 1, \ldots, 45$.  (26)
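As a concrete sketch, the generating process (25) can be coded in a few lines. The function name and seed are illustrative, and the decimal placements $(.01, .88, .12)$ and $(.30, .08, 1.21)$ for the variances are the parse assumed here, not values confirmed by the source:

```python
import numpy as np

# Minimal sketch of the data-generating process (25): a random-walk signal
# alpha_dt plus an MA(3) sampling error e_dt.  Names and seed are
# illustrative; the variance values follow the parse assumed in the text.
rng = np.random.default_rng(0)
PSI = np.array([1.0, 0.55, 0.30, 0.10])          # MA weights of e_dt

def simulate_series(var_eta, var_e, n=45):
    """Return (alpha, y): alpha_t = alpha_{t-1} + eta_t, y_t = alpha_t + e_t."""
    eta = rng.normal(0.0, np.sqrt(var_eta), n)
    alpha = np.cumsum(eta)
    var_eps = var_e / np.sum(PSI ** 2)           # scale eps so that var(e_t) = var_e
    eps = rng.normal(0.0, np.sqrt(var_eps), n + 3)
    # e_t = eps_t + .55 eps_{t-1} + .30 eps_{t-2} + .10 eps_{t-3}
    e = PSI[0] * eps[3:] + PSI[1] * eps[2:-1] + PSI[2] * eps[1:-2] + PSI[3] * eps[:-3]
    return alpha, alpha + e

# One replicate of the three areas; the benchmark (26) ties the three alphas
# to the sum of the three direct estimates y_dt during estimation.
series = [simulate_series(q, r) for q, r in [(0.01, 0.30), (0.88, 0.08), (0.12, 1.21)]]
```

With these MA weights the implied autocorrelations of $e_t$ at lags 1–3 are .53, .25, and .07, matching the values quoted for the design.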

Table 1 shows, for each of the three series, the values of the true variance $p_{d,45} = E(\hat\alpha^{bmk}_{d,45} - \alpha_{d,45})^2$ and covariance $c_{d,45} = cov[\hat\alpha^{bmk}_{d,44} - \alpha_{d,45}, e_{d,45}]$ for the last time point ($t = 45$), along with the corresponding simulation variance $p^{Sim}_{d,45} = \sum_{k=1}^{10{,}000}(\hat\alpha^{bmk(k)}_{d,45} - \alpha^{(k)}_{d,45})^2/10^4$ and covariance $c^{Sim}_{d,45} = \sum_{k=1}^{10{,}000}(\hat\alpha^{bmk(k)}_{d,44} - \alpha^{(k)}_{d,45})e^{(k)}_{d,45}/10^4$, where the superscript $k$ indicates the $k$th simulation.

The results presented in Table 1 show a close fit, even for the third model, where both the random-walk variance and the measurement error variance are relatively high. These results support the theoretical expressions.

4.4 Benchmarking of State Estimates

So far we have considered (and consider again in Sec. 5) the benchmarking of the CD estimates, but, as discussed in Section 1, the second step of the benchmarking process is to benchmark the state estimates to the corresponding benchmarked CD estimate obtained in the first step, using the constraints (2). This step is carried out similarly to the benchmarking of the CD estimates, but it requires changing the computation of the covariance matrices $\Sigma_{\tau t} = E(e_\tau e'_t)$ [see (20a)], and hence the computation of the covariances $C^{bmk}_t = cov[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$ [see (D.4)] and the variance matrices $P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']$ [see (D.3)].

Table 1. Theoretical and Simulation Variances and Covariances Under the Model (25) and Benchmark (26) for the Last Time Point (t = 45), 10,000 Simulations

Series   Theoretical $p_{d,45}$   Simulation $p^{sim}_{d,45}$   Theoretical $c_{d,45}$   Simulation $c^{sim}_{d,45}$
1        .274                     .276                          .039                     .041
2        1.122                    1.119                         .615                     .614
3        .337                     .344                          .063                     .068

The reason why the matrices $\Sigma_{\tau t}$ have to be computed differently when benchmarking the states is that in this case the benchmarking is carried out by imposing the constraints

$\sum_{s \in d} w_{st} z'_{st}\hat\alpha_{st} = z'_{dt}\hat\alpha^{bmk}_{dt} = z'_{dt}\alpha_{dt} + z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt})$,  (27)

so that the benchmarking error is now $e^{bmk}_{dt} = z'_{dt}(\hat\alpha^{bmk}_{dt} - \alpha_{dt})$, which has a different structure than the benchmarking error $\sum_{d=1}^{D} w_{dt}e_{dt}$ underlying (1) [see (22)]. The vectors $e_t$ now have the form $e^d_t = (e^d_{1t}, \ldots, e^d_{St}, e^{bmk}_{dt})'$, where $e^d_{st}$ is the sampling error for state $s$ belonging to CD $d$. The computation of the matrices $\Sigma_{\tau t} = E(e^d_\tau e^{d\prime}_t)$ is carried out by expressing the errors $l^{bmk}_{dt} = \hat\alpha^{bmk}_{dt} - \alpha_{dt}$ from benchmarking the CDs as linear combinations of the concurrent and past CD sampling errors $e_j = (e_{1j}, \ldots, e_{Dj}, \sum_{d=1}^{D} w_{dj}e_{dj})'$ and the concurrent and past model residuals $\eta_j = \alpha_j - T\alpha_{j-1}$ [see (20b)], $j = 1, 2, \ldots, t$. Note that the CD sampling error is a weighted average of the state sampling errors:

$e_{dj} = y_{dj} - Y_{dj} = \sum_{s \in d} w_{sj}(y_{sj} - Y_{sj}) = \sum_{s \in d} w_{sj}e^d_{sj}$.  (28)

5. ESTIMATION OF UNEMPLOYMENT IN CENSUS DIVISIONS WITH BENCHMARKING

In what follows, we apply the benchmarking methodology developed in Section 4.2 for estimating total unemployment in the nine CDs for the period January 1998–December 2003. Very similar results and conclusions are obtained when estimating the CD total employment. The year 2001 is of special interest because it was affected by the start of a recession in March and the attack on the World Trade Center in September. These two events provide a good test for the performance of the benchmarking method.

It was mentioned in Section 3 that the loss in efficiency from using the recursive GLS filter (19) instead of the optimal predictors based on all the individual observations (CPS estimates in our case) is mild (no benchmarking in either case). Table 2 shows, for each division, the means and standard deviations (STDs) of the monthly ratios between the STD of the GLS predictor and the STD of the optimal predictor when predicting the total unemployment $Y_t$ and the trend levels $L_t$ [see (6)]. As can be seen, the largest loss in efficiency, as measured by the means, is 3%.

Next we combined the individual division models into the joint model (20). The benchmark constraints are as defined

Table 2. Means and STDs (in parentheses) of Monthly Ratios Between the STD of the GLS Predictor and the STD of the Optimal Predictor, 1998–2003

Division              Prediction of unemployment   Prediction of trend
New England           1.03 (.002)                  1.02 (.002)
Middle Atlantic       1.02 (.002)                  1.02 (.002)
East North Central    1.00 (.001)                  1.00 (.001)
West North Central    1.02 (.002)                  1.02 (.002)
South Atlantic        1.02 (.001)                  1.02 (.001)
East South Central    1.00 (.001)                  1.00 (.001)
West South Central    1.02 (.001)                  1.01 (.001)
Mountain              1.03 (.002)                  1.03 (.002)
Pacific               1.02 (.001)                  1.02 (.001)

1394 Journal of the American Statistical Association December 2006

Figure 1. Monthly Total Unemployment: National CPS and Sum of Unbenchmarked Division Model Estimates (numbers in 100,000).

in (1), with $w_{dt} = 1$, such that the model-dependent predictors of the CD total unemployment are benchmarked to the CPS estimator of national unemployment. The CV of the latter estimator is 2%.

Figure 1 compares the monthly sum of the model-dependent predictors in the nine divisions, without benchmarking, with the corresponding CPS national unemployment estimate. In the first part of the observation period, the sum of the model predictors is close to the corresponding CPS estimate. In 2001, however, there is evidence of systematic model underestimation, which, as explained earlier, results from the start of a recession in March and the attack on the World Trade Center in September. The bias of the model-dependent predictors is further highlighted in Figure 2, which plots the differences between the two sets of estimators. As can be seen, all of the differences from March 2001 to April 2002 are negative, and in some months the absolute difference is larger than twice the standard error (SE) of the CPS estimator. Successive negative differences are already observed in the second half of 2000, but they are much smaller in absolute value than the differences occurring after March 2001.

Figures 3–5 show the direct CPS estimators, the unbenchmarked predictors, and the benchmarked predictors for three out of the nine census divisions. We restrict the graphs to the period January 2000–January 2003 to better illuminate how the benchmarking corrects in real time the underestimation of the unbenchmarked predictors in the year 2001. Similar corrections are observed for the other divisions, except New England (not shown), where the benchmarked predictors have a slightly larger positive bias than the unbenchmarked predictors. This is explained by the fact that, unlike in the other eight divisions, in this division the unbenchmarked predictors in 2001 are actually higher than the CPS estimators.

Figure 2. Monthly Total Unemployment: Difference Between Sum of Unbenchmarked Division Model Estimates and National CPS (numbers in 100,000).

Figure 3. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, South Atlantic Division (numbers in 100,000).

Another important conclusion that can be reached from Figures 3–5 is that imposing the benchmark constraints in regular periods affects the predictors only very mildly. This is expected, because in regular periods the benchmark constraints are approximately satisfied under the correct model even without imposing them directly. In fact, the degree to which the unbenchmarked predictors satisfy the constraints can be viewed as a model diagnostic tool. To further illustrate this point, Table 3 gives, for each of the nine CDs, the means and STDs of the monthly ratios between the benchmarked predictor and the unbenchmarked predictor, separately for 1998–2003 excluding 2001, and also for 2001. As can be seen, in 2001 some of the mean ratios deviate from 1 by about 4%, but in the other years the deviation never exceeds 1%, demonstrating that in normal periods the effect of benchmarking on the separate model-dependent predictors is indeed very mild.

Figure 4. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, East South Central Division (numbers in 100,000).


Figure 5. CPS, Benchmarked, and Unbenchmarked Monthly Estimates of Total Unemployment, Pacific Division (numbers in 100,000).

Finally, in Section 1 we mentioned that by imposing the benchmark constraints, the predictor in any given area "borrows strength" from other areas. The effect of this property is demonstrated by comparing the STDs of the benchmarked predictors with the STDs of the unbenchmarked predictors. Figures 6–8 show the two sets of STDs, along with the STDs of the direct CPS estimators, for the same three divisions as in Figures 3–5. Note that the STDs of the CPS estimators are with respect to the distribution of the sampling errors over repeated sampling, whereas the STDs of the two other predictors also account for the variability of the state vector components. As can be seen, the STDs of the benchmarked predictors are systematically lower than the STDs of the unbenchmarked predictors for all of the months, and both sets of STDs are much lower than the STDs of the corresponding CPS estimators. This pattern repeats itself in all of the divisions. Table 4 further compares the STDs of the benchmarked and unbenchmarked predictors. The mean ratios in Table 4 show gains in efficiency of up to 15% in some of the divisions from using benchmarking. The ratios are very stable throughout the years, despite the fact that the STDs of the two sets of model-based predictors vary between months due to changes in the STDs of the sampling errors; see also Figures 6–8.

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division              1998–2000, 2002–2003   2001
New England           1.01 (.016)            1.03 (.017)
Middle Atlantic       1.00 (.014)            1.03 (.019)
East North Central    1.00 (.013)            1.03 (.013)
West North Central    1.01 (.014)            1.03 (.011)
South Atlantic        1.00 (.016)            1.04 (.016)
East South Central    1.01 (.016)            1.03 (.014)
West South Central    1.00 (.014)            1.04 (.022)
Mountain              1.00 (.011)            1.02 (.011)
Pacific               1.00 (.017)            1.04 (.020)

Figure 6. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, South Atlantic Division (numbers in 10,000).

small areas is a common requirement for statistical agencies producing official statistics. (See, e.g., Fay and Herriot 1979; Battese et al. 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement.) The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time $t$ out of all of the predictors that are linear combinations of the state predictor from time $t - 1$ and the new observation at time $t$. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, East South Central Division (numbers in 10,000).


Figure 8. STDs of CPS, Benchmarked, and Unbenchmarked Estimates of Total Monthly Unemployment, Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat\alpha_t$ [EQ. (18)]

The model holding for $Y_t = (\hat\alpha'_{t|t-1}, y'_t)'$ is

$Y_t = \begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}$,  $u_{t|t-1} = \hat\alpha_{t|t-1} - \alpha_t$,

where $\hat\alpha_{t|t-1}$ is the predictor of $\alpha_t$ from time $t - 1$. Denote $u_t = (u'_{t|t-1}, e'_t)'$ and $X_t = [I, Z'_t]'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_tu'_t) = V_t$ [see (16)]. A predictor $\alpha^{*}_t$ is unbiased for $\alpha_t$ if $E(\alpha^{*}_t - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division              Means (STDs)
New England           .85 (.013)
Middle Atlantic       .94 (.005)
East North Central    .96 (.004)
West North Central    .89 (.008)
South Atlantic        .96 (.004)
East South Central    .86 (.007)
West South Central    .92 (.006)
Mountain              .88 (.009)
Pacific               .96 (.004)

Theorem. The predictor $\hat\alpha_t = (X'_tV_t^{-1}X_t)^{-1}X'_tV_t^{-1}Y_t = P_tX'_tV_t^{-1}Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\hat\alpha_{t|t-1}$ and $y_t$.

Proof. $\hat\alpha_t - \alpha_t = P_tX'_tV_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_tX'_tV_t^{-1}u_t$, so that $E(\hat\alpha_t - \alpha_t) = 0$ and $var(\hat\alpha_t - \alpha_t) = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = P_t$. Clearly, $\hat\alpha_t$ is linear in $Y_t$.

Let $\hat\alpha^{L}_t = L_1\hat\alpha_{t|t-1} + L_2y_t + l = LY_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_tX'_tV_t^{-1}$, such that $L = D_t + P_tX'_tV_t^{-1}$. Because $\hat\alpha^{L}_t$ is unbiased for $\alpha_t$,

$E(\hat\alpha^{L}_t - \alpha_t) = E[(D_t + P_tX'_tV_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t]$
$= E[D_tX_t\alpha_t + D_tu_t + P_tX'_tV_t^{-1}u_t + l]$
$= E[D_tX_t\alpha_t + l] = 0$.

Thus, for $\hat\alpha^{L}_t$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_tX_t = 0$ and $l = 0$, which implies that $var(\hat\alpha^{L}_t - \alpha_t) = var[(D_t + P_tX'_tV_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = var[D_tu_t + P_tX'_tV_t^{-1}u_t]$. Hence,

$var(\hat\alpha^{L}_t - \alpha_t) = E[(D_tu_t + P_tX'_tV_t^{-1}u_t)(u'_tD'_t + u'_tV_t^{-1}X_tP_t)]$
$= D_tV_tD'_t + P_tX'_tV_t^{-1}V_tD'_t + D_tV_tV_t^{-1}X_tP_t + P_t$
$= P_t + D_tV_tD'_t$
$= var(\hat\alpha_t - \alpha_t) + D_tV_tD'_t$, because $D_tX_t = 0$.
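The variance decomposition at the heart of the proof can be checked numerically. The sketch below (all names illustrative, with random matrices standing in for the model quantities) constructs a $D_t$ satisfying $D_tX_t = 0$ and confirms that the competing predictor's error variance equals $P_t + D_tV_tD'_t$:

```python
import numpy as np

# Numerical check of the Appendix A identity: for any D with D X = 0, the
# linear unbiased predictor L Y with L = D + P X'V^{-1} has error variance
# P + D V D'.  Random matrices stand in for the model quantities.
rng = np.random.default_rng(1)
m, p = 3, 4                                     # dims of alpha_t and y_t
Z = rng.normal(size=(p, m))
X = np.vstack([np.eye(m), Z])                   # X_t = [I, Z_t']'
A = rng.normal(size=(m + p, m + p))
V = A @ A.T + (m + p) * np.eye(m + p)           # an arbitrary positive-definite V_t
P = np.linalg.inv(X.T @ np.linalg.solve(V, X))  # GLS error variance P_t

M = rng.normal(size=(m, m + p))
D = M @ (np.eye(m + p) - X @ np.linalg.solve(X.T @ X, X.T))  # D X = 0 by construction
L = D + P @ X.T @ np.linalg.inv(V)

var_L = L @ V @ L.T                             # var(L Y - alpha), since D X = 0, l = 0
excess = D @ V @ D.T                            # the penalty for departing from GLS
```

Because the excess term $D_tV_tD'_t$ is positive semidefinite, no linear unbiased competitor can beat the GLS predictor.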

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $var(e_t) = \Sigma_t$ and $var(\eta_t) = Q_t$. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t - 1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

The GLS setup at time $t$ is

$\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}$

[see (15)], where now

$V_t = var\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix} = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}$

[same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is

$\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V_t^{-1}\begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}$
$= [P_{t|t-1}^{-1} + Z'_t\Sigma_t^{-1}Z_t]^{-1}[P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_t^{-1}y_t]$
$= [P_{t|t-1} - P_{t|t-1}Z'_tF_t^{-1}Z_tP_{t|t-1}][P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_t^{-1}y_t]$
$= [I - P_{t|t-1}Z'_tF_t^{-1}Z_t]\hat\alpha_{t|t-1} + P_{t|t-1}Z'_t\Sigma_t^{-1}y_t - P_{t|t-1}Z'_t\Sigma_t^{-1}y_t + P_{t|t-1}Z'_tF_t^{-1}y_t$  [using $Z_tP_{t|t-1}Z'_t = F_t - \Sigma_t$]
$= \hat\alpha_{t|t-1} + P_{t|t-1}Z'_tF_t^{-1}(y_t - Z_t\hat\alpha_{t|t-1})$,
$F_t = Z_tP_{t|t-1}Z'_t + \Sigma_t$.

The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
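This equivalence is easy to verify numerically: with a block-diagonal $V_t$, the stacked GLS solution and the Kalman update agree to machine precision. A sketch with arbitrary positive-definite stand-ins (names are illustrative):

```python
import numpy as np

# Sketch of the Appendix B equivalence: with block-diagonal V_t (C_t = 0),
# the stacked GLS predictor equals the Kalman filter update
#   alpha_hat = a + P Z' F^{-1} (y - Z a),  F = Z P Z' + Sigma.
rng = np.random.default_rng(2)
m, p = 3, 2
Z = rng.normal(size=(p, m))
a = rng.normal(size=m)                         # T alpha_hat_{t-1}
y = rng.normal(size=p)
B = rng.normal(size=(m, m)); P_pred = B @ B.T + np.eye(m)   # P_{t|t-1}
S = rng.normal(size=(p, p)); Sigma = S @ S.T + np.eye(p)    # Sigma_t

# Stacked GLS on (I; Z) alpha = (a; y) with V = diag(P_pred, Sigma)
X = np.vstack([np.eye(m), Z])
V = np.block([[P_pred, np.zeros((m, p))], [np.zeros((p, m)), Sigma]])
Vi = np.linalg.inv(V)
gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ np.concatenate([a, y]))

# Kalman filter update with the same inputs
F = Z @ P_pred @ Z.T + Sigma
kalman = a + P_pred @ Z.T @ np.linalg.solve(F, y - Z @ a)
```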


APPENDIX C: COMPUTATION OF $\hat\alpha_t$ [EQ. (19)]

Consider the GLS predictor

$\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}$

[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_t \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions

$(I, Z'_t)V_t^{-1} = \left[(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC'_tP_{t|t-1}^{-1} - Z'_tH_tC'_tP_{t|t-1}^{-1}),\; (-P_{t|t-1}^{-1}C_tH_t + Z'_tH_t)\right]$  (C.1)

and

$P_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1} = [P_{t|t-1}^{-1} + (Z'_t - P_{t|t-1}^{-1}C_t)H_t(Z_t - C'_tP_{t|t-1}^{-1})]^{-1}$,  (C.2)

where $H_t = [\Sigma_t - C'_tP_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C.1) and (C.2) that

$\hat\alpha_t = P_t(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix} = T\hat\alpha_{t-1} + P_t(Z'_t - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat\alpha_{t-1})$.  (C.3)

Computing the matrix in the second row of (C.2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C.3) yields, after some algebra, equation (19).
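Equations (C.2) and (C.3) can likewise be checked against the direct stacked GLS computation when the off-diagonal block $C_t$ is nonzero; the sketch below (names illustrative) uses arbitrary positive-definite stand-ins:

```python
import numpy as np

# Sketch of the Appendix C recursion: with correlated measurement errors
# (C_t != 0), the GLS predictor equals
#   T a_{t-1} + P_t (Z' - P^{-1} C) H (y - Z T a_{t-1}),
# with H = [Sigma - C'P^{-1}C]^{-1} and P_t from (C.2).
rng = np.random.default_rng(3)
m, p = 3, 2
Z = rng.normal(size=(p, m))
a = rng.normal(size=m)                         # T alpha_hat_{t-1}
y = rng.normal(size=p)
A = rng.normal(size=(m + p, m + p))
V = A @ A.T + (m + p) * np.eye(m + p)          # joint V_t = [[P_pred, C], [C', Sigma]]
P_pred, C, Sigma = V[:m, :m], V[:m, m:], V[m:, m:]

# Direct GLS on the stacked system (I; Z) alpha = (a; y)
X = np.vstack([np.eye(m), Z])
Vi = np.linalg.inv(V)
gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ np.concatenate([a, y]))

# Recursion (C.2)-(C.3)
Ppi = np.linalg.inv(P_pred)
H = np.linalg.inv(Sigma - C.T @ Ppi @ C)       # H_t
P_t = np.linalg.inv(Ppi + (Z.T - Ppi @ C) @ H @ (Z - C.T @ Ppi))   # (C.2)
rec = a + P_t @ (Z.T - Ppi @ C) @ H @ (y - Z @ a)                  # (C.3)
```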

APPENDIX D: COMPUTATION OF $P^{bmk}_t = var(\hat\alpha^{bmk}_t - \alpha_t)$ AND $C^{bmk}_t = cov[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$

The benchmarked predictor $\hat\alpha^{bmk}_t$ defined by (24) can be written as

$\hat\alpha^{bmk}_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]T\hat\alpha^{bmk}_{t-1} + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}y_t$,  (D.1)

where $R_t = Z_tP^{bmk}_{t|t-1}Z'_t - Z_tC^{bmk}_{t0} - C^{bmk\prime}_{t0}Z'_t + \Sigma^{*}_{tt}$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]\alpha_t + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t\alpha_t$ yields

$\hat\alpha^{bmk}_t - \alpha_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t](T\hat\alpha^{bmk}_{t-1} - \alpha_t) + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}e_t$.  (D.2)

Denote $G_t = I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t$ and $K_t = (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat\alpha^{bmk}_t - \alpha_t)$ under the model (20) is

$P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)'] = G_tP^{bmk}_{t|t-1}G'_t + K_t\Sigma_{tt}K'_t + G_tC^{bmk}_tK'_t + K_tC^{bmk\prime}_tG'_t$,  (D.3)

where $P^{bmk}_{t|t-1} = E[(T\hat\alpha^{bmk}_{t-1} - \alpha_t)(T\hat\alpha^{bmk}_{t-1} - \alpha_t)'] = TP^{bmk}_{t-1}T' + Q$, $\Sigma_{tt} = E(e_te'_t)$ [eq. (20a)], and $C^{bmk}_t = cov[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$. The matrix $C^{bmk}_t$ is computed similarly to (17), using the chain

$C^{bmk}_t = A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_2\bar A^{bmk}_1\Sigma_{1t} + A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_3\bar A^{bmk}_2\Sigma_{2t} + \cdots + A^{bmk}_{t-1}\bar A^{bmk}_{t-2}\Sigma_{t-2,t} + \bar A^{bmk}_{t-1}\Sigma_{t-1,t}$,  (D.4)

where $A^{bmk}_j = TG_j$ and $\bar A^{bmk}_j = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error Component Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting Structural Time Series With the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

Pfeffermann, D. (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.


1394 Journal of the American Statistical Association December 2006

Figure 1 Monthly Total Unemployment National CPS and Sumof Unbenchmarked Division Model Estimates (numbers in 100000)( CPS DivModels)

in (1) with wdt = 1 such that the model-dependent predictors ofthe CD total unemployment are benchmarked to the CPS esti-mator of national unemployment The CV of the latter estimatoris 2

Figure 1 compares the monthly sum of the model-dependentpredictors in the nine divisions without benchmarking withthe corresponding CPS national unemployment estimate In thefirst part of the observation period the sum of the model predic-tors is close to the corresponding CPS estimate In 2001 how-ever there is evidence of systematic model underestimationwhich as explained earlier results from the start of a recessionin March and the attack on the World Trade Center in Septem-ber The bias of the model-dependent predictors is further high-lighted in Figure 2 which plots the differences between the twosets of estimators As can be seen all of the differences fromMarch 2001 to April 2002 are negative and in some monthsthe absolute difference is larger than twice the standard error(SE) of the CPS estimator Successive negative differences arealready observed in the second half of 2000 but they are muchsmaller in absolute value than the differences occurring afterMarch 2001

Figures 3ndash5 show the direct CPS estimators the unbench-marked predictors and the benchmarked predictors for threeout of the nine census divisions We restrict the graph to theperiod January 2000ndashJanuary 2003 to better illuminate how

Figure 2 Monthly Total Unemployment Difference Between Sum ofUnbenchmarked Division Model Estimates and National CPS (numbersin 100000) [ difference + (minus)2timesSE(CPS)]

Figure 3 CPS Benchmarked and Unbenchmarked Monthly Esti-mates of Total Unemployment South Atlantic Division (numbers in100000) ( CPS BMK UnBMK)

the benchmarking corrects in real time the underestimation ofthe unbenchmarked predictors in the year 2001 Similar correc-tions are observed for the other divisions except New England(not shown) where the benchmarked predictors have a slightlylarger positive bias than the unbenchmarked predictors This isexplained by the fact that unlike in the other eight divisions inthis division the unbenchmarked predictors in 2001 are actuallyhigher than the CPS estimators

Another important conclusion that can be reached from Fig-ures 3ndash5 is that imposing the benchmark constraints in regularperiods affects the predictors only very mildly This is ex-pected because in regular periods the benchmark constraintsare approximately satisfied under the correct model even with-out imposing them directly In fact the degree to which the un-benchmarked predictors satisfy the constraints can be viewedas a model diagnostic tool To further illustrate this point Ta-ble 3 gives for each of the nine CDs the means and STDs ofthe monthly ratios between the benchmarked predictor and theunbenchmarked predictor separately for 1998ndash2003 excluding2001 and also for 2001 As can be seen in 2001 some of themean ratios are about 4 but in the other years the means neverexceed 1 demonstrating that in normal periods the effect ofbenchmarking on the separate model-dependent predictors isindeed very mild

Figure 4 CPS Benchmarked and Unbenchmarked Monthly Esti-mates of Total Unemployment East South Central Division (numbersin 100000) ( CPS BMK UnBMK)

Pfeffermann and Tiller Small-Area Estimation 1395

Figure 5 CPS Benchmarked and Unbenchmarked Monthly Esti-mates of Total Unemployment Pacific Division (numbers in 100000)( CPS BMK UnBMK)

Finally in Section 1 we mentioned that by imposing thebenchmark constraints the predictor in any given area ldquobor-rows strengthrdquo from other areas The effect of this property isdemonstrated by comparing the STDs of the benchmarked pre-dictors with the STDs of the unbenchmarked predictors Fig-ures 6ndash8 show the two sets of STDs along with the STDs ofthe direct CPS estimators for the same three divisions as inFigures 3ndash5 Note that the STDs of the CPS estimators are withrespect to the distribution of the sampling errors over repeatedsampling whereas the STDs of the two other predictors also ac-count for the variability of the state vector components As canbe seen the STDs of the benchmarked predictors are systemati-cally lower than the STDs of the unbenchmarked predictors forall of the months and both sets of STDs are much lower thanthe STDs of the corresponding CPS estimators This pattern re-peats itself in all of the divisions Table 4 further compares theSTDs of the benchmarked and unbenchmarked predictors Themean ratios in Table 4 show gains in efficiency of up to 15 insome of the divisions using benchmarking The ratios are verystable throughout the years despite the fact that the STDs of thetwo sets of the model-based predictors vary between monthsdue to changes in the STDs of the sampling errors see alsoFigures 6ndash8

6. CONCLUDING REMARKS AND OUTLINE OF FURTHER RESEARCH

Agreement of small-area model-dependent estimators with the direct sample estimate in a "large area" that contains the

Table 3. Means and STDs (in parentheses) of Ratios Between the Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions

Division             1998–2000, 2002–2003   2001
New England          1.01 (.016)            1.03 (.017)
Middle Atlantic      1.00 (.014)            1.03 (.019)
East North Central   1.00 (.013)            1.03 (.013)
West North Central   1.01 (.014)            1.03 (.011)
South Atlantic       1.00 (.016)            1.04 (.016)
East South Central   1.01 (.016)            1.03 (.014)
West South Central   1.00 (.014)            1.04 (.022)
Mountain             1.00 (.011)            1.02 (.011)
Pacific              1.00 (.017)            1.04 (.020)

Figure 6. STDs of the CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Estimates of Total Monthly Unemployment, South Atlantic Division (numbers in 10,000).

small areas is a common requirement for statistical agencies producing official statistics (see, e.g., Fay and Herriot 1979; Battese et al. 1988; Pfeffermann and Barnard 1991; Rao 2003 for proposed modifications to meet this requirement). The present article has shown how this requirement can be implemented using state–space models. As explained in Section 1, benchmarking constraints of the form (1) or (2) cannot be imposed within the standard Kalman filter, but instead require the development of a filter that produces the correct variances of the benchmarked estimators under the model. The filter developed in this article has the further advantage of being applicable in situations where the measurement errors are correlated over time. The GLS filter developed for the case of correlated measurement errors (without incorporating the benchmark constraints) produces the BLUP of the state vector at time t out of all of the predictors that are linear combinations of the state predictor from time t − 1 and the new observation at time t. When the measurement errors are independent over time, the GLS filter is the same as the Kalman filter. An important feature of the proposed benchmarking procedure, illustrated in Section 5, is that by jointly modeling a large number of areas and imposing the benchmark constraints, the benchmarked predictor in any given area borrows strength from other areas, resulting in reduced variance.

Figure 7. STDs of the CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Estimates of Total Monthly Unemployment, East South Central Division (numbers in 10,000).

1396 Journal of the American Statistical Association December 2006

Figure 8. STDs of the CPS, Benchmarked (BMK), and Unbenchmarked (UnBMK) Estimates of Total Monthly Unemployment, Pacific Division (numbers in 10,000).

We are presently studying the benchmarking of the state estimates, as outlined in Sections 1 and 4.4. Another important task is the development of a smoothing algorithm that accounts for correlated measurement errors and incorporates the benchmarking constraints. Clearly, as new data accumulate, it is desirable to modify past predictors; this is particularly important for trend estimation. An appropriate "fixed-point" smoother has been constructed and is currently being tested.

Finally, the present BLS models assume that the state vectors operating in different CDs or states are independent. It seems plausible, however, that changes in the trend or seasonal effects over time are correlated between neighboring CDs, and even more so between states within the same CD. Accounting for these correlations within the joint model defined by (20) is simple and might further improve the efficiency of the predictors.

APPENDIX A: PROOF OF THE BEST LINEAR UNBIASED PREDICTOR PROPERTY OF THE GENERALIZED LEAST SQUARES PREDICTOR $\hat\alpha_t$ [EQ. (18)]

The model holding for $Y_t = (\hat\alpha'_{t|t-1}, y'_t)'$ is
$$
Y_t = \begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix}
    = \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t
    + \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix},
\qquad u_{t|t-1} = \hat\alpha_{t|t-1} - \alpha_t,
$$
where $\hat\alpha_{t|t-1}$ is the predictor of $\alpha_t$ from time $t-1$. Denote $u_t = (u'_{t|t-1}, e'_t)'$ and $X_t = [I, Z'_t]'$. In what follows, all of the expectations are over the joint distribution of the observations $Y_t$ and the state vectors $\alpha_t$, so that $E(u_t) = 0$ and $E(u_t u'_t) = V_t$ [see (16)]. A predictor $\alpha^*_t$ is unbiased for $\alpha_t$ if $E(\alpha^*_t - \alpha_t) = 0$.

Table 4. Means and STDs (in parentheses) of Ratios Between STDs of Benchmarked and Unbenchmarked Predictors of Total Unemployment in Census Divisions, 1998–2003

Division             Means (STDs)
New England          .85 (.013)
Middle Atlantic      .94 (.005)
East North Central   .96 (.004)
West North Central   .89 (.008)
South Atlantic       .96 (.004)
East South Central   .86 (.007)
West South Central   .92 (.006)
Mountain             .88 (.009)
Pacific              .96 (.004)

Theorem. The predictor $\hat\alpha_t = (X'_t V_t^{-1} X_t)^{-1} X'_t V_t^{-1} Y_t = P_t X'_t V_t^{-1} Y_t$ is the BLUP of $\alpha_t$, in the sense of minimizing the prediction error variance out of all the unbiased predictors that are linear combinations of $\hat\alpha_{t|t-1}$ and $y_t$.

Proof. $\hat\alpha_t - \alpha_t = P_t X'_t V_t^{-1}(X_t\alpha_t + u_t) - \alpha_t = P_t X'_t V_t^{-1} u_t$, so that $E(\hat\alpha_t - \alpha_t) = 0$ and $\operatorname{var}(\hat\alpha_t - \alpha_t) = E[(\hat\alpha_t - \alpha_t)(\hat\alpha_t - \alpha_t)'] = P_t$. Clearly, $\hat\alpha_t$ is linear in $Y_t$.

Let $\alpha^L_t = L_1\hat\alpha_{t|t-1} + L_2 y_t + l = LY_t + l$ be any other linear unbiased predictor of $\alpha_t$, and define $D_t = L - P_t X'_t V_t^{-1}$, such that $L = D_t + P_t X'_t V_t^{-1}$. Because $\alpha^L_t$ is unbiased for $\alpha_t$,
$$
\begin{aligned}
E(\alpha^L_t - \alpha_t) &= E[(D_t + P_t X'_t V_t^{-1})(X_t\alpha_t + u_t) + l - \alpha_t] \\
&= E[D_t X_t \alpha_t + D_t u_t + P_t X'_t V_t^{-1} u_t + l] \\
&= E[D_t X_t \alpha_t + l] = 0 .
\end{aligned}
$$
Thus, for $\alpha^L_t$ to be unbiased for $\alpha_t$ irrespective of the distribution of $\alpha_t$, $D_t X_t = 0$ and $l = 0$, which implies that $\operatorname{var}(\alpha^L_t - \alpha_t) = \operatorname{var}[(D_t + P_t X'_t V_t^{-1})(X_t\alpha_t + u_t) - \alpha_t] = \operatorname{var}[D_t u_t + P_t X'_t V_t^{-1} u_t]$.

Hence,
$$
\begin{aligned}
\operatorname{var}(\alpha^L_t - \alpha_t) &= E[(D_t u_t + P_t X'_t V_t^{-1} u_t)(u'_t D'_t + u'_t V_t^{-1} X_t P_t)] \\
&= D_t V_t D'_t + P_t X'_t V_t^{-1} V_t D'_t + D_t V_t V_t^{-1} X_t P_t + P_t \\
&= P_t + D_t V_t D'_t \\
&= \operatorname{var}(\hat\alpha_t - \alpha_t) + D_t V_t D'_t, \qquad\text{because } D_t X_t = 0 .
\end{aligned}
$$
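The BLUP property can also be checked numerically. The sketch below simulates repeated draws from a toy joint model (all matrices are illustrative choices, not the BLS model) and compares the GLS predictor's error variance with that of another linear unbiased predictor, here the choice $L_1 = I$, $L_2 = 0$ that ignores the new observation.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, n = 2, 2, 200_000                      # state dim, obs dim, replications

Z = np.array([[1.0, 0.0], [1.0, 1.0]])       # illustrative measurement matrix
X = np.vstack([np.eye(m), Z])                # X_t = [I, Z']'
P_pred = np.eye(m)                           # var of state prediction error
Sigma = 0.5 * np.eye(p)                      # var(e_t)
V = np.block([[P_pred, np.zeros((m, p))],    # V_t with C_t = 0
              [np.zeros((p, m)), Sigma]])

# GLS weights: alpha_hat = P X' V^{-1} Y, P = (X' V^{-1} X)^{-1}.
Vinv = np.linalg.inv(V)
P = np.linalg.inv(X.T @ Vinv @ X)
L_gls = P @ X.T @ Vinv

# Any other L with L X = I is also unbiased; take L = [I, 0]
# (keep the prior predictor, ignore y_t).
L_alt = np.hstack([np.eye(m), np.zeros((m, p))])

# Simulate Y_t = X alpha + u over the joint distribution.
alpha = rng.standard_normal((n, m))
u = rng.multivariate_normal(np.zeros(m + p), V, size=n)
Y = alpha @ X.T + u

err_gls = Y @ L_gls.T - alpha
err_alt = Y @ L_alt.T - alpha
# Monte Carlo total error variance: GLS should be strictly smaller.
assert np.trace(np.cov(err_gls.T)) < np.trace(np.cov(err_alt.T))
```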

APPENDIX B: EQUALITY OF THE GENERALIZED LEAST SQUARES FILTER AND THE KALMAN FILTER WHEN THE MEASUREMENT ERRORS ARE UNCORRELATED OVER TIME

Consider the model $y_t = Z_t\alpha_t + e_t$, $\alpha_t = T\alpha_{t-1} + \eta_t$, where $e_t$ and $\eta_t$ are independent white noise series with $\operatorname{var}(e_t) = \Sigma_t$ and $\operatorname{var}(\eta_t) = Q_t$. Let $\hat\alpha_{t|t-1} = T\hat\alpha_{t-1}$ define the predictor of $\alpha_t$ at time $t-1$, with prediction error variance matrix $P_{t|t-1}$, assumed to be positive definite.

The GLS setup at time $t$ is
$$
\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= \begin{pmatrix} I \\ Z_t \end{pmatrix}\alpha_t
+ \begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
$$
[see (15)], where now
$$
V_t = \operatorname{var}\begin{pmatrix} u_{t|t-1} \\ e_t \end{pmatrix}
= \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & \Sigma_t \end{bmatrix}
$$
[same as (16), except that $C_t = 0$]. The GLS predictor at time $t$ is
$$
\begin{aligned}
\hat\alpha_t &= \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V_t^{-1}\begin{pmatrix} \hat\alpha_{t|t-1} \\ y_t \end{pmatrix} \\
&= [P_{t|t-1}^{-1} + Z'_t\Sigma_t^{-1}Z_t]^{-1}[P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_t^{-1}y_t] \\
&= [P_{t|t-1} - P_{t|t-1}Z'_tF_t^{-1}Z_tP_{t|t-1}][P_{t|t-1}^{-1}\hat\alpha_{t|t-1} + Z'_t\Sigma_t^{-1}y_t] \\
&= [I - P_{t|t-1}Z'_tF_t^{-1}Z_t]\hat\alpha_{t|t-1} + P_{t|t-1}Z'_t\Sigma_t^{-1}y_t
   - P_{t|t-1}Z'_t\Sigma_t^{-1}y_t + P_{t|t-1}Z'_tF_t^{-1}y_t \\
&= \hat\alpha_{t|t-1} + P_{t|t-1}Z'_tF_t^{-1}(y_t - Z_t\hat\alpha_{t|t-1}),
\qquad F_t = Z_tP_{t|t-1}Z'_t + \Sigma_t .
\end{aligned}
$$

The last expression is the same as the Kalman filter predictor (Harvey 1989, eq. 3.2.3a).
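This algebraic equivalence is easy to verify numerically. The sketch below, with arbitrary illustrative matrices, computes the GLS predictor directly from the stacked regression form and compares it with the Kalman filter update; the two agree to machine precision when $C_t = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 2                                   # state / observation dimensions

Zt = rng.standard_normal((p, m))              # measurement matrix Z_t
P_pred = np.eye(m) + 0.1 * np.ones((m, m))    # P_{t|t-1}, positive definite
Sigma = np.diag([0.5, 0.8])                   # var(e_t) = Sigma_t
a_pred = rng.standard_normal(m)               # predicted state T a_{t-1}
yt = rng.standard_normal(p)                   # new observation y_t

# GLS form: stacked regression with block-diagonal V_t (C_t = 0).
X = np.vstack([np.eye(m), Zt])
V = np.block([[P_pred, np.zeros((m, p))],
              [np.zeros((p, m)), Sigma]])
Vinv = np.linalg.inv(V)
P_gls = np.linalg.inv(X.T @ Vinv @ X)
a_gls = P_gls @ X.T @ Vinv @ np.concatenate([a_pred, yt])

# Kalman filter update for the same inputs.
F = Zt @ P_pred @ Zt.T + Sigma
a_kf = a_pred + P_pred @ Zt.T @ np.linalg.solve(F, yt - Zt @ a_pred)
P_kf = P_pred - P_pred @ Zt.T @ np.linalg.solve(F, Zt @ P_pred)

assert np.allclose(a_gls, a_kf) and np.allclose(P_gls, P_kf)
```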


APPENDIX C: COMPUTATION OF $\hat\alpha_t$ [EQ. (19)]

Consider the GLS predictor
$$
\hat\alpha_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
$$
[see (18)], where $V_t = \begin{bmatrix} P_{t|t-1} & C_t \\ C'_t & \Sigma_t \end{bmatrix}$. A familiar result on the inverse of a partitioned matrix, applied to the matrix $V_t$, yields the expressions
$$
(I, Z'_t)V_t^{-1} = \left[\,(P_{t|t-1}^{-1} + P_{t|t-1}^{-1}C_tH_tC'_tP_{t|t-1}^{-1} - Z'_tH_tC'_tP_{t|t-1}^{-1}),\;
(-P_{t|t-1}^{-1}C_tH_t + Z'_tH_t)\,\right] \tag{C1}
$$
and
$$
P_t = \left[(I, Z'_t)V_t^{-1}\begin{pmatrix} I \\ Z_t \end{pmatrix}\right]^{-1}
= \left[P_{t|t-1}^{-1} + (Z'_t - P_{t|t-1}^{-1}C_t)H_t(Z_t - C'_tP_{t|t-1}^{-1})\right]^{-1}, \tag{C2}
$$
where $H_t = [\Sigma_t - C'_tP_{t|t-1}^{-1}C_t]^{-1}$. It follows from (C1) and (C2) that
$$
\hat\alpha_t = P_t(I, Z'_t)V_t^{-1}\begin{pmatrix} T\hat\alpha_{t-1} \\ y_t \end{pmatrix}
= T\hat\alpha_{t-1} + P_t(Z'_t - P_{t|t-1}^{-1}C_t)H_t(y_t - Z_tT\hat\alpha_{t-1}). \tag{C3}
$$

Computing the matrix in the second row of (C2) using a standard matrix inversion lemma (Harvey 1989, p. 108) and substituting in the second row of (C3) yields, after some algebra, equation (19).
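A direct implementation of (C2)–(C3) is straightforward. The sketch below (illustrative shapes and matrices, not the BLS system) performs one GLS update with measurement errors correlated with the state prediction error, and cross-checks the reduced formulas against the unreduced GLS expression (18).

```python
import numpy as np

def gls_update(a_pred, y, Z, P_pred, C, Sigma):
    """One GLS filtering update per (C2)-(C3).
    a_pred = T a_{t-1}; C = cov(state prediction error, e_t);
    returns the updated predictor and its error variance P_t."""
    Pinv = np.linalg.inv(P_pred)
    H = np.linalg.inv(Sigma - C.T @ Pinv @ C)            # H_t
    A = Z.T - Pinv @ C                                   # Z'_t - P^{-1} C_t
    P = np.linalg.inv(Pinv + A @ H @ A.T)                # eq. (C2)
    a = a_pred + P @ A @ H @ (y - Z @ a_pred)            # eq. (C3)
    return a, P

rng = np.random.default_rng(1)
m, p = 3, 2
Z = rng.standard_normal((p, m))
P_pred = np.eye(m)
C = 0.1 * rng.standard_normal((m, p))    # small cross-covariance
Sigma = np.diag([1.0, 1.5])
a_pred = rng.standard_normal(m)
y = rng.standard_normal(p)

a, P = gls_update(a_pred, y, Z, P_pred, C, Sigma)

# Cross-check against the unreduced GLS formula (18).
X = np.vstack([np.eye(m), Z])
V = np.block([[P_pred, C], [C.T, Sigma]])
Vinv = np.linalg.inv(V)
P_direct = np.linalg.inv(X.T @ Vinv @ X)
a_direct = P_direct @ X.T @ Vinv @ np.concatenate([a_pred, y])
assert np.allclose(a, a_direct) and np.allclose(P, P_direct)
```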

APPENDIX D: COMPUTATION OF $P^{bmk}_t = \operatorname{var}(\hat\alpha^{bmk}_t - \alpha_t)$ AND $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$

The benchmarked predictor $\hat\alpha^{bmk}_t$ defined by (24) can be written as
$$
\hat\alpha^{bmk}_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]T\hat\alpha^{bmk}_{t-1}
+ (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}y_t, \tag{D1}
$$
where $R_t = Z_tP^{bmk}_{t|t-1}Z'_t - Z_tC^{bmk}_{t0} - C^{bmk\prime}_{t0}Z'_t + \Sigma^*_{tt}$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D1) and decomposing $\alpha_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t]\alpha_t + (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t\alpha_t$ yields
$$
\hat\alpha^{bmk}_t - \alpha_t = [I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t](T\hat\alpha^{bmk}_{t-1} - \alpha_t)
+ (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}e_t. \tag{D2}
$$
Denote $G_t = I - (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}Z_t$ and $K_t = (P^{bmk}_{t|t-1}Z'_t - C^{bmk}_{t0})R_t^{-1}$. Then, by (D2), the variance matrix of the prediction error $(\hat\alpha^{bmk}_t - \alpha_t)$ under the model (20) is
$$
P^{bmk}_t = E[(\hat\alpha^{bmk}_t - \alpha_t)(\hat\alpha^{bmk}_t - \alpha_t)']
= G_tP^{bmk}_{t|t-1}G'_t + K_t\Sigma_{tt}K'_t + G_tC^{bmk}_tK'_t + K_tC^{bmk\prime}_tG'_t, \tag{D3}
$$
where $P^{bmk}_{t|t-1} = E[(T\hat\alpha^{bmk}_{t-1} - \alpha_t)(T\hat\alpha^{bmk}_{t-1} - \alpha_t)'] = TP^{bmk}_{t-1}T' + Q$, $\Sigma_{tt} = E(e_te'_t)$ [eq. (20a)], and $C^{bmk}_t = \operatorname{cov}[T\hat\alpha^{bmk}_{t-1} - \alpha_t, e_t]$. The matrix $C^{bmk}_t$ is computed similarly to (17), using the chain
$$
C^{bmk}_t = A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_2\bar A^{bmk}_1\Sigma_{1t}
+ A^{bmk}_{t-1}A^{bmk}_{t-2}\cdots A^{bmk}_3\bar A^{bmk}_2\Sigma_{2t}
+ \cdots + A^{bmk}_{t-1}\bar A^{bmk}_{t-2}\Sigma_{t-2,t} + \bar A^{bmk}_{t-1}\Sigma_{t-1,t}, \tag{D4}
$$
where $A^{bmk}_j = TG_j$ and $\bar A^{bmk}_j = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

Pfeffermann, D. (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.

tiller_r
Oval
tiller_r
Oval
tiller_r
Oval

Pfeffermann and Tiller Small-Area Estimation 1395

Figure 5 CPS Benchmarked and Unbenchmarked Monthly Esti-mates of Total Unemployment Pacific Division (numbers in 100000)( CPS BMK UnBMK)

Finally in Section 1 we mentioned that by imposing thebenchmark constraints the predictor in any given area ldquobor-rows strengthrdquo from other areas The effect of this property isdemonstrated by comparing the STDs of the benchmarked pre-dictors with the STDs of the unbenchmarked predictors Fig-ures 6ndash8 show the two sets of STDs along with the STDs ofthe direct CPS estimators for the same three divisions as inFigures 3ndash5 Note that the STDs of the CPS estimators are withrespect to the distribution of the sampling errors over repeatedsampling whereas the STDs of the two other predictors also ac-count for the variability of the state vector components As canbe seen the STDs of the benchmarked predictors are systemati-cally lower than the STDs of the unbenchmarked predictors forall of the months and both sets of STDs are much lower thanthe STDs of the corresponding CPS estimators This pattern re-peats itself in all of the divisions Table 4 further compares theSTDs of the benchmarked and unbenchmarked predictors Themean ratios in Table 4 show gains in efficiency of up to 15 insome of the divisions using benchmarking The ratios are verystable throughout the years despite the fact that the STDs of thetwo sets of the model-based predictors vary between monthsdue to changes in the STDs of the sampling errors see alsoFigures 6ndash8

6 CONCLUDING REMARKS AND OUTLINE OFFURTHER RESEARCH

Agreement of small-area modelndashdependent estimators withthe direct sample estimate in a ldquolarge areardquo that contains the

Table 3 Means and STDs (in parentheses) of Ratios Between theBenchmarked and Unbenchmarked Predictors of Total

Unemployment in Census Divisions

Division 1998ndash2000 2002ndash2003 2001

New England 101(016) 103(017)Middle Atlantic 100(014) 103(019)East North Central 100(013) 103(013)West North Central 101(014) 103(011)South Atlantic 100(016) 104(016)East South Central 101(016) 103(014)West South Central 100(014) 104(022)Mountain 100(011) 102(011)Pacific 100(017) 104(020)

Figure 6 STD of CPS Benchmarked and Unbenchmarked Esti-mates of Total Monthly Unemployment South Atlantic Division (numbersin 10000) ( CPS BMK UnBMK)

small areas is a common requirement for statistical agenciesproducing official statistics (See eg Fay and Herriot 1979Battese et al 1988 Pfeffermann and Barnard 1991 Rao 2003for proposed modifications to meet this requirement) Thepresent article has shown how this requirement can be imple-mented using statendashspace models As explained in Section 1benchmarking constraints of the form (1) or (2) cannot be im-posed within the standard Kalman filter but instead require thedevelopment of a filter that produces the correct variances ofthe benchmarked estimators under the model The filter devel-oped in this article has the further advantage of being applica-ble in situations where the measurement errors are correlatedover time The GLS filter developed for the case of correlatedmeasurement errors (without incorporating the benchmark con-straints) produces the BLUP of the state vector at time t out ofall of the predictors that are linear combinations of the state pre-dictor from time t minus 1 and the new observation at time t Whenthe measurement errors are independent over time the GLS fil-ter is the same as the Kalman filter An important feature ofthe proposed benchmarking procedure illustrated in Section 5is that by jointly modeling a large number of areas and impos-ing the benchmark constraints the benchmarked predictor inany given area borrows strength from other areas resulting inreduced variance

Figure 7 STD of CPS Benchmarked and Unbenchmarked Esti-mates of Total Monthly Unemployment East South Central Division(numbers in 10000) ( CPS BMK UnBMK)

1396 Journal of the American Statistical Association December 2006

Figure 8 STDs of CPS Benchmarked and Unbenchmarked Esti-mates of Total Monthly Unemployment Pacific Division (numbers in10000) ( CPS BMK UnBMK)

We are presently studying the benchmarking of the state es-timates as outlined in Sections 1 and 44 Another importanttask is the development of a smoothing algorithm that accountsfor correlated measurement errors and incorporates the bench-marking constraints Clearly as new data accumulate it is de-sirable to modify past predictors this is particularly importantfor trend estimation An appropriate ldquofixed-pointrdquo smoother hasbeen constructed and is currently being tested

Finally the present BLS models assume that the state vectorsoperating in different CDs or states are independent It seemsplausible however that changes in the trend or seasonal effectsover time are correlated between neighboring CDs and evenmore so between states within the same CD Accounting forthese correlations within the joint model defined by (20) is sim-ple and might further improve the efficiency of the predictors

APPENDIX A PROOF OF THE BEST LINEARUNBIASED PREDICTOR PROPERTY OF THE

GENERALIZED LEAST SQUARESPREDICTOR αt [EQ (18)]

The model holding for Yt = (αprimet|tminus1yprime

t)prime is

Yt =(

αt|tminus1yt

)=

(I

Zt

)αt +

(ut|tminus1

et

) ut|tminus1 = αt|tminus1 minus αt

where αt|tminus1 is the predictor of αt from time t minus 1 Denote ut =(uprime

t|tminus1 eprimet)

primeXt = [IZprimet]prime In what follows all of the expectations are

over the joint distribution of the observations Yt and the state vec-tors αt so that E(ut) = 0E(utuprime

t) = Vt [see (16)] A predictor αlowastt is

unbiased for αt if E(αlowastt minus αt) = 0

Table 4 Means and STDs (in parentheses) of RatiosBetween STDs of Benchmarked and Unbenchmarked

Predictors of Total Unemployment in CensusDivisions 1998ndash2003

Division Means (STDs)

New England 85(013)Middle Atlantic 94(005)East North Central 96(004)West North Central 89(008)South Atlantic 96(004)East South Central 86(007)West South Central 92(006)Mountain 88(009)Pacific 96(004)

Theorem The predictor αt = (XprimetV

minus1t Xt)

minus1XprimetV

minus1t Yt = PtXprime

t timesVminus1

t Yt is the BLUP of αt in the sense of minimizing the predictionerror variance out of all the unbiased predictors that are linear combi-nations of αt|tminus1 and yt

Proof αt minus αt = PtXprimetV

minus1t (Xtαt + ut) minus αt = PtXprime

tVminus1t ut so that

E(αt minus αt) = 0 and var(αt minus αt) = E[(αt minus αt)(αt minus αt)prime] = Pt

Clearly αt is linear in Yt Let αL

t = L1αt|tminus1 + L2yt + l = LYt + l be any other linear un-biased predictor of αt and define Dt = L minus PtXprime

tVminus1t such that L =

Dt + PtXprimetV

minus1t Because αL

t is unbiased for αt

E(αLt minus αt) = E[(Dt + PtXprime

tVminus1t )(Xtαt + ut) + l minus αt]

= E[DtXtαt + Dtut + PtXprimetV

minus1t ut + l]

= E[DtXtαt + l] = 0

Thus for αLt to be unbiased for αt irrespective of the distribution of αt

DtXt = 0 and l = 0 which implies that var(αLt minus αt) = var[(Dt +

PtXprimetV

minus1t )(Xtαt + ut) minus αt] = var[Dtut + PtXprime

tVminus1t ut]

Hence

var(αLt minus αt) = E[(Dtut + PtXprime

tVminus1t ut)(uprime

tDprimet + uprime

tVminus1t XtPt)]

= DtVtDprimet + PtXprime

tVminus1t VtDprime

t + DtVtVminus1t XtPt + Pt

= Pt + DtVtDprimet

= var(αt minus αt) + DtVtDprimet because DtXt = 0

APPENDIX B EQUALITY OF THE GENERALIZEDLEAST SQUARE FILTER AND THE KALMAN FILTER

WHEN THE MEASUREMENT ERRORS AREUNCORRELATED OVER TIME

Consider the model yt = Ztαt + etαt = Tαtminus1 + ηt whereet and ηt are independent white noise series with var(et) = t andvar(ηt) = Qt Let αt|tminus1 = Tαtminus1 define the predictor of αt at timet minus 1 with prediction error variance matrix Pt|tminus1 assumed to bepositive definite

GLS setup at time t(

Tαtminus1yt

)=

(I

Zt

)αt +

(ut|tminus1

et

)

[see (15)] where now

Vt = var

(ut|tminus1

et

)=

[Pt|tminus1 0

0 t

]

[same as (16) except that Ct = 0]The GLS predictor at time t is

αt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1(IZprime

t)Vminus1t

(αt|tminus1

yt

)

= [Pminus1t|tminus1 + Zprime

tminus1t Zt]minus1[Pminus1

t|tminus1αt|tminus1 + Zprimet

minus1t yt]

= [Pt|tminus1 minus Pt|tminus1ZprimetF

minus1t ZtPt|tminus1][Pminus1

t|tminus1αt|tminus1 + Zprimet

minus1t yt]

= [I minus Pt|tminus1ZprimetF

minus1t Zt]αt|tminus1 + Pt|tminus1Zprime

tminus1t yt

minus Pt|tminus1Zprimet

minus1t yt + Pt|tminus1Zprime

tFminus1t yt

= αt|tminus1 + Pt|tminus1ZprimetF

minus1t (yt minus Ztαt|tminus1)

Ft = ZtPt|tminus1Zprimet + t

The last expression is the same as the Kalman filter predictor (Harvey1989 eq 323a)

Pfeffermann and Tiller Small-Area Estimation 1397

APPENDIX C COMPUTATION OF αt [EQ (19)]

Consider the GLS predictor

αt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1(IZprime

t)Vminus1t

(Tαtminus1

yt

)

[see (18)] where Vt =[

Pt|tminus1 Ct

Cprimet t

] A familiar result on the inverse of

a partitioned matrix applied to the matrix Vt yields the expressions

(IZprimet)V

minus1t = [

(Pminus1t|tminus1 + Pminus1

t|tminus1CtHtCprimetP

minus1t|tminus1 minus Zprime

tHtCprimetP

minus1t|tminus1)

(minusPminus1t|tminus1CtHt + Zprime

tHt)]

(C1)

and

Pt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1

= [Pminus1

t|tminus1 + (Zprimet minus Pminus1

t|tminus1Ct)Ht(Zt minus CprimetP

minus1t|tminus1)

]minus1 (C2)

where Ht = [t minus CprimetP

minus1t|tminus1Ct]minus1 It follows from (C1) and (C2) that

αt = Pt(IZprimet)V

minus1t

(Tαtminus1

yt

)

= Tαtminus1 + Pt(Zprimet minus Pminus1

t|tminus1Ct)Ht(yt minus ZtTαtminus1) (C3)

Computing the matrix in the second row of (C2) using a standard ma-trix inversion lemma (Harvey 1989 p 108) and substituting in the sec-ond row of (C3) yields after some algebra equation (19)

APPENDIX D COMPUTATION OFPbmk

t = var(αbmkt minus αt ) AND Cbmk

t = cov[Tαbmktminus1 minus αt et ]

The benchmarked predictor αbmkt defined by (24) can be written as

αbmkt = [I minus (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Zt]Tαbmk

tminus1

+ (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t yt (D1)

where Rt = ZtPbmkt|tminus1Zprime

t minus ZtCbmkt0 minus Cbmkprime

t0 Zprimet +

lowasttt Substituting yt =

Ztαt + et [see (20a)] in (D1) and decomposing αt = [I minus (Pbmkt|tminus1Zprime

t minusCbmk

t0 )Rminus1t Zt]αt + (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Ztαt yields

αbmkt minus αt = [I minus (Pbmk

t|tminus1Zprimet minus Cbmk

t0 )Rminus1t Zt](Tαbmk

tminus1 minus αt)

+ (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t et (D2)

Denote Gt = I minus (Pbmkt|tminus1Zprime

t minus Cbmkt0 )Rminus1

t Zt and Kt = (Pbmkt|tminus1Zprime

t minusCbmk

t0 )Rminus1t Then by (D2) the variance matrix of the prediction error

(αbmkt minus αt) under the model (20) is

Pbmkt = E[(αbmk

t minus αt)(αbmkt minus αt)

prime]= GtPbmk

t|tminus1Gprimet + KtttKprime

t + GtCbmkt Kprime

t + KtCbmkprimet Gprime

t (D3)

where Pbmkt|tminus1 = E[(Tαbmk

tminus1 minusαt)(Tαbmktminus1 minusαt)

prime] = TPbmktminus1 Tprime+Q tt =

E(et eprimet) [eq (20a)] and Cbmk

t = cov[Tαbmktminus1 minus αt et] The matrix Cbmk

tis computed similarly to (17) using the chain

Cbmkt = Abmk

tminus1 Abmktminus2 middot middot middotAbmk

2 Abmk1 1t + Abmk

tminus1 Abmktminus2 middot middot middotAbmk

3 Abmk2 2t

+ middot middot middot + Abmktminus1 Abmk

tminus2 tminus2t + Abmktminus1 tminus1t (D4)

where Abmkj = TGj and Abmk

j = TKj with Gj and Kj defined as ear-lier

[Received August 2004 Revised February 2006]

REFERENCES

Battese G E Harter R M and Fuller W A (1988) ldquoAn Error ComponentModel for Prediction of County Crop Areas Using Survey and Satellite DatardquoJournal of the American Statistical Association 83 28ndash36

Doran H E (1992) ldquoConstraining Kalman Filter and Smoothing Estimates toSatisfy Time-Varying Restrictionsrdquo Review of Economics and Statistics 74568ndash572

Durbin J and Quenneville B (1997) ldquoBenchmarking by StatendashSpace Mod-elsrdquo International Statistical Review 65 23ndash48

Fay R E and Herriot R (1979) ldquoEstimates of Income for Small Places AnApplication of JamesndashStein Procedures to Census Datardquo Journal of the Amer-ican Statistical Association 74 269ndash277

Harvey A C (1989) Forecasting Structural Time Series With the Kalman Fil-ter Cambridge UK Cambridge University Press

Hillmer S C and Trabelsi A (1987) ldquoBenchmarking of Economic Time Se-riesrdquo Journal of the American Statistical Association 82 1064ndash1071

Pfeffermann D (1984) ldquoOn Extensions of the GaussndashMarkov Theorem to theCase of Stochastic Regression Coefficientsrdquo Journal of the Royal StatisticalSociety Ser B 46 139ndash148

(2002) ldquoSmall Area Estimation New Developments and DirectionsrdquoInternational Statistical Review 70 125ndash143

Pfeffermann D and Barnard C H (1991) ldquoSome New Estimators for SmallArea Means With Application to the Assessment of Farmland Valuesrdquo Jour-nal of Business amp Economic Statistics 9 73ndash83

Pfeffermann D and Burck L (1990) ldquoRobust Small-Area Estimation Com-bining Time Series and Cross-Sectional Datardquo Survey Methodology 16217ndash237

Pfeffermann D and Tiller R B (2005) ldquoBootstrap Approximation to Predic-tion MSE for StatendashSpace Models With Estimated Parametersrdquo Journal ofTime Series Analysis 26 893ndash916

Rao J N K (2003) Small Area Estimation New York WileyTiller R B (1992) ldquoTime Series Modeling of Sample Survey Data From the

US Current Population Surveyrdquo Journal of Official Statistics 8 149ndash166

tiller_r
Oval
tiller_r
Oval
tiller_r
Oval

1396 Journal of the American Statistical Association December 2006

Figure 8 STDs of CPS Benchmarked and Unbenchmarked Esti-mates of Total Monthly Unemployment Pacific Division (numbers in10000) ( CPS BMK UnBMK)

We are presently studying the benchmarking of the state es-timates as outlined in Sections 1 and 44 Another importanttask is the development of a smoothing algorithm that accountsfor correlated measurement errors and incorporates the bench-marking constraints Clearly as new data accumulate it is de-sirable to modify past predictors this is particularly importantfor trend estimation An appropriate ldquofixed-pointrdquo smoother hasbeen constructed and is currently being tested

Finally the present BLS models assume that the state vectorsoperating in different CDs or states are independent It seemsplausible however that changes in the trend or seasonal effectsover time are correlated between neighboring CDs and evenmore so between states within the same CD Accounting forthese correlations within the joint model defined by (20) is sim-ple and might further improve the efficiency of the predictors

APPENDIX A PROOF OF THE BEST LINEARUNBIASED PREDICTOR PROPERTY OF THE

GENERALIZED LEAST SQUARESPREDICTOR αt [EQ (18)]

The model holding for Yt = (αprimet|tminus1yprime

t)prime is

Yt =(

αt|tminus1yt

)=

(I

Zt

)αt +

(ut|tminus1

et

) ut|tminus1 = αt|tminus1 minus αt

where αt|tminus1 is the predictor of αt from time t minus 1 Denote ut =(uprime

t|tminus1 eprimet)

primeXt = [IZprimet]prime In what follows all of the expectations are

over the joint distribution of the observations Yt and the state vec-tors αt so that E(ut) = 0E(utuprime

t) = Vt [see (16)] A predictor αlowastt is

unbiased for αt if E(αlowastt minus αt) = 0

Table 4 Means and STDs (in parentheses) of RatiosBetween STDs of Benchmarked and Unbenchmarked

Predictors of Total Unemployment in CensusDivisions 1998ndash2003

Division Means (STDs)

New England 85(013)Middle Atlantic 94(005)East North Central 96(004)West North Central 89(008)South Atlantic 96(004)East South Central 86(007)West South Central 92(006)Mountain 88(009)Pacific 96(004)

Theorem The predictor αt = (XprimetV

minus1t Xt)

minus1XprimetV

minus1t Yt = PtXprime

t timesVminus1

t Yt is the BLUP of αt in the sense of minimizing the predictionerror variance out of all the unbiased predictors that are linear combi-nations of αt|tminus1 and yt

Proof αt minus αt = PtXprimetV

minus1t (Xtαt + ut) minus αt = PtXprime

tVminus1t ut so that

E(αt minus αt) = 0 and var(αt minus αt) = E[(αt minus αt)(αt minus αt)prime] = Pt

Clearly αt is linear in Yt Let αL

t = L1αt|tminus1 + L2yt + l = LYt + l be any other linear un-biased predictor of αt and define Dt = L minus PtXprime

tVminus1t such that L =

Dt + PtXprimetV

minus1t Because αL

t is unbiased for αt

E(αLt minus αt) = E[(Dt + PtXprime

tVminus1t )(Xtαt + ut) + l minus αt]

= E[DtXtαt + Dtut + PtXprimetV

minus1t ut + l]

= E[DtXtαt + l] = 0

Thus for αLt to be unbiased for αt irrespective of the distribution of αt

DtXt = 0 and l = 0 which implies that var(αLt minus αt) = var[(Dt +

PtXprimetV

minus1t )(Xtαt + ut) minus αt] = var[Dtut + PtXprime

tVminus1t ut]

Hence

var(αLt minus αt) = E[(Dtut + PtXprime

tVminus1t ut)(uprime

tDprimet + uprime

tVminus1t XtPt)]

= DtVtDprimet + PtXprime

tVminus1t VtDprime

t + DtVtVminus1t XtPt + Pt

= Pt + DtVtDprimet

= var(αt minus αt) + DtVtDprimet because DtXt = 0

APPENDIX B EQUALITY OF THE GENERALIZEDLEAST SQUARE FILTER AND THE KALMAN FILTER

WHEN THE MEASUREMENT ERRORS AREUNCORRELATED OVER TIME

Consider the model yt = Ztαt + etαt = Tαtminus1 + ηt whereet and ηt are independent white noise series with var(et) = t andvar(ηt) = Qt Let αt|tminus1 = Tαtminus1 define the predictor of αt at timet minus 1 with prediction error variance matrix Pt|tminus1 assumed to bepositive definite

GLS setup at time t(

Tαtminus1yt

)=

(I

Zt

)αt +

(ut|tminus1

et

)

[see (15)] where now

Vt = var

(ut|tminus1

et

)=

[Pt|tminus1 0

0 t

]

[same as (16) except that Ct = 0]The GLS predictor at time t is

αt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1(IZprime

t)Vminus1t

(αt|tminus1

yt

)

= [Pminus1t|tminus1 + Zprime

tminus1t Zt]minus1[Pminus1

t|tminus1αt|tminus1 + Zprimet

minus1t yt]

= [Pt|tminus1 minus Pt|tminus1ZprimetF

minus1t ZtPt|tminus1][Pminus1

t|tminus1αt|tminus1 + Zprimet

minus1t yt]

= [I minus Pt|tminus1ZprimetF

minus1t Zt]αt|tminus1 + Pt|tminus1Zprime

tminus1t yt

minus Pt|tminus1Zprimet

minus1t yt + Pt|tminus1Zprime

tFminus1t yt

= αt|tminus1 + Pt|tminus1ZprimetF

minus1t (yt minus Ztαt|tminus1)

Ft = ZtPt|tminus1Zprimet + t

The last expression is the same as the Kalman filter predictor (Harvey1989 eq 323a)

Pfeffermann and Tiller Small-Area Estimation 1397

APPENDIX C COMPUTATION OF αt [EQ (19)]

Consider the GLS predictor

αt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1(IZprime

t)Vminus1t

(Tαtminus1

yt

)

[see (18)] where Vt =[

Pt|tminus1 Ct

Cprimet t

] A familiar result on the inverse of

a partitioned matrix applied to the matrix Vt yields the expressions

(IZprimet)V

minus1t = [

(Pminus1t|tminus1 + Pminus1

t|tminus1CtHtCprimetP

minus1t|tminus1 minus Zprime

tHtCprimetP

minus1t|tminus1)

(minusPminus1t|tminus1CtHt + Zprime

tHt)]

(C1)

and

Pt =[(IZprime

t)Vminus1t

(I

Zt

)]minus1

= [Pminus1

t|tminus1 + (Zprimet minus Pminus1

t|tminus1Ct)Ht(Zt minus CprimetP

minus1t|tminus1)

]minus1 (C2)

where Ht = [t minus CprimetP

minus1t|tminus1Ct]minus1 It follows from (C1) and (C2) that

αt = Pt(IZprimet)V

minus1t

(Tαtminus1

yt

)

= Tαtminus1 + Pt(Zprimet minus Pminus1

t|tminus1Ct)Ht(yt minus ZtTαtminus1) (C3)

Computing the matrix in the second row of (C2) using a standard ma-trix inversion lemma (Harvey 1989 p 108) and substituting in the sec-ond row of (C3) yields after some algebra equation (19)

APPENDIX D COMPUTATION OFPbmk

t = var(αbmkt minus αt ) AND Cbmk

t = cov[Tαbmktminus1 minus αt et ]

The benchmarked predictor $\hat\alpha_t^{\text{bmk}}$ defined by (24) can be written as
$$
\hat\alpha_t^{\text{bmk}} = \big[I - (P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}})R_t^{-1}Z_t\big]T\hat\alpha_{t-1}^{\text{bmk}} + \big(P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}}\big)R_t^{-1}y_t, \tag{D.1}
$$
where $R_t = Z_tP_{t|t-1}^{\text{bmk}}Z_t' - Z_tC_{t0}^{\text{bmk}} - C_{t0}^{\text{bmk}\prime}Z_t' + \Sigma_{tt}^{*}$. Substituting $y_t = Z_t\alpha_t + e_t$ [see (20a)] in (D.1) and decomposing $\alpha_t = \big[I - (P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}})R_t^{-1}Z_t\big]\alpha_t + \big(P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}}\big)R_t^{-1}Z_t\alpha_t$ yields
$$
\hat\alpha_t^{\text{bmk}} - \alpha_t = \big[I - (P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}})R_t^{-1}Z_t\big]\big(T\hat\alpha_{t-1}^{\text{bmk}} - \alpha_t\big) + \big(P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}}\big)R_t^{-1}e_t. \tag{D.2}
$$

Denote $G_t = I - (P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}})R_t^{-1}Z_t$ and $K_t = (P_{t|t-1}^{\text{bmk}}Z_t' - C_{t0}^{\text{bmk}})R_t^{-1}$. Then, by (D.2), the variance matrix of the prediction error $(\hat\alpha_t^{\text{bmk}} - \alpha_t)$ under the model (20) is
$$
P_t^{\text{bmk}} = E\big[(\hat\alpha_t^{\text{bmk}} - \alpha_t)(\hat\alpha_t^{\text{bmk}} - \alpha_t)'\big] = G_tP_{t|t-1}^{\text{bmk}}G_t' + K_t\Sigma_{tt}K_t' + G_tC_t^{\text{bmk}}K_t' + K_tC_t^{\text{bmk}\prime}G_t', \tag{D.3}
$$
where $P_{t|t-1}^{\text{bmk}} = E\big[(T\hat\alpha_{t-1}^{\text{bmk}} - \alpha_t)(T\hat\alpha_{t-1}^{\text{bmk}} - \alpha_t)'\big] = TP_{t-1}^{\text{bmk}}T' + Q$, $\Sigma_{tt} = E(e_te_t')$ [eq. (20a)], and $C_t^{\text{bmk}} = \mathrm{cov}\big[T\hat\alpha_{t-1}^{\text{bmk}} - \alpha_t,\, e_t\big]$. The matrix $C_t^{\text{bmk}}$ is computed similarly to (17), using the chain
$$
C_t^{\text{bmk}} = A_{t-1}^{\text{bmk}}A_{t-2}^{\text{bmk}}\cdots A_2^{\text{bmk}}\tilde A_1^{\text{bmk}}\Sigma_{1t} + A_{t-1}^{\text{bmk}}A_{t-2}^{\text{bmk}}\cdots A_3^{\text{bmk}}\tilde A_2^{\text{bmk}}\Sigma_{2t} + \cdots + A_{t-1}^{\text{bmk}}\tilde A_{t-2}^{\text{bmk}}\Sigma_{t-2,t} + \tilde A_{t-1}^{\text{bmk}}\Sigma_{t-1,t}, \tag{D.4}
$$
where $A_j^{\text{bmk}} = TG_j$ and $\tilde A_j^{\text{bmk}} = TK_j$, with $G_j$ and $K_j$ defined as earlier.

[Received August 2004. Revised February 2006.]

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28–36.

Doran, H. E. (1992), "Constraining Kalman Filter and Smoothing Estimates to Satisfy Time-Varying Restrictions," Review of Economics and Statistics, 74, 568–572.

Durbin, J., and Quenneville, B. (1997), "Benchmarking by State–Space Models," International Statistical Review, 65, 23–48.

Fay, R. E., and Herriot, R. A. (1979), "Estimates of Income for Small Places: An Application of James–Stein Procedures to Census Data," Journal of the American Statistical Association, 74, 269–277.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Hillmer, S. C., and Trabelsi, A. (1987), "Benchmarking of Economic Time Series," Journal of the American Statistical Association, 82, 1064–1071.

Pfeffermann, D. (1984), "On Extensions of the Gauss–Markov Theorem to the Case of Stochastic Regression Coefficients," Journal of the Royal Statistical Society, Ser. B, 46, 139–148.

——— (2002), "Small Area Estimation: New Developments and Directions," International Statistical Review, 70, 125–143.

Pfeffermann, D., and Barnard, C. H. (1991), "Some New Estimators for Small-Area Means With Application to the Assessment of Farmland Values," Journal of Business & Economic Statistics, 9, 73–83.

Pfeffermann, D., and Burck, L. (1990), "Robust Small-Area Estimation Combining Time Series and Cross-Sectional Data," Survey Methodology, 16, 217–237.

Pfeffermann, D., and Tiller, R. B. (2005), "Bootstrap Approximation to Prediction MSE for State–Space Models With Estimated Parameters," Journal of Time Series Analysis, 26, 893–916.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Tiller, R. B. (1992), "Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey," Journal of Official Statistics, 8, 149–166.

