Multiple Cases Deletion Diagnostics for Linear Mixed Models

This article was downloaded by: [UNIVERSITY OF KWAZULU-NATAL] On: 12 November 2014, At: 22:46 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Theory and Methods Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20 Multiple Cases Deletion Diagnostics for Linear Mixed Models T. Zewotir a a School of Statistics and Actuarial Science , University of KwaZulu Natal , Pietermaritzburg Campus, South Africa Published online: 11 Feb 2008. To cite this article: T. Zewotir (2008) Multiple Cases Deletion Diagnostics for Linear Mixed Models, Communications in Statistics - Theory and Methods, 37:7, 1071-1084, DOI: 10.1080/03610920701713229 To link to this article: http://dx.doi.org/10.1080/03610920701713229 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Communications in Statistics - Theory and Methods, 37: 1071–1084, 2008. Copyright © Taylor & Francis Group, LLC. ISSN: 0361-0926 print/1532-415X online. DOI: 10.1080/03610920701713229

Linear Models

Multiple Cases Deletion Diagnostics for Linear Mixed Models

T. ZEWOTIR

School of Statistics and Actuarial Science, University of KwaZulu Natal, Pietermaritzburg Campus, South Africa

The influence of observations on statistical inference is of importance in statistical data analysis. A practical and well-established approach to influence analysis is based on case deletion. We provide computationally inexpensive diagnostic tools for linear mixed models. The diagnostics are a function of basic building blocks, computed only once from the complete data analysis, and provide information on the influence of the data on different aspects of the model fit. To demonstrate the utility of the diagnostic tools, we analyze the aerosol penetration data and the data on the perceptions of the health effects of nicotine and cigarettes in South Africa.

Keywords Fixed effects; Generalized Cook’s distance; Influential observations; One-step diagnostics; Random effects; Variance components ratios.

Mathematics Subject Classification Primary 62J10; Secondary 62J20.

1. Introduction

Influence diagnostics are important for statistical data analyses because they allow the analyst to understand the impact of individual observations (or a subset of observations) on inferences about the linear mixed model parameters. However, data analysts generally do not assess the influence of observations on the mixed model fit. The basic reasons are (1) the lack of routine methods (in the literature and standard computing packages) for performing such analysis, and (2) the presumably high costs (in analyst and computer time) of doing so.

The identification of influential observations in ordinary regression analysis is well documented in the literature (see, for example, Cook and Weisberg (1982) and Chatterjee and Hadi (1986, 1988)). Cook (1977) introduced a statistic to assess the effect of dropping a case in regression fittings. Since then, many measures have been proposed for identifying influential subsets of observations. However, the problems of detecting multiple influential observations when fitting linear mixed models are more complicated than those that arise in ordinary regression analysis, and few works in the literature address them. We are aware that some work has been done for special cases of the general mixed model. In particular, Martin (1992), Haslett and Hayes (1998), and Haslett (1999) considered the problem for the fixed effect linear model with a correlated error covariance structure, and Christensen et al. (1992a, 1993) considered the problem for spatial linear models. However, the approaches of Martin (1992), Haslett and Hayes (1998), and Haslett (1999) cannot be directly applied to a general mixed model unless the entire focus of analysis is to make inferences about the fixed effects. Similarly, the approach of Christensen et al. (1992a, 1993) cannot be directly applied to the general linear mixed model unless the focus of analysis is to find the best linear unbiased predictions of functions of the random effects.

Received November 17, 2005; Accepted July 17, 2007. Address correspondence to T. Zewotir, School of Statistics and Actuarial Science, University of KwaZulu Natal, Pietermaritzburg Campus, South Africa; E-mail: [email protected]

In this article, the problems of detecting multiple influential observations when fitting the general linear mixed model to data are investigated. We are aware that the usual single-case deletion diagnostics suggested by Christensen et al. (1992b) are computationally inefficient, and that Haslett and Dillane (2004) proposed the “delete = replace” approach as an alternative approximate procedure to lighten the computational burden. Recently, Zewotir and Galpin (2005, 2006) simplified the case deletion approach by developing basic building blocks, similar to those of the linear regression diagnostics, which are the key to making deletion diagnostics usable. However, the mathematical and computational treatment of this article is more general and much simpler in that one obtains the case deletion diagnostics by making use of the basic building blocks that are computed only once from the full data.

A generalized Cook’s distance is proposed to measure the influence of a subset of observations on the model fit. Other diagnostic measures of influence, which measure the change of the confidence ellipsoid’s volume after deleting a subset of observations, are also proposed. These measures include the generalized covariance ratio, the generalized Cook–Weisberg’s statistic, and the generalized Andrews–Pregibon’s statistic. The aerosol penetration data (Beckman et al., 1987; Christensen et al., 1992b; Zewotir and Galpin, 2005) and the data on the awareness of the health effects of nicotine and cigarettes in South Africa are used to illustrate the application of the measures, and to show that the measures proposed are useful in practice.

2. Background

We consider a linear mixed model

$$y = X\beta + Zu + \varepsilon,$$

where $\beta$ is a $p \times 1$ vector of unknown constants, the fixed effects of the model; $X$ is an $n \times p$ known design matrix of fixed numbers associated with $\beta$; $Z = [Z_1 \mid Z_2 \mid \cdots \mid Z_r]$, where $Z_i$ is an $n \times q_i$ known design matrix of the random effect factor $i$; $u' = [u_1' \mid u_2' \mid \cdots \mid u_r']$, where $u_i$ is a $q_i \times 1$ vector of independent random variables from $N(0, \sigma_i^2 I)$, $i = 1, 2, \ldots, r$; $\varepsilon$ is an $n \times 1$ vector of error terms from $N(0, \sigma_e^2 I)$; and $u_i$ and $\varepsilon$ are mutually independent. One may also write $u \sim N(0, \sigma_e^2 D)$, where $D$ is block diagonal with the $i$th block being $\gamma_i I_{q_i}$, for $\gamma_i = \sigma_i^2/\sigma_e^2$, so that $y$ has a multivariate normal distribution with

$$E(y) = X\beta \quad\text{and}\quad \operatorname{var}(y) = \sigma_e^2 H, \quad\text{for}\quad H = I_n + ZDZ' = I_n + \sum_{i=1}^{r} \gamma_i Z_i Z_i'.$$

If the $\gamma_i$’s (and hence $H$) are known, the maximum likelihood estimates of $\beta$, $\sigma_e^2$, and the realized value of $u$ are given by $\hat\beta = W(X,X)^{-1}W(X,y)$, $\hat\sigma_e^2 = W(d,d)/n$, and $\tilde u = DW(Z,d)$, respectively, for $d = y - X\hat\beta$, and where $W(A,B) = A'H^{-1}B$, in the notation of the W transformation of Hemmerle and Hartley (1973). Here $\hat\beta$ is the best linear unbiased estimator (BLUE) of $\beta$ and $\tilde u$ is the best linear unbiased predictor (BLUP) of $u$. If the $\gamma_i$’s are unknown, the maximum likelihood estimates (MLEs) are substituted back into $D$ (and/or $H$) to obtain $\hat\beta$, $\hat\sigma_e^2$, and $\tilde u$ (see, e.g., Harville, 1977; Jennrich and Sampson, 1976; McCulloch and Searle, 2001; SAS Institute, 1992; Searle et al., 1992). The error contrast matrix $R = H^{-1} - H^{-1}XW(X,X)^{-1}X'H^{-1}$ transforms the observations into residuals. That is, $e = Ry$. The correlation between $e_i$ and $e_j$ is entirely determined by the elements of $R$: $\operatorname{corr}(e_i, e_j) = r_{ij}/\sqrt{r_{ii}r_{jj}}$. The $i$th studentized residual is defined as $t_i = e_i/(\hat\sigma_e\sqrt{r_{ii}})$.
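As a rough illustration of the quantities defined in this section, the following sketch computes them with NumPy for a small simulated data set. The function name `building_blocks`, the toy data, and the value 0.8 for the variance ratio are assumptions made for the example; they do not come from the article.

```python
import numpy as np

def building_blocks(y, X, Z, D):
    """Full-data quantities of Section 2, for known variance ratios in D."""
    n = len(y)
    H = np.eye(n) + Z @ D @ Z.T              # var(y) = sigma_e^2 * H
    C = np.linalg.inv(H)                     # C = H^{-1}
    Wxx_inv = np.linalg.inv(X.T @ C @ X)     # W(X, X)^{-1}
    beta = Wxx_inv @ (X.T @ C @ y)           # BLUE of beta
    d = y - X @ beta
    sigma2 = (d @ C @ d) / n                 # ML estimate of sigma_e^2
    u = D @ Z.T @ C @ d                      # BLUP of u
    R = C - C @ X @ Wxx_inv @ X.T @ C        # error contrast matrix
    e = R @ y                                # residuals
    t = e / np.sqrt(sigma2 * np.diag(R))     # studentized residuals
    return {"H": H, "C": C, "Wxx_inv": Wxx_inv, "beta": beta,
            "sigma2": sigma2, "u": u, "R": R, "e": e, "t": t}

# Hypothetical toy data: 4 clusters of 3 observations, one random factor.
rng = np.random.default_rng(0)
n, q = 12, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))      # cluster indicator matrix (12 x 4)
D = 0.8 * np.eye(q)                          # assumed known ratio gamma = 0.8
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

bb = building_blocks(y, X, Z, D)
```

In this sketch the residuals `bb["e"]` coincide with $y - X\hat\beta - Z\tilde u$, which is one way to sanity-check an implementation before computing any deletion diagnostics.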

3. Notation and Updating Formulae

Let $I = \{i_1, i_2, \ldots, i_k\}$ be a set of indices of $k$ observations to be deleted, where $k$ is fixed in advance. Moreover, $k$ is less than the minimum of the number of observations that fall in the fixed effects levels. Without loss of generality, we partition the matrices as if the $k$ omitted observations indexed by $I$ are in the first $k$ rows, but the results apply in general. Let $H_{[I]}$ be the matrix $H$ with the $k$ rows and $k$ columns indexed by $I$ removed. A subscript of $(I)$ on a matrix or vector means that the rows and/or columns corresponding to the $k$ observations indexed by $I$ have been deleted, and a subscript of $(I)$ on an estimate (or estimator) means that the estimate has been calculated with the $k$ observations indexed by $I$ removed. A subscript of $[I]$ on a square matrix means that the $k$ rows and $k$ columns indexed by $I$ have been removed. Moreover, define $W_{[I]}(A,B) = A_{(I)}'H_{[I]}^{-1}B_{(I)}$ and partition

$$C = H^{-1} = \begin{bmatrix} C_{II} & C_{I(I)}' \\ C_{I(I)} & C_{[I]} \end{bmatrix} \quad\text{and}\quad C_I = \begin{bmatrix} C_{II} \\ C_{I(I)} \end{bmatrix}.$$

The matrices $R_{II}$, $R_{I(I)}$, and $R_I$ and the vector $e_I$ are defined from $R$ and $e$ in the same way.

Theorem 1. For fixed (or known) variance components ratios,

$$W_{[I]}(A,B) = W(A,B) - A'C_I C_{II}^{-1} C_I'B.$$

The proof is given in the Appendix. The updating formulae for $\hat\beta$, $\hat\sigma_e^2$, and $\tilde u$ then become

$$\hat\beta_{(I)} = \hat\beta - W(X,X)^{-1}X'C_I R_{II}^{-1}e_I,$$

$$\hat\sigma^2_{e(I)} = \frac{1}{n-k}\left(n\hat\sigma_e^2 - e_I'R_{II}^{-1}e_I\right),$$

and

$$\tilde u_{(I)} = \tilde u - DZ'R_I R_{II}^{-1}e_I.$$
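The updating formulae can be checked numerically against a brute-force refit. The sketch below does this on simulated data; the helper `fit`, the toy design, the deleted set, and the assumed variance ratio 0.8 are illustrative choices and are not taken from the article.

```python
import numpy as np

def fit(y, X, Z, D):
    """ML/BLUP fit for known variance ratios: (beta, sigma2, u, C, Wxx_inv)."""
    n = len(y)
    C = np.linalg.inv(np.eye(n) + Z @ D @ Z.T)
    Wxx_inv = np.linalg.inv(X.T @ C @ X)
    beta = Wxx_inv @ (X.T @ C @ y)
    d = y - X @ beta
    return beta, (d @ C @ d) / n, D @ Z.T @ C @ d, C, Wxx_inv

# Hypothetical toy data (assumed for the example).
rng = np.random.default_rng(1)
n, q = 12, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
D = 0.8 * np.eye(q)
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

beta, sigma2, u, C, Wxx_inv = fit(y, X, Z, D)
R = C - C @ X @ Wxx_inv @ X.T @ C
e = R @ y

I = [2, 7]                                   # cases to delete (0-based indices)
k = len(I)
a = np.linalg.solve(R[np.ix_(I, I)], e[I])   # R_II^{-1} e_I

# Updating formulae: only full-data building blocks are used.
beta_upd = beta - Wxx_inv @ X.T @ C[:, I] @ a
sigma2_upd = (n * sigma2 - e[I] @ a) / (n - k)
u_upd = u - D @ Z.T @ R[:, I] @ a

# Brute-force check: refit with the rows in I physically removed.
keep = np.setdiff1d(np.arange(n), I)
beta_del, sigma2_del, u_del, _, _ = fit(y[keep], X[keep], Z[keep], D)
```

The only matrix that has to be inverted per subset is the $k \times k$ block $R_{II}$, which is where the computational saving comes from.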


4. Influence Measures

4.1. Generalized Cook’s Distance for Fixed Effects

For the fixed effects estimate in the linear mixed model, a generalized Cook’s distance can be defined as

$$CD_I(\beta) = (\hat\beta - \hat\beta_{(I)})'W(X,X)(\hat\beta - \hat\beta_{(I)})/(p\hat\sigma_e^2) = e_I'R_{II}^{-1}(C_{II} - R_{II})R_{II}^{-1}e_I/(p\hat\sigma_e^2).$$

Large values of $CD_I(\beta)$ indicate that the observations indexed by $I$ are jointly influential on $\hat\beta$. To gain some insight into $CD_I(\beta)$, consider the special case $k = 1$, for which $CD_I(\beta)$ reduces to

$$CD_i(\beta) = t_i^2\,(c_{ii}/r_{ii} - 1)/p.$$
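A minimal sketch of the generalized Cook’s distance follows, compared with its direct definition (a refit without the cases in $I$) and with the single-case closed form. The function name `cook_distance_beta` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def cook_distance_beta(e, R, C, sigma2, p, I):
    """CD_I(beta) = e_I' R_II^{-1} (C_II - R_II) R_II^{-1} e_I / (p * sigma2)."""
    I = list(I)
    R_II = R[np.ix_(I, I)]
    a = np.linalg.solve(R_II, e[I])                   # R_II^{-1} e_I
    return a @ (C[np.ix_(I, I)] - R_II) @ a / (p * sigma2)

# Hypothetical toy data (assumed, not from the article).
rng = np.random.default_rng(2)
n, p, q = 12, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
H = np.eye(n) + 0.8 * Z @ Z.T                         # gamma = 0.8 assumed known
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

# Full-data building blocks.
C = np.linalg.inv(H)
Wxx = X.T @ C @ X
beta = np.linalg.solve(Wxx, X.T @ C @ y)
d = y - X @ beta
sigma2 = d @ C @ d / n
R = C - C @ X @ np.linalg.solve(Wxx, X.T @ C)
e = R @ y
t = e / np.sqrt(sigma2 * np.diag(R))

I = [2, 7]
cd_pair = cook_distance_beta(e, R, C, sigma2, p, I)

# Direct definition: refit without the cases in I and compare the estimates.
keep = np.setdiff1d(np.arange(n), I)
C_del = np.linalg.inv(H[np.ix_(keep, keep)])
beta_del = np.linalg.solve(X[keep].T @ C_del @ X[keep],
                           X[keep].T @ C_del @ y[keep])
diff = beta - beta_del
cd_direct = diff @ Wxx @ diff / (p * sigma2)

# Single-case special case: t_i^2 (c_ii / r_ii - 1) / p.
i = 5
cd_single = cook_distance_beta(e, R, C, sigma2, p, [i])
cd_single_closed = t[i] ** 2 * (C[i, i] / R[i, i] - 1) / p
```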

4.2. Generalized Welsch-Kuh’s Statistic

This diagnostic statistic is also known as DFFITS (see, e.g., Belsley et al., 1980). Its mathematical expression can be derived from the generalized Cook’s distance by replacing $W(X,X)$ with $W_{[I]}(X,X)$ and $\hat\sigma_e^2$ with $\hat\sigma^2_{e(I)}$. That is, the generalized Welsch-Kuh’s statistic is given by

$$WK_I = (\hat\beta - \hat\beta_{(I)})'W_{[I]}(X,X)(\hat\beta - \hat\beta_{(I)})/(p\hat\sigma^2_{e(I)}) = \frac{e_I'\left(R_{II}^{-1} - C_{II}^{-1}\right)e_I}{\dfrac{pn}{n-k}\hat\sigma_e^2 - \dfrac{p}{n-k}e_I'R_{II}^{-1}e_I}.$$

For example, when $k = 1$ and $I = \{i\}$, $WK_I$ becomes

$$WK_i = \frac{t_i^2\,(1 - r_{ii}/c_{ii})}{\dfrac{np}{n-1} - \dfrac{p\,t_i^2}{n-1}}.$$

4.3. Generalized Covariance Ratio

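This statistic measures the change in the determinant of the covariance of the MLE of $\beta$ and is given by

$$CR_I = \frac{\det(\operatorname{var}(\hat\beta_{(I)}))}{\det(\operatorname{var}(\hat\beta))} = \frac{\det(\hat\sigma^2_{e(I)}W_{[I]}(X,X)^{-1})}{\det(\hat\sigma_e^2W(X,X)^{-1})} = \left[\frac{1}{n-k}\left(n - e_I'R_{II}^{-1}e_I/\hat\sigma_e^2\right)\right]^p \frac{\det(C_{II})}{\det(R_{II})}.$$

Clearly, the larger the statistic $|CR_I - 1|$, the stronger the joint influence of the observations indexed by $I$ on the fixed effects estimate.

When $k = 1$ and $I = \{i\}$, $CR_I$ becomes

$$CR_i = \left[\frac{n}{n-1} - \frac{t_i^2}{n-1}\right]^p \frac{c_{ii}}{r_{ii}}.$$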
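The determinant form of the covariance ratio can be checked in the same way. The sketch below is only an illustration under assumed toy data; `covariance_ratio` is a hypothetical helper name, and the direct value is obtained by refitting with the cases in $I$ removed.

```python
import numpy as np

def covariance_ratio(e, R, C, sigma2, p, I):
    """CR_I computed from full-data building blocks only."""
    I = list(I)
    n, k = len(e), len(I)
    R_II, C_II = R[np.ix_(I, I)], C[np.ix_(I, I)]
    scale = (n - e[I] @ np.linalg.solve(R_II, e[I]) / sigma2) / (n - k)
    return scale ** p * np.linalg.det(C_II) / np.linalg.det(R_II)

# Hypothetical toy data (assumed, not from the article).
rng = np.random.default_rng(3)
n, p, q = 12, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
H = np.eye(n) + 0.8 * Z @ Z.T                 # gamma = 0.8 assumed known
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

C = np.linalg.inv(H)
Wxx = X.T @ C @ X
beta = np.linalg.solve(Wxx, X.T @ C @ y)
d = y - X @ beta
sigma2 = d @ C @ d / n
R = C - C @ X @ np.linalg.solve(Wxx, X.T @ C)
e = R @ y

I = [2, 7]
cr = covariance_ratio(e, R, C, sigma2, p, I)

# Direct evaluation of det(var(beta_(I))) / det(var(beta)) by refitting.
keep = np.setdiff1d(np.arange(n), I)
C_del = np.linalg.inv(H[np.ix_(keep, keep)])
W_del = X[keep].T @ C_del @ X[keep]
beta_del = np.linalg.solve(W_del, X[keep].T @ C_del @ y[keep])
d_del = y[keep] - X[keep] @ beta_del
sigma2_del = d_del @ C_del @ d_del / (n - len(I))
cr_direct = (np.linalg.det(sigma2_del * np.linalg.inv(W_del))
             / np.linalg.det(sigma2 * np.linalg.inv(Wxx)))
```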


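If one uses the trace of the covariance of the MLE of $\beta$ instead of its determinant, the CR becomes

$$\begin{aligned}
CR_I(\beta) &= \operatorname{trace}\{[\operatorname{var}(\hat\beta)]^{-1}\operatorname{var}(\hat\beta_{(I)})\} \\
&= \frac{\hat\sigma^2_{e(I)}}{\hat\sigma_e^2}\operatorname{trace}\{W(X,X)W_{[I]}(X,X)^{-1}\} \\
&= \left(\frac{n}{n-k} - \frac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right)\operatorname{trace}\{I_p + X'C_IR_{II}^{-1}C_I'XW(X,X)^{-1}\} \\
&= \left(\frac{n}{n-k} - \frac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right)\left(\operatorname{trace}(R_{II}^{-1}C_{II}) + p - k\right).
\end{aligned}$$

Since $CR_I(\beta)$ will be close to $p$ if removing the observations indexed by $I$ does not change the trace, we could use the relative measure $|CR_I(\beta) - p|$ as a criterion for assessing the influence of a subset of observations on the variance of $\hat\beta$. The larger the statistic $|CR_I(\beta) - p|$, the stronger the influence of the subset of observations.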

4.4. Analogue of the Cook–Weisberg Statistic

Another statistic used to measure the change in the confidence ellipsoid volume of $\beta$ is the Cook–Weisberg statistic. Under the assumption $y \sim N(X\beta, \sigma_e^2H)$, $\hat\beta \sim N(\beta, \sigma_e^2W(X,X)^{-1})$. The $100(1-\alpha)\%$ confidence ellipsoid for $\beta$ is then given by $E = \{\beta : (\beta - \hat\beta)'W(X,X)(\beta - \hat\beta)/(p\hat\sigma_e^2) \le F(\alpha; p, n-p)\}$. When the set indexed by $I$ is omitted, $E_{(I)} = \{\beta : (\beta - \hat\beta_{(I)})'W_{[I]}(X,X)(\beta - \hat\beta_{(I)})/(p\hat\sigma^2_{e(I)}) \le F(\alpha; p, n-p-k)\}$.

Cook and Weisberg (1980) proposed the logarithm of the ratio of the volume of $E_{(I)}$ to that of $E$ as a measure of influence on the volume of the confidence ellipsoid for $\beta$ in the multiple linear regression model, namely,

$$CW_I = \log\left(\frac{\operatorname{volume}(E_{(I)})}{\operatorname{volume}(E)}\right).$$

Noting that the volume of the ellipsoid is proportional to the inverse of the square root of the determinant of the associated matrix of the quadratic form, the analogous measure for fixed effects $\beta$ becomes

$$\begin{aligned}
CW_I &= \log\left\{\frac{\left[\det(W(X,X))\,\hat\sigma^{2p}_{e(I)}\right]^{1/2}\left[F(\alpha; p, n-p)\right]^{p/2}}{\left[\det(W_{[I]}(X,X))\,\hat\sigma^{2p}_{e}\right]^{1/2}\left[F(\alpha; p, n-p-k)\right]^{p/2}}\right\} \\
&= \frac12\log\left(\frac{\det(C_{II})}{\det(R_{II})}\right) + \frac p2\log\left(\frac{n}{n-k} - \frac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right) + \frac p2\log\left(\frac{F(\alpha; p, n-p)}{F(\alpha; p, n-p-k)}\right).
\end{aligned}$$

If $CW_I$ is negative (positive), then the volume of the confidence ellipsoid decreases (increases) and precision increases (decreases) after deleting the subset of observations. Regardless of the sign of $CW_I$, a large $|CW_I|$ indicates a strong influence of the observations indexed by $I$ on $\hat\beta$ and/or $\operatorname{var}(\hat\beta)$. The constant $\frac p2\log(F(\alpha; p, n-p)/F(\alpha; p, n-p-k))$ does not affect the detection of an influential observation, but it plays an important role in determining the sign of $CW_I$. Clearly, the Cook–Weisberg statistic, $CW_I$, can be simplified to

$$CW_I = \frac12\log(CR_I) + \frac p2\log\left(\frac{F(\alpha; p, n-p)}{F(\alpha; p, n-p-k)}\right).$$

Thus, apart from the $F$ values, $CW_I$ carries the same information as $CR_I$. In fact, for large $n$, the ratio of the $F$ values is close to 1, and hence $CW_I \approx \frac12\log(CR_I)$.

4.5. Generalized Andrews–Pregibon’s Statistic

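The generalized Andrews–Pregibon’s statistic is given by

$$AP_I = -\frac12\log\left\{\frac{e_{(I)}'e_{(I)}\,\det(W_{[I]}(X,X))}{e'e\,\det(W(X,X))}\right\},$$

where

$$\begin{aligned}
e_{(I)} &= y_{(I)} - X_{(I)}\hat\beta_{(I)} - Z_{(I)}\tilde u_{(I)} \\
&= y_{(I)} - X_{(I)}\hat\beta - Z_{(I)}\tilde u + \left(X_{(I)}W(X,X)^{-1}X'C_I + Z_{(I)}DZ'R_I\right)R_{II}^{-1}e_I.
\end{aligned}$$

Since

$$Z_{(I)}DZ'R_I = \left(H_{(I)} - (I_n)_{(I)}\right)H^{-1}\left((I_n)_I - XW(X,X)^{-1}X'C_I\right) = -R_{I(I)} - X_{(I)}W(X,X)^{-1}X'C_I,$$

where $(I_n)_{(I)}$ denotes the rows of $I_n$ not indexed by $I$ and $(I_n)_I$ the columns of $I_n$ indexed by $I$, then

$$e_{(I)} = y_{(I)} - X_{(I)}\hat\beta - Z_{(I)}\tilde u - R_{I(I)}R_{II}^{-1}e_I,$$

and hence

$$\begin{aligned}
e_{(I)}'e_{(I)} &= \operatorname{ssq}\left(y_{(I)} - X_{(I)}\hat\beta - Z_{(I)}\tilde u - R_{I(I)}R_{II}^{-1}e_I\right) \\
&= \operatorname{ssq}(e) - 2e'R_IR_{II}^{-1}e_I + \operatorname{ssq}\left(R_IR_{II}^{-1}e_I\right) \\
&= \operatorname{ssq}\left(e - R_IR_{II}^{-1}e_I\right),
\end{aligned}$$

where $\operatorname{ssq}(A)$ stands for the sum of the squares of the elements of $A$. It therefore follows that

$$AP_I = \frac12\log\left(\det(C_{II})/\det(R_{II})\right) + \frac12\log(e'e) - \frac12\log\left(\operatorname{ssq}\left(e - R_IR_{II}^{-1}e_I\right)\right).$$

The larger $AP_I$ is, the stronger the joint influence of the $k$ observations indexed by $I$ on the model fit.

Note that when $k = 1$ and $I = \{i\}$, $AP_I$ becomes

$$AP_i = \frac12\log\left(\frac{c_{ii}}{r_{ii}}\right) + \frac12\log(e'e) - \frac12\log\left(\operatorname{ssq}\left(e - \frac{e_i}{r_{ii}}R_i\right)\right).$$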
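The closed form of the Andrews–Pregibon statistic can be compared with its definition by refitting the model without the cases in $I$. The sketch below does so on simulated data; the function name `andrews_pregibon`, the toy design, and the assumed variance ratio 0.8 are illustrative and not part of the article.

```python
import numpy as np

def andrews_pregibon(e, R, C, I):
    """AP_I from full-data residuals e, error contrast matrix R, and C = H^{-1}."""
    I = list(I)
    R_II, C_II = R[np.ix_(I, I)], C[np.ix_(I, I)]
    shifted = e - R[:, I] @ np.linalg.solve(R_II, e[I])   # rows in I become zero
    return 0.5 * (np.log(np.linalg.det(C_II) / np.linalg.det(R_II))
                  + np.log(e @ e) - np.log(shifted @ shifted))

# Hypothetical toy data (assumed, not from the article).
rng = np.random.default_rng(4)
n, q, gamma = 12, 4, 0.8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
H = np.eye(n) + gamma * Z @ Z.T
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

C = np.linalg.inv(H)
Wxx = X.T @ C @ X
R = C - C @ X @ np.linalg.solve(Wxx, X.T @ C)
e = R @ y

I = [2, 7]
ap = andrews_pregibon(e, R, C, I)

# Definition: residuals and W(X, X) recomputed with the cases in I removed.
keep = np.setdiff1d(np.arange(n), I)
C_del = np.linalg.inv(H[np.ix_(keep, keep)])
W_del = X[keep].T @ C_del @ X[keep]
beta_del = np.linalg.solve(W_del, X[keep].T @ C_del @ y[keep])
d_del = y[keep] - X[keep] @ beta_del
u_del = gamma * Z[keep].T @ C_del @ d_del                 # BLUP without I
e_del = d_del - Z[keep] @ u_del
ap_direct = -0.5 * np.log((e_del @ e_del) * np.linalg.det(W_del)
                          / ((e @ e) * np.linalg.det(Wxx)))
```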


4.6. Generalized Cook’s Distance for Random Effects

The proposed diagnostic measures the squared distance from the complete-data predictor of the random effects to the predictor obtained with the cases indexed by $I$ deleted, relative to the variance of the random effects, $\operatorname{var}(u) = \sigma_e^2D$. This measure is given by

$$\begin{aligned}
CD_I(u) &= (\tilde u - \tilde u_{(I)})'D^{-1}(\tilde u - \tilde u_{(I)})/\hat\sigma_e^2 \\
&= e_I'R_{II}^{-1}R_I'ZDZ'R_IR_{II}^{-1}e_I/\hat\sigma_e^2 \\
&= e_I'R_{II}^{-1}\left(R_{II} - R_I'R_I\right)R_{II}^{-1}e_I/\hat\sigma_e^2.
\end{aligned}$$

A large $CD_I(u)$ indicates that the $k$ observations indexed by $I$ are jointly influential on the predictions of the random effects. Note that when $k = 1$ and $I = \{i\}$, $CD_I(u)$ becomes

$$CD_i(u) = t_i^2\left(1 - \frac{\operatorname{ssq}(R_i)}{r_{ii}}\right).$$
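The last equality in the expression for the generalized Cook’s distance for random effects rests on the identity $RHR = R$, which the text does not spell out. A short derivation, using only the definitions of $R$ and $H$ from Section 2 and the symmetry of $R$, might read:

```latex
\begin{aligned}
X'R &= X'H^{-1} - X'H^{-1}X\,W(X,X)^{-1}X'H^{-1} = 0, \\
RHR &= \left(I_n - H^{-1}X\,W(X,X)^{-1}X'\right)R = R, \\
R'ZDZ'R &= R\,(H - I_n)\,R = RHR - R^2 = R - R^2, \\
R_I'ZDZ'R_I &= R_{II} - R_I'R_I .
\end{aligned}
```

The final line is the $k \times k$ block of $R - R^2$ indexed by $I$, since the corresponding block of $R^2$ is $R_I'R_I$.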

4.7. Generalized Likelihood Distance

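The influence of a subset of observations on the likelihood function may be measured by the distance between the likelihood functions with and without the subset of observations indexed by $I$. The log-likelihood function evaluated at the MLEs of $\beta$ and $\sigma_e^2$, based on the full data, is

$$l(\hat\beta, \hat\sigma_e^2) = -\frac n2\log 2\pi - \frac n2\log\hat\sigma_e^2 - \frac12\log(\det(H)) - \frac n2.$$

The log-likelihood function evaluated at the MLEs of $\beta$ and $\sigma_e^2$ obtained when the $k$ observations indexed by $I$ are deleted from the full data is

$$l(\hat\beta_{(I)}, \hat\sigma^2_{e(I)}) = -\frac n2\log 2\pi - \frac n2\log\hat\sigma^2_{e(I)} - \frac12\log(\det(H)) - \frac{(y - X\hat\beta_{(I)})'H^{-1}(y - X\hat\beta_{(I)})}{2\hat\sigma^2_{e(I)}}.$$

The generalized likelihood distance is given by

$$LD_I = 2\left\{l(\hat\beta, \hat\sigma_e^2) - l(\hat\beta_{(I)}, \hat\sigma^2_{e(I)})\right\} = n\log\left(\frac{n}{n-k} - \frac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right) + \frac{n + p\,CD_I(\beta)}{\dfrac{n}{n-k} - \dfrac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}} - n.$$

A large $LD_I$ indicates that the $k$ observations indexed by $I$ are jointly influential on the likelihood function.

In particular, when $k = 1$ and $I = \{i\}$, we obtain

$$LD_i = n\log\left(\frac{n}{n-1} - \frac{t_i^2}{n-1}\right) + \frac{n + p\,CD_i(\beta)}{\dfrac{n}{n-1} - \dfrac{t_i^2}{n-1}} - n.$$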
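As a hedged check of the likelihood distance, the sketch below evaluates the closed form from the building blocks and compares it with twice the difference of the two log-likelihoods computed directly. The names `likelihood_distance` and `loglik` and the simulated data are assumptions for illustration.

```python
import numpy as np

def likelihood_distance(e, R, C, sigma2, p, I):
    """LD_I from full-data building blocks (known variance ratios)."""
    I = list(I)
    n, k = len(e), len(I)
    R_II = R[np.ix_(I, I)]
    a = np.linalg.solve(R_II, e[I])                       # R_II^{-1} e_I
    ratio = n / (n - k) - (e[I] @ a) / ((n - k) * sigma2) # sigma2_(I) / sigma2
    cd = a @ (C[np.ix_(I, I)] - R_II) @ a / (p * sigma2)  # CD_I(beta)
    return n * np.log(ratio) + (n + p * cd) / ratio - n

# Hypothetical toy data (assumed, not from the article).
rng = np.random.default_rng(5)
n, p, q = 12, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
H = np.eye(n) + 0.8 * Z @ Z.T                 # gamma = 0.8 assumed known
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

C = np.linalg.inv(H)
Wxx = X.T @ C @ X
beta = np.linalg.solve(Wxx, X.T @ C @ y)
d = y - X @ beta
sigma2 = d @ C @ d / n
R = C - C @ X @ np.linalg.solve(Wxx, X.T @ C)
e = R @ y

def loglik(b, s2):
    """Full-data log-likelihood at (b, s2), with H held fixed."""
    r = y - X @ b
    _, logdet = np.linalg.slogdet(H)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(s2)
            - 0.5 * logdet - (r @ C @ r) / (2 * s2))

I = [2, 7]
ld = likelihood_distance(e, R, C, sigma2, p, I)

# Direct evaluation using estimates from the data without the cases in I.
keep = np.setdiff1d(np.arange(n), I)
C_del = np.linalg.inv(H[np.ix_(keep, keep)])
beta_del = np.linalg.solve(X[keep].T @ C_del @ X[keep],
                           X[keep].T @ C_del @ y[keep])
d_del = y[keep] - X[keep] @ beta_del
sigma2_del = d_del @ C_del @ d_del / (n - len(I))
ld_direct = 2 * (loglik(beta, sigma2) - loglik(beta_del, sigma2_del))
```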


4.8. Influence on Linear Functions of �

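In some cases, we may be interested in assessing the influence of the $k$ observations indexed by $I$ on a given subset of $\beta$, or more generally, on $s$ linearly independent combinations of $\beta$, namely, $\psi = L'\beta$, where $L'$ is an $s \times p$ matrix with $\operatorname{rank}(L) = s$, using the MLE of $\psi$, $\hat\psi = L'\hat\beta$. Omitting the $k$ observations indexed by $I$, the MLE of $\psi$ is $\hat\psi_{(I)} = L'\hat\beta_{(I)}$. Using the generalized Cook’s statistic as a measure of the influence of a subset of observations on $\hat\psi$, we obtain

$$\begin{aligned}
CD_I(\psi) &= (\hat\psi - \hat\psi_{(I)})'\left(L'W(X,X)^{-1}L\right)^{-1}(\hat\psi - \hat\psi_{(I)})/(s\hat\sigma_e^2) \\
&= (\hat\beta_{(I)} - \hat\beta)'M(\hat\beta_{(I)} - \hat\beta)/(s\hat\sigma_e^2) \\
&= e_I'R_{II}^{-1}C_I'XW(X,X)^{-1}MW(X,X)^{-1}X'C_IR_{II}^{-1}e_I/(s\hat\sigma_e^2),
\end{aligned}$$

where $M = L\left(L'W(X,X)^{-1}L\right)^{-1}L'$. For positive semidefinite $M$, we have

$$\begin{aligned}
&R_{II}^{-1}C_I'XW(X,X)^{-1}X'C_IR_{II}^{-1} - R_{II}^{-1}C_I'XW(X,X)^{-1}MW(X,X)^{-1}X'C_IR_{II}^{-1} \\
&\quad = R_{II}^{-1}C_I'XW(X,X)^{-1/2}\left(I_p - W(X,X)^{-1/2}MW(X,X)^{-1/2}\right)W(X,X)^{-1/2}X'C_IR_{II}^{-1} \ge 0
\end{aligned}$$

and

$$e_I'R_{II}^{-1}C_I'XW(X,X)^{-1}X'C_IR_{II}^{-1}e_I/(s\hat\sigma_e^2) \ge e_I'R_{II}^{-1}C_I'XW(X,X)^{-1}MW(X,X)^{-1}X'C_IR_{II}^{-1}e_I/(s\hat\sigma_e^2),$$

so that

$$CD_I(\beta)\,p/s \ge CD_I(\psi),$$

where $p/s$ is fixed. This means that $CD_I(\psi)$ does not need to be calculated unless $CD_I(\beta)$ is large.

One can further simplify $CD_I(\psi)$ by imposing some constraints on $L$. If our interest is centered on the influence of a subset of observations on $t$ elements of $\beta$ (which we assume, without loss of generality, to be the last $t$ components of $\beta$), we have $L' = [0 \mid I_t]$, where $0$ is a $t \times (p-t)$ matrix of zeros. Partitioning $X$ into $X = [X_1 \mid X_2]$, we obtain

$$W(X,X)^{-1} = \begin{bmatrix} W(X_1,X_1) & W(X_1,X_2) \\ W(X_2,X_1) & W(X_2,X_2) \end{bmatrix}^{-1}$$

and, using the standard matrix identity for the inverse of a partitioned matrix, we further simplify

$$W(X,X)^{-1}MW(X,X)^{-1} = W(X,X)^{-1}L\left(L'W(X,X)^{-1}L\right)^{-1}L'W(X,X)^{-1} = W(X,X)^{-1} - \begin{bmatrix} W(X_1,X_1)^{-1} & 0 \\ 0 & 0 \end{bmatrix},$$

from which it follows that

$$CD_I(\psi) = \frac pt\,CD_I(\beta) - e_I'R_{II}^{-1}C_I'X_1W(X_1,X_1)^{-1}X_1'C_IR_{II}^{-1}e_I/(t\hat\sigma_e^2),$$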


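which measures the influence of the set of observations indexed by $I$ on the $t$ elements of $\psi$.

If our interest is centered on the influence on one particular effect level, namely, $\beta_k$ (here $k$ indexes a coefficient, not the number of deleted cases), then $t = 1$, $L' = [0 \mid 1]$, $X_1 = X_{(k)}$, and $X_2 = X_k$, where $X_k$ is the $k$th column of $X$ and $X_{(k)}$ is $X$ with that column removed, and

$$CD_I(\psi) = p\,CD_I(\beta) - e_I'R_{II}^{-1}C_I'X_{(k)}W(X_{(k)},X_{(k)})^{-1}X_{(k)}'C_IR_{II}^{-1}e_I/\hat\sigma_e^2,$$

which measures the influence of a subset of observations on the $k$th element of $\beta$.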

4.9. Influence on the Variance Components Ratios

In all the updating formulae given above, we assumed that the variance components ratios are known, or that the estimates of the variance components ratios are correct. If neither of these assumptions holds, or if interest is in estimating the variance components ratios, a “one-step” estimate diagnostic (which is a standard method for nonlinear problems) for the variance components ratios should be used. In this case, we first examine the variance ratio diagnostics and take any necessary remedial action on influential cases, if any, so that the influence diagnostics for the fixed and random effects and the likelihood function can be performed with the confidence that the specified covariance matrix is equally sensitive to each subset of deleted observations.

The Newton-Raphson (NR) and/or Fisher scoring algorithms, which require first- and second-order partial derivatives, may be used to obtain a one-step estimate of $\gamma' = [\gamma_1, \ldots, \gamma_r]$ using $\hat\gamma$ as a starting value. For the NR iterative procedure, $\gamma^{(t+1)} = \gamma^{(t)} - (F^{(t)})^{-1}g^{(t)}$, while for the Fisher scoring algorithm, $\gamma^{(t+1)} = \gamma^{(t)} - (E(F^{(t)}))^{-1}g^{(t)}$, where $g^{(t)}$ and $F^{(t)}$ are the gradient and the Hessian evaluated at $\gamma^{(t)}$, respectively. The $i$th element of $g$ is

$$-\frac12\operatorname{trace}(W(Z_i,Z_i)) + \frac{1}{2\sigma_e^2}W(d,Z_i)W(Z_i,d),$$

the $(i,j)$th, $i,j = 1, 2, \ldots, r$, element of $F$ is

$$\frac12\operatorname{trace}(W(Z_i,Z_j)W(Z_j,Z_i)) - \frac{1}{\sigma_e^2}W(d,Z_i)W(Z_i,Z_j)W(Z_j,d),$$

implying that the expected value of the $(i,j)$th element of $F$ is $-\frac12\operatorname{trace}(W(Z_i,Z_j)W(Z_j,Z_i))$.

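Noting that at convergence $\gamma^{(t+1)} = \gamma^{(t)} - (F^{(t)})^{-1}g^{(t)}$ and $\gamma^{(t+1)} = \gamma^{(t)} - (E(F^{(t)}))^{-1}g^{(t)}$ give the same results, and making use of Theorem 1, the one-step updating formula for the variance components ratios becomes

$$\hat\gamma_{(I)} = \hat\gamma - (E(F) - G)^{-1}g_{(I)},$$

where $E(F)$ is the $r \times r$ matrix of expected values of the elements of $F$, $G$ is the $r \times r$ matrix with $(j,k)$th element

$$G(j,k) = \frac12\operatorname{ssq}\left(Z_j'C_IC_{II}^{-1}C_I'Z_k\right) - \operatorname{trace}\left(W(Z_j,Z_k)Z_k'C_IC_{II}^{-1}C_I'Z_j\right)$$

for $j, k = 1, 2, \ldots, r$, and the $j$th element of $g_{(I)}$ is

$$\left.\frac{\partial l}{\partial\gamma_j}\right|_{(I)} = \frac{\partial l}{\partial\gamma_j} + \frac{\frac12\operatorname{ssq}(Z_j'e)\left(1 - \dfrac{n}{n-k} + \dfrac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right) + \frac12\operatorname{ssq}\left(Z_j'R_IR_{II}^{-1}e_I\right) - e'Z_jZ_j'R_IR_{II}^{-1}e_I}{\hat\sigma_e^2\left(\dfrac{n}{n-k} - \dfrac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right)} + \frac12\operatorname{trace}\left(Z_j'C_IC_{II}^{-1}C_I'Z_j\right),$$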


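but at convergence $\partial l/\partial\gamma_j \to 0$ at $\hat\gamma_j$, and hence the $j$th element of $g_{(I)}$ is

$$\left.\frac{\partial l}{\partial\gamma_j}\right|_{(I)} = \frac{\frac12\operatorname{ssq}(Z_j'e)\left(1 - \dfrac{n}{n-k} + \dfrac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right) + \frac12\operatorname{ssq}\left(Z_j'R_IR_{II}^{-1}e_I\right) - e'Z_jZ_j'R_IR_{II}^{-1}e_I}{\hat\sigma_e^2\left(\dfrac{n}{n-k} - \dfrac{e_I'R_{II}^{-1}e_I}{(n-k)\hat\sigma_e^2}\right)} + \frac12\operatorname{trace}\left(Z_j'C_IC_{II}^{-1}C_I'Z_j\right).$$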

Clearly, the one-step estimate can only be obtained by explicitly recomputing and reinverting $(E(F) - G)$ for each $I$. This is an $r \times r$ matrix, where $r$ is the number of random effects in the model. In most practical cases, $r$ will not be large enough to make an $r \times r$ matrix inversion expensive.

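The analogue of Cook’s distance measure for the variance components ratios, $\gamma$, is

$$\begin{aligned}
CD_I(\gamma) &= (\hat\gamma_{(I)} - \hat\gamma)'[\operatorname{var}(\hat\gamma)]^{-1}(\hat\gamma_{(I)} - \hat\gamma) \\
&= -g_{(I)}'(E(F) - G)^{-1}E(F)(E(F) - G)^{-1}g_{(I)} \\
&= g_{(I)}'\left(I_r + \operatorname{var}(\hat\gamma)G\right)^{-2}\operatorname{var}(\hat\gamma)\,g_{(I)},
\end{aligned}$$

with $\operatorname{var}(\hat\gamma) = -[E(F)]^{-1}$.

Large values of $CD_I(\gamma)$ highlight points for special attention.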
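The role of the correction matrix $G$ can be illustrated numerically: under Theorem 1, the expected Hessian computed from the reduced data should equal $E(F) - G$ computed from the full data. The sketch below checks this for two crossed random factors; the function names, the simulated design, and the ratio values 0.8 and 0.5 are assumptions for the example.

```python
import numpy as np

def expected_hessian(Zs, C):
    """(j,k) element: -1/2 trace(W(Z_j,Z_k) W(Z_k,Z_j)), with W(A,B) = A' C B."""
    r = len(Zs)
    EF = np.empty((r, r))
    for j in range(r):
        for k in range(r):
            Wjk = Zs[j].T @ C @ Zs[k]
            EF[j, k] = -0.5 * np.sum(Wjk ** 2)     # trace(Wjk Wjk') = ssq(Wjk)
    return EF

def correction_G(Zs, C, I):
    """G(j,k) = 1/2 ssq(Z_j' P Z_k) - trace(W(Z_j,Z_k) Z_k' P Z_j), P = C_I C_II^{-1} C_I'."""
    I = list(I)
    P = C[:, I] @ np.linalg.solve(C[np.ix_(I, I)], C[:, I].T)
    r = len(Zs)
    G = np.empty((r, r))
    for j in range(r):
        for k in range(r):
            Wjk = Zs[j].T @ C @ Zs[k]
            Pjk = Zs[j].T @ P @ Zs[k]
            # trace(Wjk Pjk') equals the elementwise sum of Wjk * Pjk
            G[j, k] = 0.5 * np.sum(Pjk ** 2) - np.sum(Wjk * Pjk)
    return G

# Hypothetical design: 4 x 3 crossed layout, one observation per cell.
n = 12
Z1 = np.kron(np.eye(4), np.ones((3, 1)))          # first factor, 4 levels
Z2 = np.tile(np.eye(3), (4, 1))                   # second factor, 3 levels
H = np.eye(n) + 0.8 * Z1 @ Z1.T + 0.5 * Z2 @ Z2.T # assumed ratios 0.8 and 0.5
C = np.linalg.inv(H)

I = [2, 7]
EF = expected_hessian([Z1, Z2], C)
G = correction_G([Z1, Z2], C, I)

# Expected Hessian recomputed from scratch on the reduced data.
keep = np.setdiff1d(np.arange(n), I)
C_del = np.linalg.inv(H[np.ix_(keep, keep)])
EF_del = expected_hessian([Z1[keep], Z2[keep]], C_del)
```

Only $k \times k$ and $r \times r$ systems are solved per subset in this sketch, which matches the cost argument made in the text.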

5. Application

5.1. Aerosol Data

Beckman et al. (1987) developed local influence diagnostics for the mixed model and illustrated their methodology by using aerosol-penetration data. Christensen et al. (1992b) and Zewotir and Galpin (2005) used the same data for comparison and illustration of their diagnostic methods. We use the same data to illustrate the influence measures discussed in Sec. 4. The aim of the aerosol-penetration study was to determine the factors that contribute most to the variability in penetration of the filters, and to determine whether the standard aerosol can be replaced by an alternative aerosol in quality assurance testing. Two aerosols were crossed with two filter manufacturers. Within each manufacturer, three filters were used to evaluate the penetration of the two aerosols. The aerosol and manufacturer effects were taken as fixed effects and the filters nested within the manufacturer were taken as a random effect.

The case deletion influence measures (Zewotir and Galpin, 2005) showed that cases 33 and 31 stand out as the most singly influential on the filter variance ratio, $CD(\gamma)$, with case 14 also being indicated as the third most influential observation. We present the most influential pairs of observations in Table 1. When we delete pairs of observations, cases 14 and 13 are the most jointly influential, yet they are not the most individually influential on the filter variance ratio.
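A search over all pairs, of the kind summarized in Table 1, might be organized as in the following sketch. The aerosol data are not reproduced here, so simulated data stand in for them; the helper name, the toy design, and the assumed variance ratio are hypothetical. The number of subsets grows combinatorially with $k$, which is why each subset is evaluated from the full-data building blocks alone.

```python
import numpy as np
from itertools import combinations

def cook_distance_beta(e, R, C, sigma2, p, I):
    """CD_I(beta) from full-data building blocks."""
    I = list(I)
    R_II = R[np.ix_(I, I)]
    a = np.linalg.solve(R_II, e[I])
    return a @ (C[np.ix_(I, I)] - R_II) @ a / (p * sigma2)

# Hypothetical toy data standing in for a real data set.
rng = np.random.default_rng(7)
n, p, q = 12, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((3, 1)))
H = np.eye(n) + 0.8 * Z @ Z.T                 # gamma = 0.8 assumed known
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

# Building blocks are computed once from the full data.
C = np.linalg.inv(H)
Wxx = X.T @ C @ X
beta = np.linalg.solve(Wxx, X.T @ C @ y)
d = y - X @ beta
sigma2 = d @ C @ d / n
R = C - C @ X @ np.linalg.solve(Wxx, X.T @ C)
e = R @ y

# Evaluate every pair and rank by joint influence on the fixed effects.
ranked = sorted(
    ((cook_distance_beta(e, R, C, sigma2, p, pair), pair)
     for pair in combinations(range(n), 2)),
    reverse=True,
)
top3 = ranked[:3]                             # three most jointly influential pairs
```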

Table 1
The most jointly influential pairs of observations for aerosol data

| $I$ | $CD_I(\gamma)$ | $I$ | $CD_I(\beta)$ | $I$ | $WK_I$ | $I$ | $\lvert CR_I - 1\rvert$ |
|---|---|---|---|---|---|---|---|
| {13, 14} | 10.0582 | {13, 14} | 1.29980 | {13, 14} | 4.36663 | {13, 14} | 0.97995 |
| {31, 33} | 1.7912 | {14, 18} | 0.57889 | {14, 18} | 0.92653 | {14, 18} | 0.80891 |
| {32, 33} | 1.3511 | {14, 15} | 0.36288 | {14, 33} | 0.54913 | {14, 33} | 0.79209 |

| $I$ | $\lvert CW_I\rvert$ | $I$ | $AP_I$ | $I$ | $CD_I(u)$ | $I$ | $LD_I$ |
|---|---|---|---|---|---|---|---|
| {13, 14} | 1.95478 | {13, 14} | 0.87627 | {13, 14} | 2.04833 | {13, 14} | 65.0600 |
| {14, 18} | 0.82750 | {14, 18} | 0.39414 | {14, 16} | 0.63306 | {14, 18} | 12.9064 |
| {14, 33} | 0.78533 | {14, 33} | 0.39140 | {10, 14} | 0.59797 | {14, 33} | 11.9203 |

5.2. Perceptions of the Health Effects of Nicotine and Cigarettes in Gauteng

South African adults’ awareness of the government health warnings on the harmful effects of smoking was investigated by means of a series of interviewer-administered questionnaires conducted by fieldworkers of the Human Sciences Research Council through the omnibus survey in October 1996. A random sample of 40 clusters was taken from all the clusters in Gauteng Province, South Africa. A random sample of 343 respondents was drawn from the selected clusters. Respondents in the survey were asked to recall a number of anti-smoking messages that appeared as warning messages on cigarette advertisements. For each respondent, the total number of messages spontaneously recalled was noted. Our goal is to investigate how the number of messages recalled differs with respect to race, sex, age, socio-economic status, marital status, smoking status, educational level, and area clusters. We fitted the linear mixed model to the data with cluster as the random effect and the other main effects regarded as fixed. The interaction effects included in the model were the fixed effect interaction of sex by socio-economic status and the random effect interaction of cluster by socio-economic status. These interactions were selected using the mixed model goodness of fit information (Duchateau and Janssen, 1997; Searle et al., 1992). The response variable was transformed to the natural log scale after first adding 0.50 to the number of messages recalled, because some respondents recalled zero messages.

The most influential pairs of observations according to different diagnostic measures are given in Table 2. The pair {245, 294} is by far the most jointly influential pair for the variance components. Although the pair {245, 294} is the most influential for the variance components, omitting it from the analysis does not substantially change the estimates of the variance ratios from those of the full data set (see Table 3). There are no remarkably influential points for the variance components. On the other hand, Table 2 shows that the pairs {80, 94}, {80, 161}, and {7, 28} are the most jointly influential cases on the fixed and random effect estimates and the likelihood function. Table 3 demonstrates how each pair changes the full data conclusion. When {80, 94} or {80, 161} is deleted, the only significant factor is education. The pairs {80, 94} and {80, 161} have a substantial effect on the overall fit conclusion.

Table 2
The top three jointly influential pairs of observations for nicotine and cigarette perception data

| $I$ | $CD_I(\gamma)$ | $I$ | $CD_I(\beta)$ | $I$ | $WK_I$ | $I$ | $\lvert CR_I - 1\rvert$ |
|---|---|---|---|---|---|---|---|
| {245, 294} | 0.032851 | {80, 94} | 0.025645 | {80, 94} | 0.069075 | {7, 28} | 1.89172 |
| {5, 245} | 0.026526 | {80, 169} | 0.085331 | {80, 169} | 0.061776 | {51, 161} | 1.63259 |
| {245, 296} | 0.025696 | {80, 161} | 0.082790 | {80, 272} | 0.058215 | {28, 161} | 1.62515 |

| $I$ | $\lvert CR_I - 1\rvert$ | $I$ | $AP_I$ | $I$ | $CD_I(u)$ | $I$ | $LD_I$ |
|---|---|---|---|---|---|---|---|
| {7, 28} | 5.30926 | {80, 161} | 0.44485 | {24, 29} | 1.14165 | {80, 94} | 3.05647 |
| {51, 161} | 4.83984 | {7, 28} | 0.42694 | {46, 156} | 0.92917 | {80, 169} | 2.60071 |
| {28, 161} | 4.82570 | {51, 161} | 0.41033 | {80, 245} | 0.90210 | {80, 272} | 2.57665 |

Table 3
The P-value for nicotine and cigarettes perception data

| Effect | Full Data | {80, 94} Deleted | {80, 161} Deleted | {7, 28} Deleted | {245, 294} Deleted |
|---|---|---|---|---|---|
| Race | 0.5410 | 0.5057 | 0.5285 | 0.5294 | 0.6204 |
| Sex | 0.0302 | 0.2239 | 0.0593 | 0.0177 | 0.0237 |
| Age | 0.7835 | 0.8913 | 0.8558 | 0.7371 | 0.7106 |
| Social Status | 0.5605 | 0.7361 | 0.6142 | 0.5972 | 0.5328 |
| Married | 0.2260 | 0.3344 | 0.2005 | 0.2315 | 0.2939 |
| Smoke | 0.4599 | 0.1669 | 0.3281 | 0.4557 | 0.6229 |
| Education | 0.0006 | 0.0002 | 0.0004 | 0.0006 | 0.0007 |
| Sex*Social Status | 0.0263 | 0.2347 | 0.0591 | 0.0145 | 0.0211 |
| Cov Parm | | | | | |
| Cluster | 0.0458 | 0.0447 | 0.0486 | 0.0486 | 0.0483 |
| Social Status*Cluster | 0.0332 | 0.0243 | 0.0307 | 0.0306 | 0.0260 |

6. Concluding Remarks

We have presented various diagnostic measures that appear useful and can play an important role in linear mixed-model data analysis. All the diagnostic measures are a function of the basic building blocks: residuals, error contrast matrix, and the inverse of the response covariance matrix. The basic building blocks are computed only once from the complete data analysis. The diagnostics for the fixed effects, random effects, and likelihood function use a specified covariance structure for the data, and assume that the variance component ratios are correctly estimated. Therefore, in practice, it would be meaningful to examine influence on the variance component ratios before dealing with the influence diagnostics on the fixed/random effects and the log-likelihood function.

Appendix: Proof of Theorem 1

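From the inverse of the partitioned matrix $H$, where $C = H^{-1}$,

\[
\begin{bmatrix} C_{II} & C_{I(I)}' \\ C_{I(I)} & C_{(I)} \end{bmatrix}
=
\begin{bmatrix}
M^{-1} & -M^{-1} H_{I(I)}' H_{(I)}^{-1} \\
-H_{(I)}^{-1} H_{I(I)} M^{-1} & H_{(I)}^{-1} + H_{(I)}^{-1} H_{I(I)} M^{-1} H_{I(I)}' H_{(I)}^{-1}
\end{bmatrix},
\]

where $M = H_{II} - H_{I(I)}' H_{(I)}^{-1} H_{I(I)}$.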


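Then, from the uniqueness property of the inverse of a positive-definite matrix,

\[
\begin{aligned}
\begin{bmatrix} C_{II} & C_{I(I)}' \\ C_{I(I)} & C_{(I)} \end{bmatrix}
&= \begin{bmatrix} C_{II} & C_{I(I)}' \\ C_{I(I)} & H_{(I)}^{-1} + C_{I(I)} C_{II}^{-1} C_{I(I)}' \end{bmatrix} \\
&= \begin{bmatrix} 0 & 0 \\ 0 & H_{(I)}^{-1} \end{bmatrix}
 + \begin{bmatrix} C_{II} \\ C_{I(I)} \end{bmatrix} C_{II}^{-1} \begin{bmatrix} C_{II} & C_{I(I)}' \end{bmatrix} \\
&= \begin{bmatrix} 0 & 0 \\ 0 & H_{(I)}^{-1} \end{bmatrix} + C_{I} C_{II}^{-1} C_{I}' .
\end{aligned}
\]

\[
W(A, B) = \begin{bmatrix} a_{I} & A_{(I)}' \end{bmatrix}
\left( \begin{bmatrix} 0 & 0 \\ 0 & H_{(I)}^{-1} \end{bmatrix} + C_{I} C_{II}^{-1} C_{I}' \right)
\begin{bmatrix} b_{I}' \\ B_{(I)} \end{bmatrix},
\]

from which it follows that $W_{(I)}(A, B) = W(A, B) - A' C_{I} C_{II}^{-1} C_{I}' B$.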
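The deletion update at the end of the proof can be checked numerically. The sketch below is illustrative only. It assumes W(A, B) = A′H⁻¹B, as the bracketed expression in the proof implies, and uses a randomly generated positive-definite H with an arbitrary index set I; the dimensions, seed, and variable names are hypothetical and do not come from the article's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 8, 3, 2
I = np.array([1, 4])                      # hypothetical set of deleted cases
keep = np.setdiff1d(np.arange(n), I)      # indices of the retained cases

# Random symmetric positive-definite H and arbitrary conformable A, B.
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, q))

# Building blocks from the complete data: C = H^{-1}, its columns for the
# deleted cases (C_I), and the corresponding diagonal block (C_II).
C = np.linalg.inv(H)
C_I = C[:, I]
C_II = C[np.ix_(I, I)]

# Full-data operator and the update formula from the appendix.
W_full = A.T @ C @ B
W_update = W_full - A.T @ C_I @ np.linalg.solve(C_II, C_I.T @ B)

# Direct computation after physically deleting the rows and columns in I.
H_del = H[np.ix_(keep, keep)]
W_direct = A[keep].T @ np.linalg.solve(H_del, B[keep])

print(np.allclose(W_update, W_direct))
```

The update needs only the inverse of the small block C_II, whose order equals the number of deleted cases, whereas the direct route refactors the reduced matrix for every candidate subset.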

References

Beckman, R. J., Nachtsheim, C. J., Cook, R. D. (1987). Diagnostics for mixed model analysis of variance. Technometrics 29:413–426.

Belsley, D. A., Kuh, E., Welsch, R. E. (1980). Regression Diagnostics. New York: John Wiley.

Chatterjee, S., Hadi, A. S. (1986). Influential observation, high leverage points, and outliers in linear regression (with discussion). Statistical Science 1:379–416.

Chatterjee, S., Hadi, A. S. (1988). Sensitivity Analysis in Linear Regression. New York: John Wiley.

Christensen, R., Johnson, W., Pearson, L. M. (1992a). Prediction diagnostics for spatial linear models. Biometrika 79:583–591.

Christensen, R., Pearson, L. M., Johnson, W. (1992b). Case deletion diagnostics for mixed models. Technometrics 34:38–45.

Christensen, R., Johnson, W., Pearson, L. M. (1993). Covariance function diagnostics for spatial linear models. Mathematical Geology 25:145–160.

Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics 19:15–18.

Cook, R. D., Weisberg, S. (1980). Characterization of an empirical influence function for detecting influential cases in regression. Technometrics 22:495–508.

Cook, R. D., Weisberg, S. (1982). Residuals and Influence in Regression. London: Chapman and Hall.

Duchateau, L., Janssen, P. (1997). An example based tour in linear mixed models. In Linear Mixed Models in Practice: A SAS Oriented Approach. New York: Springer-Verlag, pp. 11–62.

Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems (with discussion). Journal of the American Statistical Association 72:320–340.

Haslett, S. (1999). A simple derivation of deletion diagnostics for the general linear model with correlated errors. Journal of the Royal Statistical Society (B) 61:603–609.

Haslett, J., Hayes, K. (1998). Residuals for the linear model with general covariance structure. Journal of the Royal Statistical Society (B) 60:201–215.

Haslett, J., Dillane, D. (2004). Application of “delete = replace” to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society (B) 66:131–143.

Hemmerle, W. J., Hartley, H. O. (1973). Computing maximum likelihood estimates for the mixed AOV model using the W transformation. Technometrics 15:819–831.


Jennrich, R. I., Sampson, P. F. (1976). Newton–Raphson and related algorithms for maximum likelihood estimation of variance components. Technometrics 18:11–18.

Martin, R. J. (1992). Leverage, influence and residuals in regression models when observations are correlated. Communications in Statistics - Theory and Methods 21:1183–1212.

McCulloch, C. E., Searle, S. R. (2001). Generalized, Linear and Mixed Models. New York: John Wiley.

Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects (with discussion). Statistical Science 6:15–51.

SAS Institute Inc. (1992). SAS Technical Report P-229. Cary, NC: SAS Institute.

Searle, S. R., Casella, G., McCulloch, C. E. (1992). Variance Components. New York: John Wiley.

Zewotir, T., Galpin, J. S. (2005). Influence diagnostics for linear mixed models. Journal of Data Science 3:153–177.

Zewotir, T., Galpin, J. S. (2006). Evaluation of linear mixed model case deletion diagnostic tools by Monte Carlo simulation. Communications in Statistics - Simulation and Computation 35:645–682.
