The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case...

Ke-Hai Yuan, Peter M. Bentler, and Wei Zhang. 2005. The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case and Its Multivariate Implication. Sociological Methods & Research 34(2):240-258. DOI: 10.1177/0049124105280200. Published by SAGE Publications (http://www.sagepublications.com). The online version of this article can be found at http://smr.sagepub.com/cgi/content/abstract/34/2/240. © 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution.

The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis

The Univariate Case and Its Multivariate Implication

KE-HAI YUAN, University of Notre Dame

PETER M. BENTLER, University of California, Los Angeles

WEI ZHANG, University of Notre Dame

The maximum likelihood (ML) method, based on the normal distribution assumption, is

widely used in mean and covariance structure analysis. With typical nonnormal data,

the ML method will lead to biased statistics and inappropriate scientific conclusions.

This article develops a simple but informative case to show how ML results are influ-

enced by skewness and kurtosis. Specifically, the authors discuss how skewness and

kurtosis in a univariate distribution affect the standard errors of the ML estimators, the

covariances between the estimators, and the likelihood ratio test of hypotheses on

mean and variance parameters. They also describe corrections that have been devel-

oped to allow appropriate inference. Enough details are provided so that this material

can be used in graduate instruction. For each result, the corresponding results in the

higher dimensional case are pointed out, and references are provided.

Keywords: likelihood ratio statistic; nonnormal data; sandwich-type covariance matrix;

Wald statistics

AUTHORS’ NOTE: The research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse and NSF Grant DMS04-37167. We thank three referees for their constructive comments, which led to an improved version of the article.

SOCIOLOGICAL METHODS & RESEARCH, Vol. 34, No. 2, November 2005 240-258

DOI: 10.1177/0049124105280200

© 2005 Sage Publications

1. INTRODUCTION

Mean and covariance structure analysis is becoming increasingly popular in social and behavioral sciences (Bollen 2002; Boomsma 2000; MacCallum and Austin 2000). The most widely used method

for estimation and testing is normal theory-based maximum likeli-

hood (ML). In this method, parameter estimates are obtained by

maximizing the likelihood function derived from the multivariate

normal distribution. Standard errors of the maximum likelihood

estimators (MLE) are based on the covariance matrix that is

obtained by inverting the associated information matrix. Overall

model evaluation is accomplished by referring the likelihood ratio

(LR) statistic to a chi-square distribution. Fit indices are also related

to, or derived from, the LR statistic. Although data in practice are

seldom normally distributed (Micceri 1989), researchers commonly

use the ML method without checking the distribution assumption.

One possible reason is that ML is the default method in almost all

the structural equation modeling (SEM) software. Another reason

may be that the effects of nonnormally distributed data on standard

errors of the MLEs and on the LR statistic are not well understood

by applied researchers. Actually, even more technically oriented

publications do not emphasize limitations of the normal theory ML

approach with nonnormal data (see, e.g., reviews by Breckler 1990;

MacCallum and Austin 2000).

Although SEM is taught in most graduate programs, since current

textbooks do not rigorously introduce material on the effect of non-

normal data on model inference, it is likely that few instructors cover

this material in classrooms. The mathematics/statistics involved is

more complicated than that of regression, ANOVA, or basic SEM,

and even courses in univariate and multivariate statistics do not pro-

vide enough technical background for digesting the literature on

SEM with nonnormal data. The aim of this article is to provide a rig-

orous introduction to the effect of nonnormality on statistical infer-

ence in mean and covariance structure analysis using the most simple

one-dimensional case. Although the one-dimensional case is over-

simplified, all the effects of nonnormality on standard errors and test

statistics in the higher dimensional case are reflected in the one-

dimensional case. The concepts needed to develop this case are quite

minimal and build on material already in the armamentarium of

many graduate students, namely, basic calculus, linear algebra, and

an introductory course in statistics/probability. Thus, we expect that

this case can be used as a teaching tool in SEM courses for graduate

students in the social and behavioral sciences.

In the one-dimensional case, the interesting parameters are the

population mean and variance. The effect of nonnormal data on sta-

tistical inference for these two parameters can be totally character-

ized by skewness and kurtosis. The concepts of skewness and

kurtosis in the one-dimensional case are well known to graduate

students in social sciences (see, e.g., Tabachnick and Fidell

2001:73-5). The other concepts involved in this article are partial

derivatives, the law of large numbers, and the central limit theorem.

We will provide the necessary steps for each result so that a quanti-

tative graduate student will be able to check or derive it. For each

result, we will also point out the parallel higher dimensional result

in the SEM literature.

In section 2, we study the effect of nonnormal data on the var-

iances and covariance of the MLEs of the population mean and var-

iance. In section 3, we study the effect of nonnormal data on the LR

and related statistics. In section 4, we present an example illustrat-

ing the effect of nonnormal data on the distributions of the MLEs

and the LR statistics. A discussion with a further guide to the litera-

ture is provided in section 5.

2. THE NORMAL THEORY BASED MAXIMUM LIKELIHOOD ESTIMATOR

Let $y_1, y_2, \ldots, y_n$ be a random sample from a population $y$ with $E(y) = \mu$, $\mathrm{Var}(y) = \sigma^2$, $E(y-\mu)^3 = \sigma^{3/2}\gamma$, and $E(y-\mu)^4 = \sigma^4\beta$. Then $\gamma$ and $\beta - 3$ are the population skewness and kurtosis of $y$. When $y \sim N(\mu, \sigma^2)$, $\gamma = 0$ and $\beta = 3$. This section deals with the effect of $\gamma$ and $\beta$ on the distribution of the normal theory-based MLEs of $\mu$ and $\sigma^2$. Notice that even when $\gamma = 0$ and $\beta = 3$, $y$ may still be nonnormally distributed. However, the violation of normality in higher order moments will have only a minimal effect. Actually, the asymptotic distributions of the MLEs of $\mu$ and $\sigma^2$ depend on the distribution of $y$ only up to the fourth-order moment (see, e.g., Ferguson 1996:44-9; Magnus and Neudecker 1999:313-20).

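Given $y_i$, the likelihood function based on $y_i \sim N(\mu, \sigma^2)$ is

$$L_i(\mu, \sigma^2) = L(\mu, \sigma^2 \mid y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\{-(y_i - \mu)^2/(2\sigma^2)\},$$

which is just the normal density function with $y_i$ known. The corresponding log-likelihood function $l_i = \log(L_i)$ is

$$l_i(\mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y_i - \mu)^2.$$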

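The MLEs of $\mu$ and $\sigma^2$, based on $y_1, y_2, \ldots, y_n$, are

$$\hat\mu = \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i \quad\text{and}\quad \hat\sigma^2 = s^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^2,$$

which maximize the log-likelihood function

$$l(\mu, \sigma^2) = \sum_{i=1}^{n} l_i(\mu, \sigma^2). \tag{1}$$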
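The estimators above can be sketched in a few lines of Python. This is only an illustration: the function names `normal_mle` and `loglik` are hypothetical, the data are made up, and the check simply confirms that moving either estimate away from its value lowers the log-likelihood in (1).

```python
import math

def normal_mle(y):
    """Return (mean, variance) with divisor n, the normal theory MLEs."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / n
    return ybar, s2

def loglik(y, mu, sigma2):
    """Normal log-likelihood summed over the observations, as in (1)."""
    return sum(
        -0.5 * math.log(2 * math.pi)
        - 0.5 * math.log(sigma2)
        - (v - mu) ** 2 / (2 * sigma2)
        for v in y
    )

y = [0.0, 0.0, 0.0, 4.0]            # made-up, right-skewed toy data
mu_hat, s2_hat = normal_mle(y)      # sample mean and divisor-n variance
best = loglik(y, mu_hat, s2_hat)    # log-likelihood at the MLEs
```

Note that the divisor is $n$, not $n - 1$, so `s2_hat` differs from the usual unbiased sample variance.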

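Closely related to the log-likelihood function is the so-called information matrix

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$

where

$$\begin{aligned}
I_{11} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\mu}\right) = 1/\sigma^2, \\
I_{12} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\sigma^2}\right) = 0, \\
I_{21} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\mu}\right) = 0, \\
I_{22} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\sigma^2}\right) = 1/(2\sigma^4).
\end{aligned}$$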
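For readers who want to check the entries of the information matrix, the derivatives can be sketched as follows. The shorthand $\tau = \sigma^2$ is introduced here only for illustration (it is not notation from the article), so that differentiation is with respect to the variance itself.

```latex
\begin{aligned}
\frac{\partial l_i}{\partial \mu} &= \frac{y_i - \mu}{\tau}, &
\frac{\partial l_i}{\partial \tau} &= -\frac{1}{2\tau} + \frac{(y_i - \mu)^2}{2\tau^2}, \\
\frac{\partial^2 l_i}{\partial \mu^2} &= -\frac{1}{\tau}, &
\frac{\partial^2 l_i}{\partial \mu\,\partial \tau} &= -\frac{y_i - \mu}{\tau^2}, \\
\frac{\partial^2 l_i}{\partial \tau^2} &= \frac{1}{2\tau^2} - \frac{(y_i - \mu)^2}{\tau^3}. &&
\end{aligned}
```

Taking expectations with $E(y_i - \mu) = 0$ and $E(y_i - \mu)^2 = \tau$ and changing signs gives $1/\tau$, $0$, and $1/\tau^2 - 1/(2\tau^2) = 1/(2\tau^2)$, that is, $1/\sigma^2$, $0$, and $1/(2\sigma^4)$. Only the first two moments of $y$ enter these expectations.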

In mean and covariance structure analysis in higher dimensions,

both the mean vector and the covariance matrix are parameterized

as functions of a more basic set of parameters. Then elements

of − I will be just the expectation of the second derivative of the

log-likelihood function with respect to a pair of the parameters.

Let $\theta = (\mu, \sigma^2)'$ and $\hat\theta = (\bar y, s^2)'$. When data are normally distributed, standard asymptotic statistical theory (see, e.g., Ferguson 1996:121) tells us that, as $n \to \infty$,

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{L} N(0, \Omega), \tag{2}$$

where $\xrightarrow{L}$ means ‘‘converging in distribution.’’ This means that, with a large $n$, the distribution of the left side of (2) can be approximately described by a normal random vector with mean zero and covariance matrix $\Omega$. Furthermore, this covariance matrix is the inverse of the information matrix

$$\Omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} = I^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}. \tag{3}$$

Because $\Omega$ is a diagonal matrix, $\hat\mu$ and $\hat\sigma^2$ are asymptotically independent. Of course, they are also independent with any finite sample sizes due to $y \sim N(\mu, \sigma^2)$ (see, e.g., Casella and Berger 2002:218; Hays 1994:250). Such a result holds also in higher dimensional

normal data. That is, when the mean and covariance structures do not

have overlapping parameters, parameter estimates in the mean struc-

ture are asymptotically independent of parameter estimates in the

variance-covariance structure (see Yuan and Bentler forthcoming).

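We next study the distribution of $\hat\theta = (\hat\mu, \hat\sigma^2)'$ when data are nonnormally distributed. Notice that

$$s^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \mu)^2 - (\bar y - \mu)^2.$$

Denote $\tilde\sigma^2 = \sum_{i=1}^{n} (y_i - \mu)^2/n$. We have

$$\sqrt{n}\begin{pmatrix} \hat\mu - \mu \\ \hat\sigma^2 - \sigma^2 \end{pmatrix} = \sqrt{n}\begin{pmatrix} \hat\mu - \mu \\ \tilde\sigma^2 - \sigma^2 \end{pmatrix} - \begin{pmatrix} 0 \\ \sqrt{n}(\bar y - \mu)^2 \end{pmatrix}. \tag{4}$$

Because $(\bar y - \mu)$ approaches zero in probability and $\sqrt{n}(\bar y - \mu)$ is bounded in probability (see, e.g., Bishop, Fienberg, and Holland 1975:476), $\sqrt{n}(\bar y - \mu)^2$ also approaches zero in probability. Denote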

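$\tilde\theta = (\bar y, \tilde\sigma^2)'$. It follows from (4) and the well-known Slutsky’s (1925) theorem (see note 1) that $\sqrt{n}(\hat\theta - \theta)$ and $\sqrt{n}(\tilde\theta - \theta)$ have the same asymptotic distribution. Notice that

$$\sqrt{n}(\tilde\theta - \theta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \begin{pmatrix} y_i - \mu \\ (y_i - \mu)^2 - \sigma^2 \end{pmatrix}. \tag{5}$$

Applying the central limit theorem to the right side of (5) leads to

$$\sqrt{n}(\tilde\theta - \theta) \xrightarrow{L} N(0, \Pi),$$

where

$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}$$

with

$$\begin{aligned}
\pi_{11} &= E(y_i - \mu)^2 = \sigma^2, \\
\pi_{12} &= \pi_{21} = E\{(y_i - \mu)[(y_i - \mu)^2 - \sigma^2]\} = E(y_i - \mu)^3 = \sigma^{3/2}\gamma, \\
\pi_{22} &= E\{[(y_i - \mu)^2 - \sigma^2][(y_i - \mu)^2 - \sigma^2]\} = E(y_i - \mu)^4 - \sigma^4 = \sigma^4(\beta - 1).
\end{aligned}$$

Because $\sqrt{n}(\hat\theta - \theta)$ and $\sqrt{n}(\tilde\theta - \theta)$ have the same asymptotic distribution,

$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{L} N(0, \Pi). \tag{6}$$

Comparing (6) with (2) and (3), $\omega_{22} = \pi_{22}$ only when $\beta = 3$. A standard error for $\hat\sigma^2$ based on the $\Omega$ in (3) will be negatively biased when $\beta > 3$ and positively biased when $\beta < 3$. With sample estimates of skewness and kurtosis, a consistent estimate of $\Pi$ can be obtained when replacing its unknown elements by the sample estimates. Thus, a consistent standard error of $\hat\sigma^2$ will be obtained. This result is a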

special case of the so-called sandwich-type covariance matrix in

mean and covariance structure analysis, discussed in Dijkstra

(1981); Bentler (1983); Shapiro (1983); Browne (1984); Bentler

and Dijkstra (1985); Satorra and Bentler (1988, 1994); Arminger and

Schoenberg (1989); Arminger and Sobel (1990); Kano, Berkane, and

Bentler (1993); Browne and Arminger (1995); and Yuan and Bentler

(1997a, 1998a, 1998b, 2000b).

It follows from (6) that the asymptotic distribution of $\hat\sigma^2$ depends on $\sigma^2$ and $\beta$ but not $\gamma$. In contrast, the asymptotic distribution of $\hat\mu$ does not depend on either $\gamma$ or $\beta$. This is also true in the higher dimensional case. Results in Yuan and Bentler (1999a, 2000a, 2002a) imply that the asymptotic distributions of the covariance parameter estimates, the commonly used sample correlation coefficients, and sample reliability coefficients depend on only the joint fourth-order moments or kurtoses of the variables. Equation (6) also tells us that $\hat\mu$ and $\hat\sigma^2$ are no longer asymptotically independent when $\gamma \neq 0$. This is also true in the higher dimensional case when not all the third-order moments are zero, where mean and covariance parameter estimates are not asymptotically independent even when they do not have overlapping parameters (Yuan and Bentler forthcoming).
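The results of this section can be checked by simulation. The sketch below is illustrative only: it assumes Exponential(1) data (a choice made here, not in the article), for which the mean and variance are both 1 and the third and fourth central moments are 2 and 9. The limiting covariance matrix of $\sqrt{n}(\bar y - \mu,\, s^2 - \sigma^2)'$ then has variance 1 for the mean, covariance 2, and variance $9 - 1 = 8$ for the variance estimate, whereas the normal theory matrix in (3) would give 1, 0, and 2. The function name `simulate`, the seed, and the sample sizes are arbitrary.

```python
import random

def simulate(n=200, reps=2000, seed=12345):
    """Monte Carlo covariance of sqrt(n) * (ybar - mu, s2 - sigma2)
    for Exponential(1) data, where mu = 1 and sigma2 = 1."""
    rng = random.Random(seed)
    a, b = [], []   # scaled deviations of the mean and of the variance
    for _ in range(reps):
        y = [rng.expovariate(1.0) for _ in range(n)]
        ybar = sum(y) / n
        s2 = sum((v - ybar) ** 2 for v in y) / n
        a.append(n ** 0.5 * (ybar - 1.0))
        b.append(n ** 0.5 * (s2 - 1.0))
    ma, mb = sum(a) / reps, sum(b) / reps
    var_a = sum((u - ma) ** 2 for u in a) / reps
    var_b = sum((v - mb) ** 2 for v in b) / reps
    cov_ab = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / reps
    return var_a, cov_ab, var_b

# Simulated counterparts of the three distinct entries of the limiting matrix.
var_mean, cov_mean_var, var_var = simulate()
```

With these settings the simulated values should land near the theoretical ones, up to Monte Carlo error and a small finite-sample bias. The variance of the scaled variance estimate should be far above the normal theory value of 2.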

3. THE NORMAL THEORY-BASED LIKELIHOOD RATIO TEST

We first consider the distribution of the LR statistic when $\mu$ is a free parameter. The null hypothesis is $H_0\colon \sigma^2 = \sigma_0^2$. Notice that when $H_0$ is true, the $\sigma_0^2$ will equal the $\sigma^2$ in section 2, which will also be the scenario we consider in this section. The behavior of the LR statistic with misspecified models was studied in Shapiro (1983); Satorra and Saris (1985); Steiger, Shapiro, and Browne (1985); Satorra (1989); Yuan and Hayashi (2003); Yuan (2005); Yuan and Bentler (forthcoming); and Yuan, Hayashi, and Bentler (2005).

Using the log-likelihood function in (1), we obtain the LR statistic as

$$T_{ML} = 2[l(\hat\mu, \hat\sigma^2) - l(\hat\mu, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right]. \tag{7}$$

It is obvious that (7) is just the univariate version of the normal theory-based discrepancy function in covariance structure analysis (see equation 4.67 of Bollen 1989:107). Notice that

$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \log\left[1 + \left(\frac{s^2}{\sigma_0^2} - 1\right)\right],$$

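and $(s^2/\sigma_0^2 - 1)$ will be small when $n$ is large. Using the Taylor expansion, we have

$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \left(\frac{s^2}{\sigma_0^2} - 1\right) - \frac{1}{2}\left(\frac{s^2}{\sigma_0^2} - 1\right)^2 + r_n, \tag{8}$$

where $n r_n$ approaches zero in probability when $n \to \infty$. Putting (8) into (7), we get

$$T_{ML} = \frac{n(s^2 - \sigma_0^2)^2}{2\sigma_0^4} + n r_n.$$

It follows from (6) that

$$\sqrt{n}(s^2 - \sigma_0^2) \xrightarrow{L} N(0, \pi_{22}).$$

Thus,

$$x_n = \sqrt{n}(s^2 - \sigma_0^2)/\sqrt{\pi_{22}} \xrightarrow{L} N(0, 1)$$

and

$$T_{ML} = \frac{\pi_{22}}{2\sigma_0^4}x_n^2 + n r_n = \frac{(\beta - 1)}{2}x_n^2 + n r_n \xrightarrow{L} \frac{(\beta - 1)}{2}\chi_1^2.$$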

So the asymptotic distribution of the LR statistic is a $\chi_1^2$ variate scaled by $(\beta - 1)/2$, a factor that grows linearly with kurtosis. When data are normally distributed, $\beta = 3$ and $T_{ML} \xrightarrow{L} \chi_1^2$. A correct hypothesis $\sigma_0^2$ can be easily rejected when we refer the $T_{ML}$ in (7) to $\chi_1^2$ while $\beta > 3$. Similarly, a wrong hypothesis might not be rejected when $\beta < 3$, even when $n$ is large. In the higher dimensional case, the LR statistic is also proportional to the common kurtosis when data are elliptically symmetric (Browne 1984; Shapiro and Browne 1987), and $T_{ML}$ may still not depend on skewness when the marginal kurtosis is heterogeneous (Kano, Berkane, and Bentler 1990) or even when data are skewed (Yuan and Bentler 1999b).
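To see how much the scaling matters, the following sketch computes the large-sample probability of rejecting a true $H_0$ at the nominal 5% level when $T_{ML}$ behaves like $((\beta - 1)/2)\chi_1^2$ but is referred to $\chi_1^2$. It relies only on the limiting result above; the function names and the illustrative values of $\beta$ are assumptions made here.

```python
import math

CHI2_1_CRIT_05 = 3.841458820694124  # upper 5% point of chi-square with 1 df

def chi2_1_sf(x):
    """Survival function P(chi2_1 > x), using chi2_1 = Z**2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def asymptotic_rejection_rate(beta, crit=CHI2_1_CRIT_05):
    """Large-sample probability that T_ML exceeds `crit` under a true H0,
    when T_ML behaves like ((beta - 1) / 2) * chi2_1. Requires beta > 1."""
    scale = (beta - 1.0) / 2.0
    return chi2_1_sf(crit / scale)

# Illustrative kurtosis values: light tails, normal, and two heavy-tailed cases.
for beta in (1.8, 3.0, 6.0, 9.0):
    rate = asymptotic_rejection_rate(beta)
    print(f"beta = {beta:>4}: asymptotic rejection rate = {rate:.3f}")
```

At $\beta = 3$ the rate equals the nominal 0.05. It rises above 0.05 for $\beta > 3$ and falls below it for $\beta < 3$, which is the pattern described in the text.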

With a consistent estimator of $\pi_{22}$, we can rescale $T_{ML}$ to

$$T_R = \frac{2\hat\sigma^4}{\hat\pi_{22}}T_{ML}. \tag{9}$$

It is obvious that $\hat\sigma^4/\hat\pi_{22}$ converges in probability to $1/(\beta - 1)$ and

$$T_R \xrightarrow{L} \chi_1^2.$$

In the multivariate case, the statistic $T_R$ is just the Satorra and Bentler (1988, 1994) rescaled statistic.

Notice that

$$z_n = \sqrt{n}(s^2 - \sigma_0^2)/\sqrt{\hat\pi_{22}} \xrightarrow{L} N(0, 1).$$

The Wald-type statistic for testing $\sigma^2 = \sigma_0^2$ is

$$T_W = \frac{n(s^2 - \sigma_0^2)^2}{\hat\pi_{22}}. \tag{10}$$

As long as $\hat\pi_{22}$ is consistent for $\pi_{22}$, the asymptotic distribution of $T_W$ is $\chi_1^2$, which does not depend on the underlying distribution of $y$. Such a property is commonly called asymptotically distribution free (ADF) in the SEM literature. Two estimates of $\pi_{22}$ are available. One is

$$\hat\pi_{22}^{(1)} = \frac{1}{n}\sum_{i=1}^{n} [(y_i - \bar y)^2 - s^2]^2, \tag{11}$$

which is equivalent to $s^4(\hat\beta - 1)$, where

$$\hat\beta = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^4/s^4.$$

The other one is

$$\hat\pi_{22}^{(2)} = \frac{1}{n}\sum_{i=1}^{n} [(y_i - \bar y)^2 - \sigma_0^2]^2, \tag{12}$$

and there exists

$$\hat\pi_{22}^{(2)} = \hat\pi_{22}^{(1)} + (s^2 - \sigma_0^2)^2.$$

Notice that, under $H_0\colon \sigma^2 = \sigma_0^2$, $(s^2 - \sigma_0^2)$ approaches zero in probability according to the law of large numbers, and $\hat\pi_{22}^{(2)}$ and $\hat\pi_{22}^{(1)}$ are asymptotically equivalent. The well-known ADF statistic (Browne 1984) in covariance structure analysis corresponds to a multivariate version of $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(1)}$. The corrected ADF statistic developed

in Yuan and Bentler (1997b) corresponds to $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(2)}$. Yuan and Bentler (1998c) provided a corrected residual-based ADF statistic, which is also a multivariate version of $T_W$ with $\hat\pi_{22} = \hat\pi_{22}^{(2)}$.
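The two Wald-type statistics can be computed directly from raw data. The following is a minimal sketch under stated assumptions: the function name `wald_variance_tests`, the returned dictionary keys, and the four-point toy data set are hypothetical and serve only to make the formulas in (10), (11), and (12) concrete.

```python
def wald_variance_tests(y, sigma0_sq):
    """Wald statistics (10) for H0: sigma^2 = sigma0_sq, using the two
    estimates of pi_22 given in (11) and (12)."""
    n = len(y)
    ybar = sum(y) / n
    d = [(v - ybar) ** 2 for v in y]                      # squared deviations
    s2 = sum(d) / n                                       # divisor-n variance
    pi22_1 = sum((di - s2) ** 2 for di in d) / n          # equation (11)
    pi22_2 = sum((di - sigma0_sq) ** 2 for di in d) / n   # equation (12)
    num = n * (s2 - sigma0_sq) ** 2
    return {"s2": s2, "pi22_1": pi22_1, "pi22_2": pi22_2,
            "TW1": num / pi22_1, "TW2": num / pi22_2}

# Toy data, far too small for asymptotic inference; it only exercises the formulas.
out = wald_variance_tests([0.0, 0.0, 0.0, 4.0], sigma0_sq=2.0)
```

Because the second estimate adds the nonnegative term $(s^2 - \sigma_0^2)^2$ to the first, the statistic based on (12) can never exceed the one based on (11).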

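We next consider testing $H_0\colon (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$ or $\theta = \theta_0$. It is easy to obtain

$$T_{ML} = 2[l(\hat\mu, \hat\sigma^2) - l(\mu_0, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + \frac{n(\bar y - \mu_0)^2}{\sigma_0^2}. \tag{13}$$

Let $\Pi^{-1/2}$ be a symmetric matrix such that $\Pi^{-1/2}\Pi^{-1/2} = \Pi^{-1}$. It follows from (6) that

$$x_n = \Pi^{-1/2}\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{L} N(0, I_2),$$

where $I_2$ is the 2-by-2 identity matrix. Let

$$W = \begin{pmatrix} 1/\sigma_0^2 & 0 \\ 0 & 1/(2\sigma_0^4) \end{pmatrix}.$$

It follows from (8) and (13) that

$$\begin{aligned}
T_{ML} &= \frac{n(s^2 - \sigma_0^2)^2}{2\sigma_0^4} + \frac{n(\bar y - \mu_0)^2}{\sigma_0^2} + n r_n \\
&= n(\hat\theta - \theta_0)'W(\hat\theta - \theta_0) + n r_n \\
&= x_n'\Pi^{1/2}W\Pi^{1/2}x_n + n r_n.
\end{aligned}$$

Let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$. Then there exist eigenvectors $v_1$ and $v_2$ such that

$$\Pi^{1/2}W\Pi^{1/2} = (v_1, v_2)\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}(v_1, v_2)'.$$

Let $V = (v_1, v_2)$ and $z_n = (z_{n1}, z_{n2})' = V'x_n$. Because $V'V = I_2$,

$$z_n \xrightarrow{L} z = (z_1, z_2)' \sim N(0, I_2)$$

and

$$T_{ML} = \lambda_1 z_{n1}^2 + \lambda_2 z_{n2}^2 + n r_n.$$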

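Notice that the eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$ equal the eigenvalues of

$$W^{1/2}\Pi W^{1/2} = \begin{pmatrix} 1 & \gamma/(2\sigma^3)^{1/2} \\ \gamma/(2\sigma^3)^{1/2} & (\beta - 1)/2 \end{pmatrix}.$$

The determinant equation determining the eigenvalues is

$$|W^{1/2}\Pi W^{1/2} - \lambda I_2| = 0,$$

which is just

$$\lambda^2 - \frac{\beta + 1}{2}\lambda + \frac{\beta - 1 - \gamma^2/\sigma^3}{2} = 0.$$

Solving this equation, we have

$$\begin{aligned}
\lambda_1 &= \frac{1}{4}\{\beta + 1 + [(\beta - 3)^2 + 8\gamma^2/\sigma^3]^{1/2}\}, \\
\lambda_2 &= \frac{1}{4}\{\beta + 1 - [(\beta - 3)^2 + 8\gamma^2/\sigma^3]^{1/2}\}.
\end{aligned}$$

When data are normally distributed ($\beta = 3$, $\gamma = 0$), $\lambda_1 = \lambda_2 = 1$,

$$T_{ML} \xrightarrow{L} \chi_2^2.$$

When data are symmetric ($\gamma = 0$), $\lambda_1 = (\beta - 1)/2$ and $\lambda_2 = 1$. In such a case, $\hat\mu$ and $\hat\sigma^2$ are asymptotically independent. A rescaled statistic that removes the effect of $\beta$ is given by

$$T_R = n\,\frac{2\hat\sigma^4}{\hat\pi_{22}}\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + \frac{n(\bar y - \mu_0)^2}{\sigma_0^2} \tag{14}$$

and $T_R \xrightarrow{L} \chi_2^2$. A parallel statistic to (14) can also be constructed in the higher dimensional case, although we are not aware of the existence of such a development.

When data are skewed or $\gamma \neq 0$, the two eigenvalues are not equal. We might consider simultaneously removing the effect of skewness and kurtosis by constructing the rescaled statistic

$$T_R = \frac{2}{\mathrm{tr}(W\hat\Pi)}T_{ML}. \tag{15}$$

However, $T_R$ will not approach $\chi_2^2$ but $2(\lambda_1 z_1^2 + \lambda_2 z_2^2)/(\lambda_1 + \lambda_2)$, whose mean is $2 = E(\chi_2^2)$. In the higher dimensional case, the rescaled statistic parallel to the $T_R$ in (15) generally does not follow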

a chi-square distribution (Satorra and Bentler 1994; Yuan and Bentler 2000b). Even with only a covariance structure model, the rescaled statistic parallel to the $T_R$ in (9) may not follow a chi-square distribution either due to the heterogeneity of the eigenvalues, although its distribution is still chi-square under various conditions (see Yuan and Bentler 1999b). Similarly, ADF-type statistics can be constructed in testing $H_0\colon (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$, and we leave it to readers to work out the details. See Browne and Arminger (1995) and Yuan and Bentler (1997b, 1999c) for the higher dimensional case.
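The weights of the limiting mixture in the joint test can be computed numerically from any $2 \times 2$ matrix $\Pi$ and diagonal $W$. The sketch below is a hypothetical helper (the name `lr_weights` and all numeric inputs are illustrative assumptions). It takes the entries of $\Pi$ directly, so it does not depend on how the third moment is parameterized, and it uses the trace and determinant of $W^{1/2}\Pi W^{1/2}$ rather than a linear algebra library.

```python
import math

def lr_weights(Pi, W_diag):
    """Eigenvalues of W^(1/2) Pi W^(1/2) for a symmetric 2x2 Pi and a
    diagonal W, obtained from the trace and determinant of that matrix."""
    (p11, p12), (_, p22) = Pi
    w1, w2 = W_diag
    trace = w1 * p11 + w2 * p22
    det = w1 * w2 * (p11 * p22 - p12 ** 2)
    root = math.sqrt(max(trace ** 2 - 4.0 * det, 0.0))
    return (trace + root) / 2.0, (trace - root) / 2.0

sigma2 = 1.5   # arbitrary illustrative variance
W = (1.0 / sigma2, 1.0 / (2.0 * sigma2 ** 2))

# Normal data: beta = 3 and zero third central moment.
normal = lr_weights([[sigma2, 0.0], [0.0, 2.0 * sigma2 ** 2]], W)

# Symmetric, heavy-tailed data: beta = 5, still zero third central moment.
heavy = lr_weights([[sigma2, 0.0], [0.0, 4.0 * sigma2 ** 2]], W)

# A nonzero third central moment (0.9 is an arbitrary illustrative value).
skewed = lr_weights([[sigma2, 0.9], [0.9, 4.0 * sigma2 ** 2]], W)
scale_15 = 2.0 / sum(skewed)   # the factor 2 / tr(W Pi) used in (15)
```

The sum of the two weights equals $\mathrm{tr}(W\Pi)$, so dividing by it fixes the mean of the limit at 2. With unequal weights the limit still has a larger variance than $\chi_2^2$, which is why the rescaled statistic in (15) is only approximately chi-square.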

4. A NUMERICAL EXAMPLE

Neumann (1994) studied the relationship of alcohol and psychological symptoms. His data set consists of $p = 10$ variables and $n = 335$ cases. We will use the family history of psychopathology variable to illustrate the effect of skewness and kurtosis. For this variable, the MLEs of $\mu$ and $\sigma^2$ are, respectively, $\hat\mu = \bar y = 1.361$ and $\hat\sigma^2 = s^2 = 2.302$; the sample skewness and kurtosis are, respectively,

$$\hat\gamma = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^3/s^3 = 2.001 \quad\text{and}\quad \hat\beta = 8.766.$$

Note that both $\hat\gamma$ and $\hat\beta - 3$ are significantly different from zero (see, e.g., Snedecor and Cochran 1989, Table A19), indicating that the sample most likely comes from a nonnormal distribution. Our purpose here is to illustrate the effect of $\gamma$ and $\beta$ on the asymptotic distributions of $\hat\mu$ and $\hat\sigma^2$ and statistics for testing $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$, not to elaborate on the substantive side of the data.

Assuming the sample is from $N(\mu, \sigma^2)$, the asymptotic distribution of $(\hat\mu, \hat\sigma^2)'$ is given by (2) with

$$\hat\Omega = \begin{pmatrix} 2.302 & 0 \\ 0 & 10.598 \end{pmatrix}.$$

Admitting that the data may not be normally distributed, the asymptotic distribution of $(\hat\mu, \hat\sigma^2)'$ is given by (6) with

$$\hat\Pi = \begin{pmatrix} 2.302 & 6.989 \\ 6.989 & 41.154 \end{pmatrix}.$$

If based on (2), the estimated standard error of $\hat\sigma^2$ is $(10.598/335)^{1/2} = 0.178$. If based on (6), the estimated standard error of $\hat\sigma^2$ is $(41.154/335)^{1/2} = 0.350$, almost double that based on (2). Of course, $\hat\mu$ and $\hat\sigma^2$ are no longer asymptotically independent in this example when using (6). Although the confidence interval for $\sigma^2$ based on (2) is much shorter than that based on (6), the shorter interval is a misleading result due to the nonnormality of the data.
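The matrices and standard errors above can be reproduced from the reported summary statistics alone. The sketch below is illustrative; the variable names are chosen here. The off-diagonal entry is computed as $s^3\hat\gamma$, that is, $(s^2)^{3/2}\hat\gamma$, which is the calculation that recovers the reported 6.989. Small discrepancies in the last digit are to be expected because the inputs are rounded to three decimals.

```python
import math

# Summary statistics reported in the text for the family history variable.
n, s2, gamma_hat, beta_hat = 335, 2.302, 2.001, 8.766

s4 = s2 ** 2
omega_22 = 2.0 * s4                      # normal theory variance term, as in (3)
pi_12 = s2 ** 1.5 * gamma_hat            # third central moment, s^3 * gamma_hat
pi_22 = s4 * (beta_hat - 1.0)            # fourth-moment term, as in (6)

se_normal = math.sqrt(omega_22 / n)      # standard error of s2 based on (2)
se_sandwich = math.sqrt(pi_22 / n)       # standard error of s2 based on (6)

# Asymptotic correlation between the mean and variance estimates implied by (6).
corr = pi_12 / math.sqrt(s2 * pi_22)
```

The implied asymptotic correlation between $\hat\mu$ and $\hat\sigma^2$ is roughly 0.72, which quantifies how far the two estimators are from the independence that holds under normality.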

Turning to hypothesis testing, suppose the null hypothesis is $H_0\colon (\mu, \sigma^2) = (1.2, 1.8)$. When testing $\sigma^2 = 1.8$ alone, the LR statistic in (7) is $T_{ML} = 11.021$, which is highly significant when referred to $\chi_1^2$. The rescaled statistic in (9) is $T_R = 2.838$; the Wald statistic in (10), using the $\hat\pi_{22}^{(1)}$ in (11), is $T_W^{(1)} = 2.051$; and the Wald statistic in (10), using the $\hat\pi_{22}^{(2)}$ in (12), is $T_W^{(2)} = 2.039$. None is statistically significant at the $\alpha = 0.05$ level when referred to $\chi_1^2$. Note that $T_W^{(1)}$ and $T_W^{(2)}$ have a tiny difference in this example because of $p = 1$ and a relatively large sample size. In the higher dimensional case, their difference can be huge (Yuan and Bentler 1997b, 1998c), and $T_W^{(2)}$ is recommended for more reliable inference with smaller samples.
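The four statistics for testing $\sigma^2 = 1.8$ alone can be recomputed from the reported summary statistics. This is a sketch, not the authors' code: the variable names are chosen here, $\hat\pi_{22}^{(1)}$ is obtained as $s^4(\hat\beta - 1)$, and $\hat\pi_{22}^{(2)}$ is obtained from the identity that links the two estimates. Agreement with the reported values is up to rounding of the inputs.

```python
import math

n, s2, beta_hat, sigma0_sq = 335, 2.302, 8.766, 1.8

ratio = s2 / sigma0_sq
t_ml = n * (ratio - math.log(ratio) - 1.0)      # equation (7)

pi22_1 = s2 ** 2 * (beta_hat - 1.0)             # (11), as s^4 (beta_hat - 1)
pi22_2 = pi22_1 + (s2 - sigma0_sq) ** 2         # (12), via the identity linking it to (11)

t_r = 2.0 * s2 ** 2 / pi22_1 * t_ml             # equation (9)
t_w1 = n * (s2 - sigma0_sq) ** 2 / pi22_1       # equation (10) with (11)
t_w2 = n * (s2 - sigma0_sq) ** 2 / pi22_2       # equation (10) with (12)

def chi2_1_sf(x):
    """P(chi2_1 > x), using chi2_1 = Z**2 for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

p_values = {name: chi2_1_sf(t) for name, t in
            [("T_ML", t_ml), ("T_R", t_r), ("T_W1", t_w1), ("T_W2", t_w2)]}
```

Only the p value for $T_{ML}$ falls below 0.05. The three statistics that account for kurtosis do not reject the hypothesis.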

When testing $(\mu, \sigma^2) = (1.2, 1.8)$ simultaneously, the LR statistic in (13) is $T_{ML} = 15.845$, which is highly significant when referred to $\chi_2^2$. The rescaled statistic in (15) is given by $T_R = 4.153$, which is no longer statistically significant at the $\alpha = 0.05$ level when referred to $\chi_2^2$. Note that the $T_R$ in (15) is unlikely to follow $\chi_2^2$ because $\hat\gamma$ is significant. However, referring $T_R$ to a chi-square distribution does make the inference more reliable. More empirical results about these statistics in higher dimensional cases can be found in Hu, Bentler, and Kano (1992) and Yuan and Bentler (1998c).

The rescaled statistic in (14) is $T_R = 7.662$, which is statistically significant at the $\alpha = 0.05$ level when referred to $\chi_2^2$. However, this $T_R$ is not justified when $\gamma$ is statistically significant.
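The joint test statistics can likewise be recomputed from the summary statistics. In this sketch, $W$ is evaluated at the hypothesized $\sigma_0^2 = 1.8$ and $\hat\Pi$ at the sample moments when forming $\mathrm{tr}(W\hat\Pi)$. That pairing is an assumption made here. The text does not spell it out, but it is the reading that reproduces the reported 4.153. Variable names are illustrative.

```python
import math

n, ybar, s2, beta_hat = 335, 1.361, 2.302, 8.766
mu0, sigma0_sq = 1.2, 1.8

ratio = s2 / sigma0_sq
var_part = n * (ratio - math.log(ratio) - 1.0)       # variance term of (13)
mean_part = n * (ybar - mu0) ** 2 / sigma0_sq        # mean term of (13)
t_ml = var_part + mean_part                          # equation (13)

pi22 = s2 ** 2 * (beta_hat - 1.0)                    # s^4 (beta_hat - 1)
# tr(W Pi-hat) with W = diag(1/sigma0^2, 1/(2 sigma0^4)); only diagonals matter.
trace_w_pi = s2 / sigma0_sq + pi22 / (2.0 * sigma0_sq ** 2)
t_r15 = 2.0 / trace_w_pi * t_ml                      # equation (15)
t_r14 = 2.0 * s2 ** 2 / pi22 * var_part + mean_part  # equation (14)

def chi2_2_sf(x):
    """P(chi2_2 > x); chi-square with 2 df is exponential with mean 2."""
    return math.exp(-x / 2.0)

p_ml, p_r15, p_r14 = chi2_2_sf(t_ml), chi2_2_sf(t_r15), chi2_2_sf(t_r14)
```

The trace in (15) involves only the diagonal of $\hat\Pi$, so the sample skewness changes the reference distribution of $T_R$ but not its value.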

In summary, none of the evidence in this example is against the hypothesis $H_0\colon (\mu, \sigma^2) = (1.2, 1.8)$ when using proper statistics. However, if one starts with the normal theory-based ML procedure without checking the distribution of the sample, $H_0$ will be rejected!

5. DISCUSSION

Motivated by the gap in teaching resources and the technical litera-

ture of SEM, this article provides a simplified version of SEM with

nonnormal data. Some of the material in this article may be just

a trivial exercise to quantitative graduate students. For students or

researchers who do not have much quantitative training, the material

should better facilitate an understanding of the effect of nonnormal

data on standard errors and test statistics in mean and covariance

structure analysis. Of course, fit indices defined through TML will be

equally affected by skewness or kurtosis (see Yuan 2005 and Yuan

and Marshall 2004). Readers with a solid quantitative background

may further read the literature in the higher dimensional case cited in

sections 2 and 3. We hope the article will help to solidify an under-

standing of the effect of nonnormal data on SEM inference, espe-

cially in graduate education.

Proper procedures have to be used to get reliable inferences with

nonnormal data. Although we do not have space to discuss these,

we would like to note that truly robust methods, not depending on

ML, do exist. These methods minimize the effect of bad data not

only on standard errors and test statistics but also on parameter

estimates and power evaluations (Yuan and Bentler 1998a, 1998b,

2000c; Yuan, Bentler, and Chan 2004; Yuan, Chan, and Bentler 2000;

Yuan and Hayashi 2003; Yuan, Marshall, and Weston 2002). It is well

known that Mardia’s (1970, 1974) measure of multivariate kurtosis is

a generalization of the univariate kurtosis β− 3. When the sample

multivariate kurtosis is significantly greater than that of the multivari-

ate normal distribution, a robust procedure might be necessary. In

small samples, the significance of Mardia’s coefficient can be evalu-

ated using the simulation approach of Bonett, Woodward, and Randall

(2002). In addition to nonnormal data, a small sample size also tends to make the statistic $T_{ML}$ significant even when the model is correctly specified. Remedies in this direction are addressed in Bentler and Yuan (1999) and Yuan and Bentler (1999c).

In this article, we have emphasized the ML function for the sim-

plest case in which data are complete and obtained by simple random

sampling. As can easily be imagined, the problems arising from

skewness and kurtosis do not vanish when data are missing or when

data are obtained under hierarchical sampling schemes. The same

principles apply—that is, normal theory-based ML standard errors

and test statistics will be biased under nonnormality. Solutions to this

problem for the missing data case were developed by Arminger and

Sobel (1990) and Yuan and Bentler (2000b). Solutions for multilevel

data were provided by Poon and Lee (1994) and Yuan and Bentler

(2002b, 2003).

There is a related literature called asymptotic robustness theory.

This is concerned with the validity of normal theory-based methods

with large-sample nonnormal data (Amemiya and Anderson 1990;

Anderson and Amemiya 1988; Browne and Shapiro 1988; Mooijaart

and Bentler 1991; Satorra and Bentler 1990; Shapiro 1987; Yuan and

Bentler 1999a, 1999b). Unfortunately, the conditions for asymptotic

robustness depend on both the data and the model, and there is no

effective way to verify these conditions at present. It is not appropriate

to blindly trust that a researcher’s given data and model satisfy these

conditions.

In practice, TML with empirical data is often statistically signifi-

cant when referred to a chi-square distribution. However, a small p

value associated with TML may not be due to a bad model and/or

too much power (i.e., a huge sample size). It may be due to violation of assumptions or bad data (Yuan and Bentler 2001). The newer statistics (see note 2) described in this article do not require making the stringent multivariate normality assumption. These statistics not only are liable to make a good model more acceptable statistically but also should lead to more accurate scientific conclusions.

NOTES

1. The theorem states that, if $x_n \xrightarrow{L} x$, $a_n$ converges in probability to $a$, and $b_n$ converges in probability to $b$, then $a_n x_n + b_n \xrightarrow{L} ax + b$.

2. Most of the procedures discussed in this article, such as standard errors based on the

sandwich-type covariance matrices, rescaled and improved asymptotically distribution free

statistics, robust methods, and statistics that perform well with small samples, are currently

available in EQS 6.0 (Bentler forthcoming).

REFERENCES

Amemiya, Yasuo and Theodore W. Anderson. 1990. ‘‘Asymptotic Chi-Square Tests for a

Large Class of Factor Analysis Models.’’ Annals of Statistics 18:1453-63.

Anderson, Theodore W. and Yasuo Amemiya. 1988. ‘‘The Asymptotic Normal Distribution of

Estimators in Factor Analysis Under General Conditions.’’ Annals of Statistics 16:759-71.

Arminger, Gerhard and Ronald Schoenberg. 1989. ‘‘Pseudo Maximum Likelihood Estimation

and a Test for Misspecification in Mean and Covariance Structure Models.’’ Psychometrika

54:409-26.

Arminger, Gerhard and Michael E. Sobel. 1990. ‘‘Pseudo-Maximum Likelihood Estimation of

Mean and Covariance Structures With Missing Data.’’ Journal of the American Statistical

Association 85:195-203.

Bentler, Peter M. 1983. ‘‘Some Contributions to Efficient Statistics in Structural Models:

Specification and Estimation of Moment Structures.’’ Psychometrika 48:493-517.

———. Forthcoming. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate

Software.

Bentler, Peter M. and Theo K. Dijkstra. 1985. ‘‘Efficient Estimation Via Linearization

in Structural Models.’’ Pp. 9-42 in Multivariate Analysis VI, edited by P. R. Krishnaiah.

Amsterdam: North-Holland.

Bentler, Peter M. and Ke-Hai Yuan. 1999. ‘‘Structural Equation Modeling With Small Samples: Test Statistics.’’ Multivariate Behavioral Research 34:181-97.

Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul W. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Bollen, Kenneth A. 1989. Structural Equations With Latent Variables. New York: John

Wiley.

———. 2002. ‘‘Latent Variables in Psychology and the Social Sciences.’’ Annual Review of

Psychology 53:605-34.

Bonett, Douglas G., J. Arthur Woodward, and Robert L. Randall. 2002. ‘‘Estimating p-Values for Mardia’s Coefficients of Multivariate Skewness and Kurtosis.’’ Computational Statistics 17:117-22.

Boomsma, Anne. 2000. ‘‘Reporting on Structural Equation Analyses.’’ Structural Equation

Modeling 7:461-83.

Breckler, Steven J. 1990. ‘‘Application of Covariance Structure Modeling in Psychology:

Cause for Concern?’’ Psychological Bulletin 107:260-73.

Browne, Michael W. 1984. ‘‘Asymptotic Distribution-Free Methods for the Analysis of Covariance Structures.’’ British Journal of Mathematical and Statistical Psychology 37:62-83.

Browne, Michael W. and Gerhard Arminger. 1995. ‘‘Specification and Estimation of Mean and

Covariance Structure Models.’’ Pp. 185-249 in Handbook of Statistical Modeling for the

Social and Behavioral Sciences, edited by G. Arminger, C. C. Clogg, and M. E. Sobel.

New York: Plenum.

Browne, Michael W. and Alexander Shapiro. 1988. ‘‘Robustness of Normal Theory Methods

in the Analysis of Linear Latent Variate Models.’’ British Journal of Mathematical and

Statistical Psychology 41:193-208.

Casella, George and Roger L. Berger. 2002. Statistical Inference. Pacific Grove, CA: Duxbury.

Dijkstra, Theo K. 1981. ‘‘Latent Variables in Linear Stochastic Models: Reflections on ‘Maximum Likelihood’ and ‘Partial Least Squares’ Methods.’’ Ph.D. dissertation, University of Groningen.

Ferguson, Thomas S. 1996. A Course in Large Sample Theory. London: Chapman & Hall.

Hays, William L. 1994. Statistics. 5th ed. Fort Worth, TX: Harcourt Brace.

Hu, Li-tze, Peter M. Bentler, and Yutaka Kano. 1992. ‘‘Can Test Statistics in Covariance

Structure Analysis Be Trusted?’’ Psychological Bulletin 112:351-62.

Kano, Yutaka, Maria Berkane, and Peter M. Bentler. 1990. ‘‘Covariance Structure Analysis

With Heterogeneous Kurtosis Parameters.’’ Biometrika 77:575-85.

———. 1993. ‘‘Statistical Inference Based on Pseudo-Maximum Likelihood Estimators in

Elliptical Populations.’’ Journal of the American Statistical Association 88:135-43.

MacCallum, Robert C. and James T. Austin. 2000. ‘‘Applications of Structural Equation

Modeling in Psychological Research.’’ Annual Review of Psychology 51:201-26.

Magnus, Jan R. and Heinz Neudecker. 1999. Matrix Differential Calculus With Applications

in Statistics and Econometrics. Rev. ed. New York: John Wiley.

Mardia, Kanti V. 1970. ‘‘Measures of Multivariate Skewness and Kurtosis With Applications.’’ Biometrika 57:519-30.

———. 1974. ‘‘Applications of Some Measures of Multivariate Skewness and Kurtosis in

Testing Normality and Robustness Studies.’’ Sankhya B 35:115-28.

Micceri, Theodore. 1989. ‘‘The Unicorn, the Normal Curve, and Other Improbable

Creatures.’’ Psychological Bulletin 105:156-66.

Mooijaart, Ab and Peter M. Bentler. 1991. ‘‘Robustness of Normal Theory Statistics in Structural Equation Models.’’ Statistica Neerlandica 45:159-71.

Neumann, Craig S. 1994. ‘‘Structural Equation Modeling of Symptoms of Alcoholism and

Psychopathology.’’ Ph.D. dissertation, University of Kansas.

Poon, Wai-Yin and Sik-Yum Lee. 1994. ‘‘A Distribution Free Approach for Analysis of

Two-Level Structural Equation Model.’’ Computational Statistics and Data Analysis

17:265-75.

Satorra, Albert. 1989. ‘‘Alternative Test Criteria in Covariance Structure Analysis: A Unified

Approach.’’ Psychometrika 54:131-51.

Satorra, Albert and Peter M. Bentler. 1988. ‘‘Scaling Corrections for Chi-Square Statistics in

Covariance Structure Analysis.’’ Pp. 308-13 in American Statistical Association 1988

Proceedings of Business and Economics Sections. Alexandria, VA: American Statistical

Association.

———. 1990. ‘‘Model Conditions for Asymptotic Robustness in the Analysis of Linear Relations.’’ Computational Statistics and Data Analysis 10:235-49.

———. 1994. ‘‘Corrections to Test Statistics and Standard Errors in Covariance Structure

Analysis.’’ Pp. 399-419 in Latent Variables Analysis: Applications for Developmental

Research, edited by A. von Eye and C. C. Clogg. Newbury Park, CA: Sage.

Satorra, Albert and William Saris. 1985. ‘‘Power of the Likelihood Ratio Test in Covariance

Structure Analysis.’’ Psychometrika 50:83-90.

Shapiro, Alexander. 1983. ‘‘Asymptotic Distribution Theory in the Analysis of Covariance

Structures (A Unified Approach).’’ South African Statistical Journal 17:33-81.

———. 1987. ‘‘Robustness Properties of the MDF Analysis of Moment Structures.’’ South

African Statistical Journal 21:39-62.

Shapiro, Alexander and Michael W. Browne. 1987. ‘‘Analysis of Covariance Structures

Under Elliptical Distributions.’’ Journal of the American Statistical Association 82:

1092-97.

Slutsky, Eugene. 1925. ‘‘Über Stochastische Asymptoten und Grenzwerte.’’ Metron 5:1-90.

Snedecor, George W. and William G. Cochran. 1989. Statistical Methods. 8th ed. Ames:

Iowa State University Press.

Steiger, James H., Alexander Shapiro, and Michael W. Browne. 1985. ‘‘On the Multivariate

Asymptotic Distribution of Sequential Chi-Square Statistics.’’ Psychometrika 50:253-64.

Tabachnick, Barbara G. and Linda S. Fidell. 2001. Using Multivariate Statistics. 4th ed.

New York: HarperCollins.

Yuan, Ke-Hai. 2005. ‘‘Fit Indices Versus Test Statistics.’’ Multivariate Behavioral Research 40:115-48.

Yuan, Ke-Hai and Peter M. Bentler. 1997a. ‘‘Improving Parameter Tests in Covariance Structure Analysis.’’ Computational Statistics and Data Analysis 26:177-98.

———. 1997b. ‘‘Mean and Covariance Structure Analysis: Theoretical and Practical Improvements.’’ Journal of the American Statistical Association 92:767-74.

———. 1998a. ‘‘Robust Mean and Covariance Structure Analysis.’’ British Journal of Mathematical and Statistical Psychology 51:63-88.

———. 1998b. ‘‘Structural Equation Modeling With Robust Covariances.’’ Sociological Methodology 28:363-96.

———. 1998c. ‘‘Normal Theory Based Test Statistics in Structural Equation Modelling.’’ British Journal of Mathematical and Statistical Psychology 51:289-309.

———. 1999a. ‘‘On Asymptotic Distributions of Normal Theory MLE in Covariance Structure Analysis Under Some Nonnormal Distributions.’’ Statistics and Probability Letters 42:107-13.

———. 1999b. ‘‘On Normal Theory and Associated Test Statistics in Covariance Structure Analysis Under Two Classes of Nonnormal Distributions.’’ Statistica Sinica 9:831-53.

———. 1999c. ‘‘F-tests for Mean and Covariance Structure Analysis.’’ Journal of Educational and Behavioral Statistics 24:225-43.

———. 2000a. ‘‘Inferences on Correlation Coefficients in Some Classes of Nonnormal Distributions.’’ Journal of Multivariate Analysis 72:230-48.

———. 2000b. ‘‘Three Likelihood-Based Methods for Mean and Covariance Structure Analysis With Nonnormal Missing Data.’’ Sociological Methodology 30:167-202.

———. 2000c. ‘‘Robust Mean and Covariance Structure Analysis Through Iteratively Reweighted Least Squares.’’ Psychometrika 65:43-58.

———. 2001. ‘‘Effect of Outliers on Estimators and Tests in Covariance Structure Analysis.’’ British Journal of Mathematical and Statistical Psychology 54:161-75.

———. 2002a. ‘‘On Robustness of the Normal-Theory Based Asymptotic Distributions of Three Reliability Coefficient Estimates.’’ Psychometrika 67:251-59.

———. 2002b. ‘‘On Normal Theory Based Inference for Multilevel Models With Distributional Violations.’’ Psychometrika 67:539-61.

———. 2003. ‘‘Eight Test Statistics for Multilevel Structural Equation Models.’’ Psychometrika 44:89-107.

———. Forthcoming. ‘‘Mean Comparison: Manifest Variable Versus Latent Variable.’’ Psychometrika.

Yuan, Ke-Hai, Peter M. Bentler, and Wai Chan. 2004. ‘‘Structural Equation Modeling With

Heavy Tailed Distributions.’’ Psychometrika 69:421-36.

Yuan, Ke-Hai, Wai Chan, and Peter M. Bentler. 2000. ‘‘Robust Transformation With Applications to Structural Equation Modeling.’’ British Journal of Mathematical and Statistical Psychology 53:31-50.

Yuan, Ke-Hai and Kentaro Hayashi. 2003. ‘‘Bootstrap Approach to Inference and Power

Analysis Based on Three Statistics for Covariance Structure Models.’’ British Journal of

Mathematical and Statistical Psychology 56:93-110.

Yuan, Ke-Hai, Kentaro Hayashi, and Peter M. Bentler. N.d. ‘‘Normal Theory Likelihood Ratio Statistic for Mean and Covariance Structure Analysis Under Alternative Hypotheses.’’

Yuan, Ke-Hai and Linda L. Marshall. 2004. ‘‘A New Measure of Misfit for Covariance Structure Models.’’ Behaviormetrika 31:1-24.

Yuan, Ke-Hai, Linda L. Marshall, and Rebecca Weston. 2002. ‘‘Cross-Validation Through

Downweighting Influential Cases in Structural Equation Modeling.’’ British Journal of

Mathematical and Statistical Psychology 55:125-43.

Ke-Hai Yuan is an associate professor in quantitative psychology at the University of Notre Dame. His research interests are in the areas of psychometric theory and applied multivariate statistics. He received the Cattell award for early career outstanding multivariate research from the Society of Multivariate Experimental Psychology in 2002.

Peter M. Bentler is a Distinguished Professor of Psychology and Statistics at UCLA who has been an elected president of the Psychometric Society, the Society of Multivariate Experimental Psychology, and the Division of Evaluation, Measurement, and Statistics of the American Psychological Association. He directs the Center for Collaborative Research on Drug Abuse.

Wei Zhang is a graduate student in quantitative psychology at the University of Notre

Dame. His primary research interest is structural equation modeling.
