The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case...
http://smr.sagepub.com
Sociological Methods & Research
Ke-Hai Yuan, Peter M. Bentler and Wei Zhang
The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis: The Univariate Case and Its Multivariate Implication
Sociological Methods & Research 2005; 34; 240
DOI: 10.1177/0049124105280200
The online version of this article can be found at: http://smr.sagepub.com/cgi/content/abstract/34/2/240
Published by:
http://www.sagepublications.com
© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://smr.sagepub.com at PENNSYLVANIA STATE UNIV on February 6, 2008.
The Effect of Skewness and Kurtosis on Mean and Covariance Structure Analysis
The Univariate Case and Its Multivariate Implication
KE-HAI YUAN, University of Notre Dame
PETER M. BENTLER, University of California, Los Angeles
WEI ZHANG, University of Notre Dame
The maximum likelihood (ML) method, based on the normal distribution assumption, is
widely used in mean and covariance structure analysis. With typical nonnormal data,
the ML method will lead to biased statistics and inappropriate scientific conclusions.
This article develops a simple but informative case to show how ML results are influ-
enced by skewness and kurtosis. Specifically, the authors discuss how skewness and
kurtosis in a univariate distribution affect the standard errors of the ML estimators, the
covariances between the estimators, and the likelihood ratio test of hypotheses on
mean and variance parameters. They also describe corrections that have been devel-
oped to allow appropriate inference. Enough details are provided so that this material
can be used in graduate instruction. For each result, the corresponding results in the
higher dimensional case are pointed out, and references are provided.
Keywords: likelihood ratio statistic; nonnormal data; sandwich-type covariance matrix;
Wald statistics
AUTHORS’ NOTE: The research was supported by Grants DA00017 and DA01070 from the
National Institute on Drug Abuse and NSF Grant DMS04-37167. We thank three referees for their
constructive comments, which led to an improved version of the article.
SOCIOLOGICAL METHODS & RESEARCH, Vol. 34, No. 2, November 2005 240-258
DOI: 10.1177/0049124105280200
© 2005 Sage Publications
1. INTRODUCTION
Mean and covariance structure analysis is becoming increasingly
popular in social and behavioral sciences (Bollen 2002; Boomsma
2000; MacCallum and Austin 2000). The most widely used method
for estimation and testing is normal theory-based maximum likeli-
hood (ML). In this method, parameter estimates are obtained by
maximizing the likelihood function derived from the multivariate
normal distribution. Standard errors of the maximum likelihood
estimators (MLE) are based on the covariance matrix that is
obtained by inverting the associated information matrix. Overall
model evaluation is accomplished by referring the likelihood ratio
(LR) statistic to a chi-square distribution. Fit indices are also related
to, or derived from, the LR statistic. Although data in practice are
seldom normally distributed (Micceri 1989), researchers commonly
use the ML method without checking the distribution assumption.
One possible reason is that ML is the default method in almost all
the structural equation modeling (SEM) software. Another reason
may be that the effects of nonnormally distributed data on standard
errors of the MLEs and on the LR statistic are not well understood
by applied researchers. Actually, even more technically oriented
publications do not emphasize limitations of the normal theory ML
approach with nonnormal data (see, e.g., reviews by Breckler 1990;
MacCallum and Austin 2000).
SEM is taught in most graduate programs, but because current
textbooks do not rigorously introduce material on the effect of nonnormal
data on model inference, few instructors are likely to cover
this material in classrooms. The mathematics/statistics involved is
more complicated than that of regression, ANOVA, or basic SEM,
and even courses in univariate and multivariate statistics do not pro-
vide enough technical background for digesting the literature on
SEM with nonnormal data. The aim of this article is to provide a rigorous
introduction to the effect of nonnormality on statistical inference
in mean and covariance structure analysis using the simplest
one-dimensional case. Although the one-dimensional case is oversimplified,
all the effects of nonnormality on standard errors and test
statistics in the higher dimensional case are reflected in the one-
dimensional case. The concepts needed to develop this case are quite
minimal and build on material already in the armamentarium of
many graduate students, namely, basic calculus, linear algebra, and
an introductory course in statistics/probability. Thus, we expect that
this case can be used as a teaching tool in SEM courses for graduate
students in the social and behavioral sciences.
In the one-dimensional case, the interesting parameters are the
population mean and variance. The effect of nonnormal data on sta-
tistical inference for these two parameters can be totally character-
ized by skewness and kurtosis. The concepts of skewness and
kurtosis in the one-dimensional case are well known to graduate
students in social sciences (see, e.g., Tabachnick and Fidell
2001:73-5). The other concepts involved in this article are partial
derivatives, the law of large numbers, and the central limit theorem.
We will provide the necessary steps for each result so that a quanti-
tative graduate student will be able to check or derive it. For each
result, we will also point out the parallel higher dimensional result
in the SEM literature.
In section 2, we study the effect of nonnormal data on the var-
iances and covariance of the MLEs of the population mean and var-
iance. In section 3, we study the effect of nonnormal data on the LR
and related statistics. In section 4, we present an example illustrat-
ing the effect of nonnormal data on the distributions of the MLEs
and the LR statistics. A discussion with a further guide to the litera-
ture is provided in section 5.
2. THE NORMAL THEORY BASED MAXIMUM
LIKELIHOOD ESTIMATOR
Let $y_1, y_2, \ldots, y_n$ be a random sample from a population $y$ with
$E(y)=\mu$, $\mathrm{Var}(y)=\sigma^2$, $E(y-\mu)^3=\sigma^{3/2}\gamma$, and $E(y-\mu)^4=\sigma^4\beta$.
Then $\gamma$ and $\beta-3$ are the population skewness and kurtosis of $y$.
When y∼N(µ, σ2), γ= 0 and β= 3. This section deals with the
effect of γ and β on the distribution of the normal theory-based
MLEs of µ and σ2. Notice that even when γ= 0 and β= 3, y may
still be nonnormally distributed. However, the violation of normality
in higher order moments will have only a minimal effect. Actually, the
asymptotic distributions of the MLEs of µ and σ2 depend on the dis-
tribution of y only up to the fourth-order moment (see, e.g., Ferguson
1996:44-9; Magnus and Neudecker 1999:313-20).
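Given $y_i$, the likelihood function based on $y_i \sim N(\mu, \sigma^2)$ is
$$L_i(\mu, \sigma^2) = L(\mu, \sigma^2 \mid y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\{-(y_i-\mu)^2/(2\sigma^2)\},$$
which is just the normal density function with $y_i$ known. The corresponding
log-likelihood function $l_i = \log(L_i)$ is
$$l_i(\mu, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y_i-\mu)^2.$$
The MLEs of $\mu$ and $\sigma^2$, based on $y_1, y_2, \ldots, y_n$, are
$$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad\text{and}\quad \hat{\sigma}^2 = s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2,$$
which maximize the log-likelihood function
$$l(\mu, \sigma^2) = \sum_{i=1}^{n} l_i(\mu, \sigma^2). \tag{1}$$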
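As a minimal sketch of the estimators and the log-likelihood in (1), the following Python computes $\bar{y}$, $s^2$ (with divisor $n$), and $l(\mu, \sigma^2)$. The function names and the small data vector are hypothetical illustrations, not part of the article, and NumPy is assumed to be available.

```python
import numpy as np

def normal_mle(y):
    """Return (mu_hat, sigma2_hat): the sample mean and the divisor-n variance."""
    y = np.asarray(y, dtype=float)
    mu_hat = y.mean()
    sigma2_hat = np.mean((y - mu_hat) ** 2)  # divisor n, not n - 1
    return mu_hat, sigma2_hat

def normal_loglik(y, mu, sigma2):
    """Sum over i of the normal log-likelihood l_i(mu, sigma2), as in (1)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - np.sum((y - mu) ** 2) / (2.0 * sigma2))

# Hypothetical data, used only to exercise the functions.
y_demo = np.array([0.0, 1.0, 1.0, 2.0, 6.0])
mu_hat, sigma2_hat = normal_mle(y_demo)
loglik_at_mle = normal_loglik(y_demo, mu_hat, sigma2_hat)
```

Evaluating `normal_loglik` at any other pair $(\mu, \sigma^2)$ should give a value no larger than `loglik_at_mle`, which is one quick way to check the closed-form MLEs.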
Closely related to the log-likelihood function is the so-called infor-
mation matrix
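$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$
where
$$\begin{aligned}
I_{11} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\mu}\right) = 1/\sigma^2, \\
I_{12} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\mu\,\partial\sigma^2}\right) = 0, \\
I_{21} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\mu}\right) = 0, \\
I_{22} &= -E\left(\frac{\partial^2 l_i(\mu, \sigma^2)}{\partial\sigma^2\,\partial\sigma^2}\right) = 1/(2\sigma^4).
\end{aligned}$$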
In mean and covariance structure analysis in higher dimensions,
both the mean vector and the covariance matrix are parameterized
as functions of a more basic set of parameters. Then elements
of − I will be just the expectation of the second derivative of the
log-likelihood function with respect to a pair of the parameters.
Let $\theta = (\mu, \sigma^2)'$ and $\hat{\theta} = (\bar{y}, s^2)'$. When data are normally distributed,
standard asymptotic statistical theory (see, e.g., Ferguson
1996:121) tells us that, as $n \to \infty$,
$$\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{L} N(0, \Omega), \tag{2}$$
where $\xrightarrow{L}$ means ‘‘converging in distribution.’’ This means that,
with a large $n$, the distribution of the left side of (2) can be approximately
described by a normal random vector with mean zero and
covariance matrix $\Omega$. Furthermore, this covariance matrix is the
inverse of the information matrix
$$\Omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} = I^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}. \tag{3}$$
Because $\Omega$ is a diagonal matrix, $\hat{\mu}$ and $\hat{\sigma}^2$ are asymptotically independent.
Of course, they are also independent at any finite sample
size because $y \sim N(\mu, \sigma^2)$ (see, e.g., Casella and Berger 2002:218;
Hays 1994:250). Such a result holds also in higher dimensional
normal data. That is, when the mean and covariance structures do not
have overlapping parameters, parameter estimates in the mean struc-
ture are asymptotically independent of parameter estimates in the
variance-covariance structure (see Yuan and Bentler forthcoming).
We next study the distribution of $\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2)'$ when data are
nonnormally distributed. Notice that
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\mu)^2 - (\bar{y}-\mu)^2.$$
Denote $\tilde{\sigma}^2 = \sum_{i=1}^{n}(y_i-\mu)^2/n$. We have
$$\sqrt{n}\begin{pmatrix} \hat{\mu}-\mu \\ \hat{\sigma}^2-\sigma^2 \end{pmatrix} = \sqrt{n}\begin{pmatrix} \hat{\mu}-\mu \\ \tilde{\sigma}^2-\sigma^2 \end{pmatrix} - \begin{pmatrix} 0 \\ \sqrt{n}(\bar{y}-\mu)^2 \end{pmatrix}. \tag{4}$$
Because $(\bar{y}-\mu)$ approaches zero in probability and $\sqrt{n}(\bar{y}-\mu)$ is
bounded in probability (see, e.g., Bishop, Fienberg, and Holland
1975:476), $\sqrt{n}(\bar{y}-\mu)^2$ also approaches zero in probability. Denote
$\tilde{\theta} = (\bar{y}, \tilde{\sigma}^2)'$. It follows from (4) and the well-known Slutsky’s (1925)
theorem (see note 1) that $\sqrt{n}(\hat{\theta}-\theta)$ and $\sqrt{n}(\tilde{\theta}-\theta)$ have the same asymptotic
distribution. Notice that
$$\sqrt{n}(\tilde{\theta}-\theta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\begin{pmatrix} y_i-\mu \\ (y_i-\mu)^2-\sigma^2 \end{pmatrix}. \tag{5}$$
Applying the central limit theorem to the right side of (5) leads to
$$\sqrt{n}(\tilde{\theta}-\theta) \xrightarrow{L} N(0, \Pi),$$
where
$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}$$
with
$$\begin{aligned}
\pi_{11} &= E(y_i-\mu)^2 = \sigma^2, \\
\pi_{12} &= \pi_{21} = E\{(y_i-\mu)[(y_i-\mu)^2-\sigma^2]\} = E(y_i-\mu)^3 = \sigma^{3/2}\gamma, \\
\pi_{22} &= E\{[(y_i-\mu)^2-\sigma^2][(y_i-\mu)^2-\sigma^2]\} = E(y_i-\mu)^4 - \sigma^4 = \sigma^4(\beta-1).
\end{aligned}$$
Because $\sqrt{n}(\hat{\theta}-\theta)$ and $\sqrt{n}(\tilde{\theta}-\theta)$ have the same asymptotic
distribution,
$$\sqrt{n}(\hat{\theta}-\theta) \xrightarrow{L} N(0, \Pi). \tag{6}$$
Comparing (6) with (2) and (3), $\omega_{22} = \pi_{22}$ only when $\beta = 3$.
A standard error for $\hat{\sigma}^2$ based on the $\Omega$ in (3) will be negatively biased
when $\beta > 3$ and positively biased when $\beta < 3$. With sample estimates
of skewness and kurtosis, a consistent estimate of $\Pi$ can be obtained
by replacing its unknown elements with the sample estimates. Thus,
a consistent standard error of $\hat{\sigma}^2$ will be obtained. This result is a
special case of the so-called sandwich-type covariance matrix in
mean and covariance structure analysis, discussed in Dijkstra
(1981); Bentler (1983); Shapiro (1983); Browne (1984); Bentler
and Dijkstra (1985); Satorra and Bentler (1988, 1994); Arminger and
Schoenberg (1989); Arminger and Sobel (1990); Kano, Berkane, and
Bentler (1993); Browne and Arminger (1995); and Yuan and Bentler
(1997a, 1998a, 1998b, 2000b).
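The contrast between the normal-theory covariance matrix and the sandwich-type matrix can be sketched numerically. The Python below is an illustration under stated assumptions rather than the authors' implementation: it estimates $\Omega$ from $s^2$ alone and estimates $\Pi$ by plugging in the sample central moments, using $\pi_{12} = E(y_i-\mu)^3$ and $\pi_{22} = E(y_i-\mu)^4 - \sigma^4$ directly. The function names and the data vector are hypothetical.

```python
import numpy as np

def asymptotic_covariances(y):
    """Return (omega_hat, pi_hat) for sqrt(n) * (theta_hat - theta).

    omega_hat is the normal-theory matrix diag(s2, 2 * s2**2);
    pi_hat replaces the moments in Pi by sample central moments.
    """
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    s2 = np.mean(d ** 2)   # second central moment (divisor n)
    m3 = np.mean(d ** 3)   # third central moment, estimates pi_12
    m4 = np.mean(d ** 4)   # fourth central moment
    omega_hat = np.array([[s2, 0.0], [0.0, 2.0 * s2 ** 2]])
    pi_hat = np.array([[s2, m3], [m3, m4 - s2 ** 2]])
    return omega_hat, pi_hat

def standard_errors(cov, n):
    """Standard errors of (mu_hat, sigma2_hat) implied by an asymptotic covariance."""
    return np.sqrt(np.diag(cov) / n)

# Hypothetical data, used only to exercise the functions.
y_demo = np.array([0.0, 1.0, 1.0, 2.0, 6.0])
omega_hat, pi_hat = asymptotic_covariances(y_demo)
se_normal = standard_errors(omega_hat, y_demo.size)
se_sandwich = standard_errors(pi_hat, y_demo.size)
```

The standard error of $\hat{\mu}$ is the same under both matrices; only the entry for $\hat{\sigma}^2$ and the off-diagonal term change.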
It follows from (6) that the asymptotic distribution of $\hat{\sigma}^2$ depends
on $\sigma^2$ and $\beta$ but not $\gamma$. In contrast, the asymptotic distribution of $\hat{\mu}$
does not depend on either $\gamma$ or $\beta$. This is also true in the higher
dimensional case. Results in Yuan and Bentler (1999a, 2000a,
2002a) imply that the asymptotic distributions of the covariance
parameter estimates, the commonly used sample correlation coefficients,
and sample reliability coefficients depend on only the joint
fourth-order moments or kurtoses of the variables. Equation (6) also
tells us that $\hat{\mu}$ and $\hat{\sigma}^2$ are no longer asymptotically independent when
$\gamma \neq 0$. This is also true in the higher dimensional case when not all
the third-order moments are zero, where mean and covariance para-
meter estimates are not asymptotically independent even when they
do not have overlapping parameters (Yuan and Bentler forthcoming).
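The loss of asymptotic independence under skewness can be illustrated by simulation. The sketch below is only an illustration with hypothetical settings (sample size, number of replications, seed, and choice of distributions are all assumptions): it estimates $n\,\mathrm{Cov}(\bar{y}, s^2)$, which by (6) should be close to $\pi_{12} = E(y_i-\mu)^3$. For the standard normal this moment is 0; for the exponential distribution with unit scale it is 2.

```python
import numpy as np

def scaled_cov_mean_var(samples):
    """Estimate n * Cov(ybar, s2) across the rows (replications) of `samples`."""
    ybar = samples.mean(axis=1)
    s2 = samples.var(axis=1)  # divisor n, matching the MLE of sigma^2
    n = samples.shape[1]
    return n * np.cov(ybar, s2)[0, 1]

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
n, reps = 200, 20000            # hypothetical simulation settings

# Symmetric case: third central moment is 0.
cov_normal = scaled_cov_mean_var(rng.normal(0.0, 1.0, size=(reps, n)))
# Skewed case: exponential with scale 1 has third central moment 2.
cov_skewed = scaled_cov_mean_var(rng.exponential(1.0, size=(reps, n)))
```

Under these settings `cov_normal` should be near 0 and `cov_skewed` near 2, up to Monte Carlo error.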
3. THE NORMAL THEORY-BASED LIKELIHOOD RATIO TEST
We first consider the distribution of the LR statistic when $\mu$ is a free
parameter. The null hypothesis is $H_0: \sigma^2 = \sigma_0^2$. Notice that when $H_0$
is true, the $\sigma_0^2$ will equal the $\sigma^2$ in section 2, which will also be the
scenario we consider in this section. The behavior of the LR statistic
with misspecified models was studied in Shapiro (1983); Satorra and
Saris (1985); Steiger, Shapiro, and Browne (1985); Satorra (1989);
Yuan and Hayashi (2003); Yuan (2005); Yuan and Bentler
(forthcoming); and Yuan, Hayashi, and Bentler (2005).
Using the log-likelihood function in (1), we obtain the LR
statistic as
$$T_{ML} = 2[l(\hat{\mu}, \hat{\sigma}^2) - l(\hat{\mu}, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right]. \tag{7}$$
It is obvious that (7) is just the univariate version of the normal theory-based
discrepancy function in covariance structure analysis (see equation
4.67 of Bollen 1989:107). Notice that
$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \log\left[1 + \left(\frac{s^2}{\sigma_0^2} - 1\right)\right],$$
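and $(s^2/\sigma_0^2 - 1)$ will be small when $n$ is large. Using the Taylor expansion,
we have
$$\log\left(\frac{s^2}{\sigma_0^2}\right) = \left(\frac{s^2}{\sigma_0^2} - 1\right) - \frac{1}{2}\left(\frac{s^2}{\sigma_0^2} - 1\right)^2 + r_n, \tag{8}$$
where $nr_n$ approaches zero in probability when $n \to \infty$. Putting (8)
into (7), we get
$$T_{ML} = \frac{n(s^2-\sigma_0^2)^2}{2\sigma_0^4} + nr_n.$$
It follows from (6) that
$$\sqrt{n}(s^2-\sigma_0^2) \xrightarrow{L} N(0, \pi_{22}).$$
Thus,
$$x_n = \sqrt{n}(s^2-\sigma_0^2)/\sqrt{\pi_{22}} \xrightarrow{L} N(0, 1)$$
and
$$T_{ML} = \frac{\pi_{22}}{2\sigma_0^4}x_n^2 + nr_n = \frac{(\beta-1)}{2}x_n^2 + nr_n \xrightarrow{L} \frac{(\beta-1)}{2}\chi_1^2.$$
So the asymptotic distribution of the LR statistic is that of a $\chi_1^2$ variate scaled
by $(\beta-1)/2$, a factor that grows with kurtosis.
When data are normally distributed, $\beta = 3$ and $T_{ML} \xrightarrow{L} \chi_1^2$. A correct
hypothesis $\sigma_0^2$ can be easily rejected when we refer the $T_{ML}$ in (7) to
$\chi_1^2$ while $\beta > 3$. Similarly, a wrong hypothesis might not be rejected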
when β< 3, even when n is large. In the higher dimensional case, the
LR statistic is also proportional to the common kurtosis when data are
elliptically symmetric (Browne 1984; Shapiro and Browne 1987), and
TML may still not depend on skewness when the marginal kurtosis is
heterogeneous (Kano, Berkane, and Bentler 1990) or even when data
are skewed (Yuan and Bentler 1999b).
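The scaling of the LR statistic by $(\beta-1)/2$ can be checked with a small Monte Carlo sketch. Everything below is an assumption made for illustration (sample size, replications, seed, and the choice of the Laplace distribution, which has $\beta = 6$ and variance $2b^2$ for scale $b$); it is not taken from the article. With a true null hypothesis, the mean of $T_{ML}$ should be close to 1 for normal data and close to 2.5 for Laplace data.

```python
import numpy as np

def lr_variance_test(samples, sigma2_0):
    """T_ML of equation (7) for each row of `samples` (one row per replication)."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n = samples.shape[1]
    ratio = samples.var(axis=1) / sigma2_0  # s^2 / sigma_0^2 with divisor n
    return n * (ratio - np.log(ratio) - 1.0)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n, reps = 500, 4000             # hypothetical simulation settings
crit = 3.841                    # approximate 0.95 quantile of chi-square(1)

# Normal data with true variance 1: beta = 3, so the scale factor is 1.
t_normal = lr_variance_test(rng.normal(0.0, 1.0, size=(reps, n)), 1.0)
# Laplace data with scale 1: true variance 2, beta = 6, scale factor 2.5.
t_laplace = lr_variance_test(rng.laplace(0.0, 1.0, size=(reps, n)), 2.0)

mean_normal, mean_laplace = t_normal.mean(), t_laplace.mean()
reject_normal = np.mean(t_normal > crit)
reject_laplace = np.mean(t_laplace > crit)
```

In this setup the rejection rate for the Laplace data should be well above the nominal 0.05, even though the null hypothesis is true in both cases.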
With a consistent estimator $\hat{\pi}_{22}$ of $\pi_{22}$, we can rescale $T_{ML}$ to
$$T_R = \frac{2\hat{\sigma}^4}{\hat{\pi}_{22}} T_{ML}. \tag{9}$$
It is obvious that $\hat{\sigma}^4/\hat{\pi}_{22}$ converges in probability to $1/(\beta-1)$ and
$$T_R \xrightarrow{L} \chi_1^2.$$
In the multivariate case, the statistic TR is just the Satorra and Bentler
(1988, 1994) rescaled statistic.
Notice that
$$z_n = \sqrt{n}(s^2-\sigma_0^2)/\sqrt{\hat{\pi}_{22}} \xrightarrow{L} N(0, 1).$$
The Wald-type statistic for testing $\sigma^2 = \sigma_0^2$ is
$$T_W = \frac{n(s^2-\sigma_0^2)^2}{\hat{\pi}_{22}}. \tag{10}$$
As long as $\hat{\pi}_{22}$ is consistent for $\pi_{22}$, the asymptotic distribution of $T_W$
is $\chi_1^2$, which does not depend on the underlying distribution of $y$.
Such a property is commonly called asymptotically distribution free
(ADF) in the SEM literature. Two estimates of $\pi_{22}$ are available.
One is
$$\hat{\pi}_{22}^{(1)} = \frac{1}{n}\sum_{i=1}^{n}[(y_i-\bar{y})^2 - s^2]^2, \tag{11}$$
which is equivalent to $s^4(\hat{\beta}-1)$, where
$$\hat{\beta} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^4/s^4.$$
The other one is
$$\hat{\pi}_{22}^{(2)} = \frac{1}{n}\sum_{i=1}^{n}[(y_i-\bar{y})^2 - \sigma_0^2]^2, \tag{12}$$
and the two estimates satisfy
$$\hat{\pi}_{22}^{(2)} = \hat{\pi}_{22}^{(1)} + (s^2-\sigma_0^2)^2.$$
Notice that, under $H_0: \sigma^2 = \sigma_0^2$, $(s^2-\sigma_0^2)$ approaches zero in probability
according to the law of large numbers, and $\hat{\pi}_{22}^{(2)}$ and $\hat{\pi}_{22}^{(1)}$ are
asymptotically equivalent. The well-known ADF statistic (Browne
1984) in covariance structure analysis corresponds to a multivariate
version of $T_W$ with $\hat{\pi}_{22} = \hat{\pi}_{22}^{(1)}$. The corrected ADF statistic developed
in Yuan and Bentler (1997b) corresponds to $T_W$ with $\hat{\pi}_{22} = \hat{\pi}_{22}^{(2)}$.
Yuan and Bentler (1998c) provided a corrected residual-based ADF
statistic, which is also a multivariate version of $T_W$ with $\hat{\pi}_{22} = \hat{\pi}_{22}^{(2)}$.
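The statistics in (7) and (9) through (12) can be collected in one short function. This is a minimal sketch, not the authors' code: the function name, the dictionary keys, and the data vector are hypothetical, and the rescaled statistic is computed under the assumption that $\hat{\sigma}^4 = s^4$ and $\hat{\pi}_{22} = \hat{\pi}_{22}^{(1)}$ are used in (9).

```python
import numpy as np

def variance_tests(y, sigma2_0):
    """Statistics for H0: sigma^2 = sigma2_0 with mu left free."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d2 = (y - y.mean()) ** 2
    s2 = d2.mean()
    ratio = s2 / sigma2_0
    t_ml = n * (ratio - np.log(ratio) - 1.0)   # equation (7)
    pi22_1 = np.mean((d2 - s2) ** 2)           # equation (11)
    pi22_2 = np.mean((d2 - sigma2_0) ** 2)     # equation (12)
    t_r = 2.0 * s2 ** 2 / pi22_1 * t_ml        # equation (9), assumed plug-ins
    t_w1 = n * (s2 - sigma2_0) ** 2 / pi22_1   # equation (10) with (11)
    t_w2 = n * (s2 - sigma2_0) ** 2 / pi22_2   # equation (10) with (12)
    return {"s2": s2, "T_ML": t_ml, "T_R": t_r, "T_W1": t_w1, "T_W2": t_w2,
            "pi22_1": pi22_1, "pi22_2": pi22_2}

# Hypothetical data and null value, used only to exercise the function.
y_demo = np.array([0.0, 1.0, 1.0, 2.0, 6.0])
result = variance_tests(y_demo, sigma2_0=4.0)
```

Because $\hat{\pi}_{22}^{(2)} \geq \hat{\pi}_{22}^{(1)}$ by the identity above, the second Wald statistic can never exceed the first.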
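We next consider testing $H_0: (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$ or $\theta = \theta_0$. It is
easy to obtain
$$T_{ML} = 2[l(\hat{\mu}, \hat{\sigma}^2) - l(\mu_0, \sigma_0^2)] = n\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + \frac{n(\bar{y}-\mu_0)^2}{\sigma_0^2}. \tag{13}$$
Let $\Pi^{-1/2}$ be a symmetric matrix such that $\Pi^{-1/2}\Pi^{-1/2} = \Pi^{-1}$.
It follows from (6) that
$$x_n = \Pi^{-1/2}\sqrt{n}(\hat{\theta}-\theta_0) \xrightarrow{L} N(0, I_2),$$
where $I_2$ is the 2-by-2 identity matrix. Let
$$W = \begin{pmatrix} 1/\sigma_0^2 & 0 \\ 0 & 1/(2\sigma_0^4) \end{pmatrix}.$$
It follows from (8) and (13) that
$$\begin{aligned}
T_{ML} &= \frac{n(s^2-\sigma_0^2)^2}{2\sigma_0^4} + \frac{n(\bar{y}-\mu_0)^2}{\sigma_0^2} + nr_n \\
&= n(\hat{\theta}-\theta_0)'W(\hat{\theta}-\theta_0) + nr_n \\
&= x_n'\Pi^{1/2}W\Pi^{1/2}x_n + nr_n.
\end{aligned}$$
Let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$. Then there
exist eigenvectors $v_1$ and $v_2$ such that
$$\Pi^{1/2}W\Pi^{1/2} = (v_1, v_2)\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}(v_1, v_2)'.$$
Let $V = (v_1, v_2)$ and $z_n = (z_{n1}, z_{n2})' = V'x_n$. Because $V'V = I_2$,
$$z_n \xrightarrow{L} z = (z_1, z_2)' \sim N(0, I_2)$$
and
$$T_{ML} = \lambda_1 z_{n1}^2 + \lambda_2 z_{n2}^2 + nr_n.$$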
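Notice that the eigenvalues of $\Pi^{1/2}W\Pi^{1/2}$ equal the eigenvalues of
$$W^{1/2}\Pi W^{1/2} = \begin{pmatrix} 1 & \gamma/(2\sigma^3)^{1/2} \\ \gamma/(2\sigma^3)^{1/2} & (\beta-1)/2 \end{pmatrix}.$$
The determinant equation determining the eigenvalues is
$$|W^{1/2}\Pi W^{1/2} - \lambda I_2| = 0,$$
which is just
$$\lambda^2 - \frac{\beta+1}{2}\lambda + \frac{\beta-1-\gamma^2/\sigma^3}{2} = 0.$$
Solving this equation, we have
$$\begin{aligned}
\lambda_1 &= \frac{1}{4}\{\beta+1+[(\beta-3)^2+8\gamma^2/\sigma^3]^{1/2}\}, \\
\lambda_2 &= \frac{1}{4}\{\beta+1-[(\beta-3)^2+8\gamma^2/\sigma^3]^{1/2}\}.
\end{aligned}$$
When data are normally distributed ($\beta = 3$, $\gamma = 0$), $\lambda_1 = \lambda_2 = 1$ and
$$T_{ML} \xrightarrow{L} \chi_2^2.$$
When data are symmetric ($\gamma = 0$), $\lambda_1 = (\beta-1)/2$ and $\lambda_2 = 1$. In
such a case, $\hat{\mu}$ and $\hat{\sigma}^2$ are asymptotically independent. A rescaled
statistic that removes the effect of $\beta$ is given by
$$T_R = n\frac{2\hat{\sigma}^4}{\hat{\pi}_{22}}\left[\frac{s^2}{\sigma_0^2} - \log\left(\frac{s^2}{\sigma_0^2}\right) - 1\right] + n\frac{(\bar{y}-\mu_0)^2}{\sigma_0^2} \tag{14}$$
and $T_R \xrightarrow{L} \chi_2^2$. A parallel statistic to (14) can also be constructed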
in the higher dimensional case, although we are not aware of the
existence of such a development.
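The eigenvalues that weight the two chi-square components can also be obtained numerically. The sketch below is an illustration with hypothetical function names; it is parameterized by the central moments themselves, with `m3` standing for $\pi_{12} = E(y_i-\mu)^3$ and `m4` for $E(y_i-\mu)^4$, so the closed-form discriminant is written as $(\beta-3)^2 + 8\pi_{12}^2/\sigma^6$. The moment values in the examples are the known moments of the normal, Laplace, and unit-scale exponential distributions.

```python
import numpy as np

def lr_eigenvalues(sigma2, m3, m4):
    """Eigenvalues (largest first) of W^{1/2} Pi W^{1/2} under a true H0."""
    w_half = np.diag([sigma2 ** -0.5, (2.0 * sigma2 ** 2) ** -0.5])
    pi = np.array([[sigma2, m3], [m3, m4 - sigma2 ** 2]])
    return np.linalg.eigvalsh(w_half @ pi @ w_half)[::-1]

def lr_eigenvalues_closed_form(sigma2, m3, m4):
    """Roots of the quadratic determinant equation, written with pi_12 = m3."""
    beta = m4 / sigma2 ** 2
    root = np.sqrt((beta - 3.0) ** 2 + 8.0 * m3 ** 2 / sigma2 ** 3)
    return np.array([beta + 1.0 + root, beta + 1.0 - root]) / 4.0

# Normal with variance 2: m3 = 0, m4 = 3 * sigma^4 = 12.
eig_normal = lr_eigenvalues(2.0, 0.0, 12.0)
# Laplace scaled to unit variance: m3 = 0, beta = 6.
eig_laplace = lr_eigenvalues(1.0, 0.0, 6.0)
# Exponential with scale 1: variance 1, m3 = 2, m4 = 9.
eig_expon = lr_eigenvalues(1.0, 2.0, 9.0)
```

In the skewed example the two eigenvalues differ from both 1 and $(\beta-1)/2$, while their sum still equals $\mathrm{tr}(W\Pi) = 1 + (\beta-1)/2$.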
When data are skewed or $\gamma \neq 0$, the two eigenvalues are not
equal. We might consider simultaneously removing the effect of
skewness and kurtosis by constructing the rescaled statistic
$$T_R = \frac{2}{\mathrm{tr}(W\hat{\Pi})} T_{ML}. \tag{15}$$
However, $T_R$ will not approach $\chi_2^2$ but $2(\lambda_1 z_1^2 + \lambda_2 z_2^2)/(\lambda_1 + \lambda_2)$,
whose mean is $2 = E(\chi_2^2)$. In the higher dimensional case, the
rescaled statistic parallel to the $T_R$ in (15) generally does not follow
a chi-square distribution (Satorra and Bentler 1994; Yuan and Bentler
2000b). Even with only a covariance structure model, the rescaled
statistic parallel to the TR in (9) may not follow a chi-square distribu-
tion either due to the heterogeneity of the eigenvalues, although its
distribution is still chi-square under various conditions (see Yuan and
Bentler 1999b). Similarly, ADF-type statistics can be constructed in
testing $H_0: (\mu, \sigma^2)' = (\mu_0, \sigma_0^2)'$, and we leave it to readers to work
out the details. See Browne and Arminger (1995) and Yuan and
Bentler (1997b, 1999c) for the higher dimensional case.
4. A NUMERICAL EXAMPLE
Neumann (1994) studied the relationship of alcohol and psychological
symptoms. His data set consists of p= 10 variables and n= 335 cases.
We will use the family history of psychopathology variable to illustrate
the effect of skewness and kurtosis. For this variable, the MLEs of $\mu$
and $\sigma^2$ are, respectively, $\hat{\mu} = \bar{y} = 1.361$ and $\hat{\sigma}^2 = s^2 = 2.302$; the sample
skewness and kurtosis are, respectively,
$$\hat{\gamma} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^3/s^3 = 2.001 \quad\text{and}\quad \hat{\beta} = 8.766.$$
Note that both $\hat{\gamma}$ and $\hat{\beta}-3$ are significantly different from zero
(see, e.g., Snedecor and Cochran 1989, Table A19), indicating that the
sample most likely comes from a nonnormal distribution. Our purpose
here is to illustrate the effect of $\gamma$ and $\beta$ on the asymptotic distributions
of $\hat{\mu}$ and $\hat{\sigma}^2$ and statistics for testing $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$, not to elaborate
on the substantive side of the data.
Assuming the sample is from $N(\mu, \sigma^2)$, the asymptotic distribution
of $(\hat{\mu}, \hat{\sigma}^2)'$ is given by (2) with
$$\hat{\Omega} = \begin{pmatrix} 2.302 & 0 \\ 0 & 10.598 \end{pmatrix}.$$
Admitting that the data may not be normally distributed, the asymptotic
distribution of $(\hat{\mu}, \hat{\sigma}^2)'$ is given by (6) with
$$\hat{\Pi} = \begin{pmatrix} 2.302 & 6.989 \\ 6.989 & 41.154 \end{pmatrix}.$$
If based on (2), the estimated standard error of $\hat{\sigma}^2$ is $(10.598/335)^{1/2} = 0.178$.
If based on (6), the estimated standard error of $\hat{\sigma}^2$ is
$(41.154/335)^{1/2} = 0.350$, almost double that based on (2). Of course,
$\hat{\mu}$ and $\hat{\sigma}^2$ are no longer asymptotically independent in this example
when using (6). Although the confidence interval for $\sigma^2$ based on (2) is
much shorter than that based on (6), the shorter interval is a misleading
result due to the nonnormality of the data.
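The two estimated covariance matrices and the standard errors quoted above can be recomputed from the reported summary statistics alone. The following Python is a sketch under one assumption: the sample third central moment is recovered as $\hat{\gamma}s^3$ from the definition of $\hat{\gamma}$ given in this section, and the fourth-moment term as $s^4(\hat{\beta}-1)$. Variable names are hypothetical, and small differences in the last digit can arise because the inputs are rounded.

```python
import numpy as np

# Summary statistics reported for the family history variable.
n = 335
s2, gamma_hat, beta_hat = 2.302, 2.001, 8.766

# Normal-theory covariance matrix, equation (3) with s2 plugged in.
omega_hat = np.array([[s2, 0.0], [0.0, 2.0 * s2 ** 2]])

# Sandwich-type matrix, equation (6), built from sample central moments.
m3_hat = gamma_hat * s2 ** 1.5  # third central moment from gamma_hat * s^3
pi_hat = np.array([[s2, m3_hat], [m3_hat, s2 ** 2 * (beta_hat - 1.0)]])

# Standard errors of the variance estimate under each matrix.
se_normal = np.sqrt(omega_hat[1, 1] / n)
se_sandwich = np.sqrt(pi_hat[1, 1] / n)
se_ratio = se_sandwich / se_normal  # equals sqrt((beta_hat - 1) / 2)
```

The ratio of the two standard errors depends only on $\hat{\beta}$, which makes the role of kurtosis in the understated normal-theory standard error explicit.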
Turning to hypothesis testing, suppose the null hypothesis is
$H_0: (\mu, \sigma^2) = (1.2, 1.8)$. When testing $\sigma^2 = 1.8$ alone, the LR statistic
in (7) is $T_{ML} = 11.021$, which is highly significant when referred
to $\chi_1^2$. The rescaled statistic in (9) is $T_R = 2.838$; the Wald statistic in
(10), using the $\hat{\pi}_{22}^{(1)}$ in (11), is $T_W^{(1)} = 2.051$; and the Wald statistic in
(10), using the $\hat{\pi}_{22}^{(2)}$ in (12), is $T_W^{(2)} = 2.039$. None is statistically significant
at the $\alpha = 0.05$ level when referred to $\chi_1^2$. Note that $T_W^{(1)}$ and
$T_W^{(2)}$ differ only slightly in this example because $p = 1$ and the sample size
is relatively large. In the higher dimensional case, their difference
can be huge (Yuan and Bentler 1997b, 1998c), and $T_W^{(2)}$ is
recommended for more reliable inference with smaller samples.
When testing $(\mu, \sigma^2) = (1.2, 1.8)$ simultaneously, the LR statistic
in (13) is $T_{ML} = 15.845$, which is highly significant when
referred to $\chi_2^2$. The rescaled statistic in (15) is given by
$T_R = 4.153$, which is no longer statistically significant at the
$\alpha = 0.05$ level when referred to $\chi_2^2$. Note that the $T_R$ in (15) is unlikely
to follow $\chi_2^2$ due to the significance of $\hat{\gamma}$. However, referring $T_R$
to a chi-square distribution does make the inference more reliable.
More empirical results about these statistics in higher dimensional
cases can be found in Hu, Bentler, and Kano (1992) and Yuan and
Bentler (1998c).
The rescaled statistic in (14) is $T_R = 7.662$, which is statistically
significant at the $\alpha = 0.05$ level when referred to $\chi_2^2$. However, this
$T_R$ is not justified when $\hat{\gamma}$ is statistically significant.
In summary, when proper statistics are used, none of the evidence in this
example is against the hypothesis $H_0: (\mu, \sigma^2) = (1.2, 1.8)$.
However, if one starts with the normal theory-based ML procedure
without checking the distribution of the sample, $H_0$ will be
rejected!
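The test statistics in this example can be recomputed from the reported summary values. The sketch below rests on assumptions that the text does not spell out: $\hat{\sigma}^4 = s^4$ and $\hat{\pi}_{22} = \hat{\pi}_{22}^{(1)} = s^4(\hat{\beta}-1)$ are used in (9) and (14), and $W$ in (15) is evaluated at $\sigma_0^2 = 1.8$. Under these choices the results appear to agree with the reported values to about three decimals; variable names are hypothetical.

```python
import numpy as np

# Reported summary statistics and hypothesized values.
n = 335
ybar, s2, beta_hat = 1.361, 2.302, 8.766
mu0, sigma2_0 = 1.2, 1.8

ratio = s2 / sigma2_0
lr_var = n * (ratio - np.log(ratio) - 1.0)     # variance part, equation (7)
mean_part = n * (ybar - mu0) ** 2 / sigma2_0   # mean part of equation (13)

pi22_1 = s2 ** 2 * (beta_hat - 1.0)            # equation (11) via beta_hat
pi22_2 = pi22_1 + (s2 - sigma2_0) ** 2         # identity linking (11) and (12)
scale = 2.0 * s2 ** 2 / pi22_1                 # assumed plug-in for (9)

t_ml_var = lr_var                              # (7)
t_r_var = scale * lr_var                       # (9)
t_w1 = n * (s2 - sigma2_0) ** 2 / pi22_1       # (10) with (11)
t_w2 = n * (s2 - sigma2_0) ** 2 / pi22_2       # (10) with (12)
t_ml_joint = lr_var + mean_part                # (13)

# tr(W Pi_hat) with W evaluated at sigma2_0 (an assumption of this sketch).
trace_w_pi = s2 / sigma2_0 + pi22_1 / (2.0 * sigma2_0 ** 2)
t_r_trace = 2.0 / trace_w_pi * t_ml_joint      # (15)
t_r_kurt = scale * lr_var + mean_part          # (14)
```

Comparing `t_ml_var` with `t_r_var`, and `t_ml_joint` with `t_r_trace`, shows how much of the apparent misfit is attributable to the kurtosis of the variable rather than to the hypothesized values.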
5. DISCUSSION
Motivated by the gap in teaching resources and the technical litera-
ture of SEM, this article provides a simplified version of SEM with
nonnormal data. Some of the material in this article may be just
a trivial exercise to quantitative graduate students. For students or
researchers who do not have much quantitative training, the material
should better facilitate an understanding of the effect of nonnormal
data on standard errors and test statistics in mean and covariance
structure analysis. Of course, fit indices defined through TML will be
equally affected by skewness or kurtosis (see Yuan 2005 and Yuan
and Marshall 2004). Readers with a solid quantitative background
may further read the literature in the higher dimensional case cited in
sections 2 and 3. We hope the article will help to solidify an under-
standing of the effect of nonnormal data on SEM inference, espe-
cially in graduate education.
Proper procedures have to be used to get reliable inferences with
nonnormal data. Although we do not have space to discuss these,
we would like to note that truly robust methods, not depending on
ML, do exist. These methods minimize the effect of bad data not
only on standard errors and test statistics but also on parameter
estimates and power evaluations (Yuan and Bentler 1998a, 1998b,
2000c; Yuan, Bentler, and Chan 2004; Yuan, Chan, and Bentler 2000;
Yuan and Hayashi 2003; Yuan, Marshall, and Weston 2002). It is well
known that Mardia’s (1970, 1974) measure of multivariate kurtosis is
a generalization of the univariate kurtosis β− 3. When the sample
multivariate kurtosis is significantly greater than that of the multivari-
ate normal distribution, a robust procedure might be necessary. In
small samples, the significance of Mardia’s coefficient can be evalu-
ated using the simulation approach of Bonett, Woodward, and Randall
(2002). In addition to nonnormal data, a small sample size also tends
to make the statistic $T_{ML}$ significant even with correctly specified
models. Remedies in this direction are addressed in Bentler and Yuan
(1999) and Yuan and Bentler (1999c).
In this article, we have emphasized the ML function for the sim-
plest case in which data are complete and obtained by simple random
sampling. As can easily be imagined, the problems arising from
skewness and kurtosis do not vanish when data are missing or when
data are obtained under hierarchical sampling schemes. The same
principles apply—that is, normal theory-based ML standard errors
and test statistics will be biased under nonnormality. Solutions to this
problem for the missing data case were developed by Arminger and
Sobel (1990) and Yuan and Bentler (2000b). Solutions for multilevel
data were provided by Poon and Lee (1994) and Yuan and Bentler
(2002b, 2003).
There is a related literature called asymptotic robustness theory.
This is concerned with the validity of normal theory-based methods
with large-sample nonnormal data (Amemiya and Anderson 1990;
Anderson and Amemiya 1988; Browne and Shapiro 1988; Mooijaart
and Bentler 1991; Satorra and Bentler 1990; Shapiro 1987; Yuan and
Bentler 1999a, 1999b). Unfortunately, the conditions for asymptotic
robustness depend on both the data and the model, and there is no
effective way to verify these conditions at present. It is not appropriate
to blindly trust that a researcher’s given data and model satisfy these
conditions.
In practice, TML with empirical data is often statistically signifi-
cant when referred to a chi-square distribution. However, a small p
value associated with TML may not be due to a bad model and/or
too much power (i.e., a huge sample size). It may be due to viola-
tion of assumptions or bad data (Yuan and Bentler 2001). The
newer statistics (see note 2) described in this article do not require making the
stringent multivariate normality assumption. These statistics not
only are liable to make a good model more acceptable statistically
but also should lead to more accurate scientific conclusions.
NOTES
1. The theorem states that, if $x_n \xrightarrow{L} x$, $a_n$ converges in probability to $a$, and $b_n$ converges
in probability to $b$, then $a_n x_n + b_n \xrightarrow{L} ax + b$.
2. Most of the procedures discussed in this article, such as standard errors based on the
sandwich-type covariance matrices, rescaled and improved asymptotically distribution free
statistics, robust methods, and statistics that perform well with small samples, are currently
available in EQS 6.0 (Bentler forthcoming).
REFERENCES
Amemiya, Yasuo and Theodore W. Anderson. 1990. ‘‘Asymptotic Chi-Square Tests for a
Large Class of Factor Analysis Models.’’ Annals of Statistics 18:1453-63.
Anderson, Theodore W. and Yasuo Amemiya. 1988. ‘‘The Asymptotic Normal Distribution of
Estimators in Factor Analysis Under General Conditions.’’ Annals of Statistics 16:759-71.
Arminger, Gerhard and Ronald Schoenberg. 1989. ‘‘Pseudo Maximum Likelihood Estimation
and a Test for Misspecification in Mean and Covariance Structure Models.’’ Psychometrika
54:409-26.
Arminger, Gerhard and Michael E. Sobel. 1990. ‘‘Pseudo-Maximum Likelihood Estimation of
Mean and Covariance Structures With Missing Data.’’ Journal of the American Statistical
Association 85:195-203.
Bentler, Peter M. 1983. ‘‘Some Contributions to Efficient Statistics in Structural Models:
Specification and Estimation of Moment Structures.’’ Psychometrika 48:493-517.
———. Forthcoming. EQS 6 Structural Equations Program Manual. Encino, CA: Multivariate
Software.
Bentler, Peter M. and Theo K. Dijkstra. 1985. ‘‘Efficient Estimation Via Linearization
in Structural Models.’’ Pp. 9-42 in Multivariate Analysis VI, edited by P. R. Krishnaiah.
Amsterdam: North-Holland.
Bentler, Peter M. and Ke-Hai Yuan. 1999. ‘‘Structural Equation Modeling With Small Sam-
ples: Test Statistics.’’ Multivariate Behavioral Research 34:181-97.
Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul W. Holland. 1975. Discrete Multivari-
ate Analysis: Theory and Practice. Cambridge, MA: MIT Press.
Bollen, Kenneth A. 1989. Structural Equations With Latent Variables. New York: John
Wiley.
———. 2002. ‘‘Latent Variables in Psychology and the Social Sciences.’’ Annual Review of
Psychology 53:605-34.
Bonett, Douglas G., J. Arthur Woodward, and Robert L. Randall. 2002. ‘‘Estimating p-Values
for Mardia’s Coefficients of Multivariate Skewness and Kurtosis.’’ Computational Statis-
tics 17:117-22.
Boomsma, Anne. 2000. ‘‘Reporting on Structural Equation Analyses.’’ Structural Equation
Modeling 7:461-83.
Breckler, Steven J. 1990. ‘‘Application of Covariance Structure Modeling in Psychology:
Cause for Concern?’’ Psychological Bulletin 107:260-73.
Browne, Michael W. 1984. ‘‘Asymptotic Distribution-Free Methods for the Analysis of Covariance
Structures.’’ British Journal of Mathematical and Statistical Psychology 37:62-83.
Browne, Michael W. and Gerhard Arminger. 1995. ‘‘Specification and Estimation of Mean and
Covariance Structure Models.’’ Pp. 185-249 in Handbook of Statistical Modeling for the
Social and Behavioral Sciences, edited by G. Arminger, C. C. Clogg, and M. E. Sobel.
New York: Plenum.
Browne, Michael W. and Alexander Shapiro. 1988. ‘‘Robustness of Normal Theory Methods
in the Analysis of Linear Latent Variate Models.’’ British Journal of Mathematical and
Statistical Psychology 41:193-208.
Casella, George and Roger L. Berger. 2002. Statistical Inference. Pacific Grove, CA: Duxbury.
Dijkstra, Theo K. 1981. ‘‘Latent Variables in Linear Stochastic Models: Reflections on
‘Maximum Likelihood’ and ‘Partial Least Squares’ Methods.’’ Ph.D. dissertation, University
of Groningen.
Ferguson, Thomas S. 1996. A Course in Large Sample Theory. London: Chapman & Hall.
Hays, William L. 1994. Statistics. 5th ed. Fort Worth, TX: Harcourt Brace.
Hu, Li-tze, Peter M. Bentler, and Yutaka Kano. 1992. ‘‘Can Test Statistics in Covariance
Structure Analysis Be Trusted?’’ Psychological Bulletin 112:351-62.
Kano, Yutaka, Maria Berkane, and Peter M. Bentler. 1990. ‘‘Covariance Structure Analysis
With Heterogeneous Kurtosis Parameters.’’ Biometrika 77:575-85.
———. 1993. ‘‘Statistical Inference Based on Pseudo-Maximum Likelihood Estimators in
Elliptical Populations.’’ Journal of the American Statistical Association 88:135-43.
MacCallum, Robert C. and James T. Austin. 2000. ‘‘Applications of Structural Equation
Modeling in Psychological Research.’’ Annual Review of Psychology 51:201-26.
Magnus, Jan R. and Heinz Neudecker. 1999. Matrix Differential Calculus With Applications
in Statistics and Econometrics. Rev. ed. New York: John Wiley.
Mardia, Kanti V. 1970. ‘‘Measures of Multivariate Skewness and Kurtosis With
Applications.’’ Biometrika 57:519-30.
———. 1974. ‘‘Applications of Some Measures of Multivariate Skewness and Kurtosis in
Testing Normality and Robustness Studies.’’ Sankhya B 35:115-28.
Micceri, Theodore. 1989. ‘‘The Unicorn, the Normal Curve, and Other Improbable
Creatures.’’ Psychological Bulletin 105:156-66.
Mooijaart, Ab and Peter M. Bentler. 1991. ‘‘Robustness of Normal Theory Statistics in
Structural Equation Models.’’ Statistica Neerlandica 45:159-71.
Neumann, Craig S. 1994. ‘‘Structural Equation Modeling of Symptoms of Alcoholism and
Psychopathology.’’ Ph.D. dissertation, University of Kansas.
Poon, Wai-Yin and Sik-Yum Lee. 1994. ‘‘A Distribution Free Approach for Analysis of
Two-Level Structural Equation Model.’’ Computational Statistics and Data Analysis
17:265-75.
Satorra, Albert. 1989. ‘‘Alternative Test Criteria in Covariance Structure Analysis: A Unified
Approach.’’ Psychometrika 54:131-51.
Satorra, Albert and Peter M. Bentler. 1988. ‘‘Scaling Corrections for Chi-Square Statistics in
Covariance Structure Analysis.’’ Pp. 308-13 in American Statistical Association 1988
Proceedings of Business and Economics Sections. Alexandria, VA: American Statistical
Association.
———. 1990. ‘‘Model Conditions for Asymptotic Robustness in the Analysis of Linear
Relations.’’ Computational Statistics and Data Analysis 10:235-49.
———. 1994. ‘‘Corrections to Test Statistics and Standard Errors in Covariance Structure
Analysis.’’ Pp. 399-419 in Latent Variables Analysis: Applications for Developmental
Research, edited by A. von Eye and C. C. Clogg. Newbury Park, CA: Sage.
Satorra, Albert and William Saris. 1985. ‘‘Power of the Likelihood Ratio Test in Covariance
Structure Analysis.’’ Psychometrika 50:83-90.
Shapiro, Alexander. 1983. ‘‘Asymptotic Distribution Theory in the Analysis of Covariance
Structures (A Unified Approach).’’ South African Statistical Journal 17:33-81.
———. 1987. ‘‘Robustness Properties of the MDF Analysis of Moment Structures.’’ South
African Statistical Journal 21:39-62.
Shapiro, Alexander and Michael W. Browne. 1987. ‘‘Analysis of Covariance Structures
Under Elliptical Distributions.’’ Journal of the American Statistical Association 82:
1092-97.
Slutsky, Eugene. 1925. ‘‘Über Stochastische Asymptoten und Grenzwerte.’’ Metron 5:1-90.
Snedecor, George W. and William G. Cochran. 1989. Statistical Methods. 8th ed. Ames:
Iowa State University Press.
Steiger, James H., Alexander Shapiro, and Michael W. Browne. 1985. ‘‘On the Multivariate
Asymptotic Distribution of Sequential Chi-Square Statistics.’’ Psychometrika 50:253-64.
Tabachnick, Barbara G. and Linda S. Fidell. 2001. Using Multivariate Statistics. 4th ed.
New York: HarperCollins.
Yuan, Ke-Hai. 2005. ‘‘Fit Indices Versus Test Statistics.’’ Multivariate Behavioral Research
40:115-48.
Yuan, Ke-Hai and Peter M. Bentler. 1997a. ‘‘Improving Parameter Tests in Covariance
Structure Analysis.’’ Computational Statistics and Data Analysis 26:177-98.
———. 1997b. ‘‘Mean and Covariance Structure Analysis: Theoretical and Practical
Improvements.’’ Journal of the American Statistical Association 92:767-74.
———. 1998a. ‘‘Robust Mean and Covariance Structure Analysis.’’ British Journal of
Mathematical and Statistical Psychology 51:63-88.
———. 1998b. ‘‘Structural Equation Modeling With Robust Covariances.’’ Sociological
Methodology 28:363-96.
———. 1998c. ‘‘Normal Theory Based Test Statistics in Structural Equation Modelling.’’
British Journal of Mathematical and Statistical Psychology 51:289-309.
———. 1999a. ‘‘On Asymptotic Distributions of Normal Theory MLE in Covariance
Structure Analysis Under Some Nonnormal Distributions.’’ Statistics and Probability Letters
42:107-13.
———. 1999b. ‘‘On Normal Theory and Associated Test Statistics in Covariance Structure
Analysis Under Two Classes of Nonnormal Distributions.’’ Statistica Sinica 9:831-53.
———. 1999c. ‘‘F-tests for Mean and Covariance Structure Analysis.’’ Journal of
Educational and Behavioral Statistics 24:225-43.
———. 2000a. ‘‘Inferences on Correlation Coefficients in Some Classes of Nonnormal
Distributions.’’ Journal of Multivariate Analysis 72:230-48.
———. 2000b. ‘‘Three Likelihood-Based Methods for Mean and Covariance Structure
Analysis With Nonnormal Missing Data.’’ Sociological Methodology 30:167-202.
———. 2000c. ‘‘Robust Mean and Covariance Structure Analysis Through Iteratively
Reweighted Least Squares.’’ Psychometrika 65:43-58.
———. 2001. ‘‘Effect of Outliers on Estimators and Tests in Covariance Structure Analysis.’’
British Journal of Mathematical and Statistical Psychology 54:161-75.
———. 2002a. ‘‘On Robustness of the Normal-Theory Based Asymptotic Distributions of
Three Reliability Coefficient Estimates.’’ Psychometrika 67:251-59.
———. 2002b. ‘‘On Normal Theory Based Inference for Multilevel Models With
Distributional Violations.’’ Psychometrika 67:539-61.
———. 2003. ‘‘Eight Test Statistics for Multilevel Structural Equation Models.’’
Computational Statistics and Data Analysis 44:89-107.
———. Forthcoming. ‘‘Mean Comparison: Manifest Variable Versus Latent Variable.’’
Psychometrika.
Yuan, Ke-Hai, Peter M. Bentler, and Wai Chan. 2004. ‘‘Structural Equation Modeling With
Heavy Tailed Distributions.’’ Psychometrika 69:421-36.
Yuan, Ke-Hai, Wai Chan, and Peter M. Bentler. 2000. ‘‘Robust Transformation With
Applications to Structural Equation Modeling.’’ British Journal of Mathematical and Statistical
Psychology 53:31-50.
Yuan, Ke-Hai and Kentaro Hayashi. 2003. ‘‘Bootstrap Approach to Inference and Power
Analysis Based on Three Statistics for Covariance Structure Models.’’ British Journal of
Mathematical and Statistical Psychology 56:93-110.
Yuan, Ke-Hai, Kentaro Hayashi, and Peter M. Bentler. N.d. ‘‘Normal Theory Likelihood Ratio
Statistic for Mean and Covariance Structure Analysis Under Alternative Hypotheses.’’
Yuan, Ke-Hai and Linda L. Marshall. 2004. ‘‘A New Measure of Misfit for Covariance
Structure Models.’’ Behaviormetrika 31:1-24.
Yuan, Ke-Hai, Linda L. Marshall, and Rebecca Weston. 2002. ‘‘Cross-Validation Through
Downweighting Influential Cases in Structural Equation Modeling.’’ British Journal of
Mathematical and Statistical Psychology 55:125-43.
Ke-Hai Yuan is an associate professor in quantitative psychology at the University of
Notre Dame. His research interests are in the areas of psychometric theory and applied
multivariate statistics. He received the Cattell award for early career outstanding
multivariate research from the Society of Multivariate Experimental Psychology in 2002.
Peter M. Bentler is a Distinguished Professor of Psychology and Statistics at UCLA
who has been an elected president of the Psychometric Society, the Society of Multivariate
Experimental Psychology, and the Division of Evaluation, Measurement, and Statistics
of the American Psychological Association. He directs the Center for Collaborative
Research on Drug Abuse.
Wei Zhang is a graduate student in quantitative psychology at the University of Notre
Dame. His primary research interest is structural equation modeling.