Cross-Cultural Examination of Measurement Invariance of the Beck Depression Inventory–II

Psychological Assessment

Cross-Cultural Examination of Measurement Invariance of the Beck Depression Inventory–II
Jessica Dere, Carolyn A. Watters, Stephanie Chee-Min Yu, R. Michael Bagby, Andrew G. Ryder, and Kate L. Harkness
Online First Publication, October 13, 2014. http://dx.doi.org/10.1037/pas0000026

CITATION
Dere, J., Watters, C. A., Yu, S. C.-M., Bagby, R. M., Ryder, A. G., & Harkness, K. L. (2014, October 13). Cross-Cultural Examination of Measurement Invariance of the Beck Depression Inventory–II. Psychological Assessment. Advance online publication. http://dx.doi.org/10.1037/pas0000026



Jessica Dere
University of Toronto Scarborough

Carolyn A. Watters
University of Toronto

Stephanie Chee-Min Yu
Queen’s University

R. Michael Bagby
University of Toronto

Andrew G. Ryder
Concordia University and Jewish General Hospital, Montreal, Quebec, Canada

Kate L. Harkness
Queen’s University

Given substantial rates of major depressive disorder among college and university students, as well as the growing cultural diversity on many campuses, establishing the cross-cultural validity of relevant assessment tools is important. In the current investigation, we examined the Beck Depression Inventory—Second Edition (BDI–II; Beck, Steer, & Brown, 1996) among Chinese-heritage (n = 933) and European-heritage (n = 933) undergraduates in North America. The investigation integrated 3 distinct lines of inquiry: (a) the literature on cultural variation in depressive symptom reporting between people of Chinese and Western heritage; (b) recent developments regarding the factor structure of the BDI–II; and (c) the application of advanced statistical techniques to the issue of cross-cultural measurement invariance. A bifactor model was found to represent the optimal factor structure of the BDI–II. Multigroup confirmatory factor analysis showed that the BDI–II had strong measurement invariance across both culture and gender. In group comparisons with latent and observed variables, Chinese-heritage students scored higher than European-heritage students on cognitive symptoms of depression. This finding deviates from the commonly held view that those of Chinese heritage somatize depression. These findings hold implications for the study and use of the BDI–II, highlight the value of advanced statistical techniques such as multigroup confirmatory factor analysis, and offer methodological lessons for cross-cultural psychopathology research more broadly.

Keywords: major depressive disorder, Beck Depression Inventory—Second Edition, factor analysis, measurement invariance, cross-cultural

Supplemental materials: http://dx.doi.org/10.1037/pas0000026.supp

Major depressive disorder (MDD) is among the most prevalent psychiatric disorders in North America and around the world (Kessler et al., 2003; Patten et al., 2006), and it represents one of the greatest sources of disease burden internationally (Cuijpers, Beekman, & Reynolds, 2012; Ustün, Ayuso-Mateos, Chatterji, Mathers, & Murray, 2004).1 MDD is projected to

Jessica Dere, Department of Psychology, University of Toronto Scarborough; Carolyn A. Watters, Department of Psychology, University of Toronto; Stephanie Chee-Min Yu, Department of Psychology, Queen’s University; R. Michael Bagby, Departments of Psychiatry and Psychology, University of Toronto; Andrew G. Ryder, Department of Psychology and Centre for Clinical Research in Health, Concordia University, and Culture and Mental Health Research Unit and Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada; Kate L. Harkness, Department of Psychology, Queen’s University.

Stephanie Chee-Min Yu is now at the Department of Social Work and Social Administration, The University of Hong Kong.

The first and second authors contributed equally to this work. Preparation of this article was supported by a fellowship granted to Jessica Dere in the Social Aetiology of Mental Illness training program, a Strategic Training Initiative in Health Research funded by the Canadian Institutes of Health Research (CIHR), based at the Centre for Addiction and Mental Health and the University of Toronto. Preparation of the article was also supported by the Joseph-Armand Bombardier Doctoral Scholarship, awarded to Carolyn A. Watters by the Social Sciences and Humanities Research Council of Canada, and by a New Investigator Award from the CIHR, awarded to Andrew G. Ryder.

Correspondence concerning this article should be addressed to Jessica Dere, Department of Psychology, University of Toronto Scarborough, 1265 Military Trail, Toronto, Ontario, Canada M1C 1A4, or to Kate L. Harkness, Department of Psychology, Queen’s University, 222 Craine, Kingston, Ontario, Canada K7L 3N6. E-mail: [email protected] [email protected]

1 Disease burden broadly refers to a disease’s relative impact on morbidity and mortality and is commonly measured in disability-adjusted life years (Ustün et al., 2004).

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Psychological Assessment © 2014 American Psychological Association 2014, Vol. 26, No. 4, 000 1040-3590/14/$12.00 http://dx.doi.org/10.1037/pas0000026


rank first among all sources of disease burden in high-income countries by the year 2030 (Mathers & Loncar, 2006). It typically has its initial onset in late adolescence or young adulthood (Eisenberg, Gollust, Golberstein, & Hefner, 2007). Rates of MDD in this age group, particularly among college and university students, appear to be even higher than in the general population and may be increasing (Bayram & Bilgel, 2008; Ibrahim, Kelly, Adams, & Glazebrook, 2013; Young, Fang, & Zisook, 2010). The onset of MDD at this life stage is associated with significant negative consequences across numerous life domains that persist throughout adulthood. Crucial to early identification and intervention efforts in MDD is the existence of well-validated, psychometrically sound assessment tools.

The increasing cultural diversity of many undergraduate student bodies (e.g., Horn, Peter, Rooney, & Malizio, 2002) means that assessment tools for MDD must also be cross-culturally valid. This is particularly salient given the significant cultural variation in the presentation of depressive symptoms (e.g., Kirmayer, 2001; Ryder et al., 2008). In the current study we examine one of the most commonly used self-report measures of depressive symptoms, the Beck Depression Inventory—Second Edition (BDI–II; Beck, Steer, & Brown, 1996), among Chinese-heritage and European-heritage undergraduate students in North America.2 With the aim of advancing the understanding of the cultural shaping of depressive symptom reporting, we integrate three distinct lines of inquiry: (a) the literature on cultural variation in depressive symptom reporting between people of Chinese and “Western” heritage; (b) recent developments regarding the factor structure of the BDI–II; and (c) the application of advanced statistical techniques to the issue of cross-cultural measurement equivalence. Supplementing this cross-cultural focus, our analyses also examine potential gender effects within each cultural group. To the best of our knowledge, this study is the first to examine measurement invariance of the BDI–II on the basis of both cultural and gender-within-culture group comparisons.

Cultural Variations in the Reporting of Depression Symptoms

The empirical literature on cultural variations in depression is dominated by comparisons of Chinese and Western samples. It is rooted in a long-standing theoretical and empirical interest in the phenomenon of Chinese somatization, a popular topic in the interdisciplinary field of culture and mental health (see Ryder & Chentsova-Dutton, 2012). The idea that people of Chinese heritage tend to emphasize somatic (e.g., fatigue, sleep difficulties) rather than psychological (e.g., hopelessness, guilt) symptoms of depression has been discussed for several decades (e.g., Kleinman, 1982; Parker, Cheah, & Roy, 2001; Ryder et al., 2008); this pattern has been examined among Chinese-heritage samples in both East Asian and Western contexts (e.g., Parker, Chan, Tully, & Eisenbruch, 2005; Yen, Robins, & Lin, 2000). A relative emphasis on somatic symptoms is contrasted, whether explicitly or implicitly, with a Western emphasis on psychological symptoms of depression, which has been termed psychologization (e.g., Ryder & Chentsova-Dutton, 2012). Although Chinese somatization has been discussed since the 1970s, direct cross-cultural comparisons between Chinese and Western samples in the presentation of symptoms have only recently been conducted.

A review of the existing empirical literature reveals a contrast between clinical and nonclinical samples regarding the extent of support for the idea of Chinese somatization. The only two cross-national studies with clinical samples provide support for Chinese somatization and for Western psychologization (Parker et al., 2001; Ryder et al., 2008). Parker et al. (2001) found that a greater percentage of Malaysian Chinese outpatients than Euro Australian outpatients reported a somatic symptom as their primary presenting complaint. Furthermore, the Chinese group endorsed more somatic symptoms on a symptom questionnaire, whereas the Euro Australians endorsed more psychological symptoms. Ryder et al. (2008) compared depressive symptom reporting between Chinese and Euro Canadian depressed psychiatric outpatients and found greater somatic symptom reporting among the Chinese and greater psychological symptom reporting among the Euro Canadians. In both of these investigations, the cultural difference in psychological symptom reporting was a stronger effect than the difference in somatic symptom reporting.

The pattern of cultural variation in depressive symptom reporting is not as consistent in studies using college student and community samples. Broad claims to the effect that Chinese groups always somatize distress and depression are likely overstated (see also Dere et al., 2013; Zhou et al., 2011). Using a Chinese translation of the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977), Yen et al. (2000) found that Chinese students seeking mental health services endorsed a higher proportion of somatic symptoms of depression than did a Chinese student control group. In a separate study comparing Chinese, Chinese American, and Euro American students, however, the Chinese group reported a lower proportion of somatic symptoms on the CES-D than did the other groups; participants in this study were not recruited or differentiated on the basis of help-seeking status. In their interpretation of the findings across their two studies, Yen et al. (2000) suggested that a Chinese emphasis on somatic symptoms might occur only in the context of help seeking.

In a comparison of adolescents in Hong Kong and the United States, Stewart et al. (2002) found that, among adolescents diagnosed with MDD, those in Hong Kong endorsed fatigue/loss of energy more frequently and irritability less frequently than did those in the United States. In contrast, among adolescents with diagnostically subthreshold depression, those in Hong Kong endorsed a variety of both somatic (i.e., weight/appetite changes, fatigue/loss of energy, psychomotor changes) and psychological (i.e., worthlessness/guilt, concentration difficulties/indecisiveness, suicidal ideation) symptoms more often than did those in the United States. These latter results in particular are contrary to theoretical expectations about the absence of psychological symptoms among Chinese samples.

Although this study was not a cross-cultural comparison, Chang (2007) examined symptom reporting on a Chinese version of the BDI–II (C-BDI–II) among “probably depressed” and “nondepressed” Chinese college students in Taiwan, with the two groups separated by a C-BDI–II cutoff score of 16. Groups were compared on the

2 The terms Chinese-heritage and European-heritage are used here to refer to the ethno-cultural background of our participants in broad terms, in recognition of the heterogeneous makeup of the two cultural groups within our sample.


basis of their scores on somatic and cognitive-affective factors on the C-BDI–II, as derived through exploratory factor analysis. On the whole, the Chinese students endorsed a higher proportion of somatic symptoms (e.g., fatigue, changes in appetite) than cognitive-affective symptoms (e.g., worthlessness, pessimism). Relative emphasis on somatic versus cognitive-affective symptoms varied, however, by level of severity. The probably depressed group placed less emphasis on somatic symptoms and more emphasis on cognitive-affective symptoms than did the nondepressed group. Similar to the findings of Stewart et al. (2002), these results suggest that patterns of symptom reporting vary with level of severity and demonstrate that a simple prediction of a somatic symptom emphasis among Chinese-heritage college students may not be warranted.

In sum, the studies above suggest that claims regarding Chinese somatization and Western psychologization must be qualified, as variation in symptom reporting may depend on symptom severity and treatment-seeking status. What is absent from this literature is any discussion of the extent to which these findings have been influenced by potential variation in how respondents from different cultural groups interpret the items on symptom measures. This issue is important, given that proposed explanations for cultural group differences in depressive symptom reporting include cultural variations in response style or response biases (e.g., Lam, Pepper, & Ryabchenko, 2004) and that there is a substantial literature on cross-cultural measurement and methodology (e.g., Chen, 2008; Little, 1997; Milfont & Fischer, 2010; van de Vijver & Leung, 2000). Central to interpreting the above findings, therefore, is establishing cross-cultural measurement equivalence or invariance across Chinese and Western groups.3 To our knowledge, the current study is the first to undertake this task in college-age samples.

Examining the measurement equivalence of an assessment tool first requires that a baseline factor structure be established: equivalence or invariance analyses proceed by first determining the best-fitting factor structure across the groups of interest. For the BDI–II, the factor structure is an area of considerable debate. Therefore, before moving to a discussion of measurement equivalence, we summarize recent developments regarding the factor structure of the BDI–II.

Factor Structure of the BDI–II

The BDI–II (Beck, Steer, & Brown, 1996) is a 21-item self-report measure of depressive symptoms, designed to correspond with the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; American Psychiatric Association, 1994) diagnostic criteria for MDD. Respondents rate the severity of symptoms based on their experiences over the previous 2 weeks, on a scale ranging from 0 to 3. The response options vary across items, but the scale always progresses from low to high severity. Individual item scores are summed to provide a total score of depression severity (possible range 0 to 63). This measure generally shows strong reliability and validity across a variety of study populations (e.g., Beck, Steer, Ball, & Ranieri, 1996; Storch, Roberti, & Roth, 2004; Wiebe & Penley, 2005).
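The scoring rule described above (21 items, each rated 0 to 3, summed into a total) can be sketched as follows. This is a minimal illustration, not the publisher's scoring procedure: it assumes every response has already been coded as an integer from 0 to 3, and the function name `bdi_ii_total` is hypothetical.

```python
from typing import Sequence

N_ITEMS = 21                    # the BDI-II has 21 items
VALID_RATINGS = {0, 1, 2, 3}    # each item is rated from 0 (low) to 3 (high severity)


def bdi_ii_total(ratings: Sequence[int]) -> int:
    """Return the BDI-II total score (0-63) as the sum of 21 item ratings.

    Hypothetical helper: assumes responses are already coded 0-3.
    """
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for item, rating in enumerate(ratings, start=1):
        if rating not in VALID_RATINGS:
            raise ValueError(f"item {item}: rating must be 0-3, got {rating!r}")
    return sum(ratings)


if __name__ == "__main__":
    # Hypothetical respondent with mild symptoms on most items.
    example = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 2, 1, 1, 1, 1, 2, 0]
    print(bdi_ii_total(example))  # prints 16
```

Validating the item count and rating range before summing means a partially completed or miscoded form raises an error rather than silently producing a low total.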

In their original presentation of the BDI–II, Beck, Steer, and Brown (1996) suggested that a two-factor structure provided the best fit among both clinical outpatients and college students. However, the two factors differed between the groups: somatic-affective and cognitive factors were found for the clinical sample, whereas cognitive-affective and somatic factors were found for the students (see Dozois, Dobson, & Ahnberg, 1998). A large number of alternative factor structures (models) have since been proposed and have received stronger support across a variety of samples (e.g., Brouwer, Meijer, & Zevalkink, 2013; Quilty, Zhang, & Bagby, 2010; Vanheule, Desmet, Groenvynck, Rosseel, & Fontaine, 2008). Several of these structures, focusing on models that have previously been tested among undergraduate students and in Chinese-heritage samples, are displayed in Table 1. Detailed summaries of the studies referenced in Table 1 can be found elsewhere (Brouwer et al., 2013; Quilty et al., 2010; Vanheule et al., 2008); however, several important observations regarding the different models are worth highlighting.

All of the models except one (Model 1) include some combination of cognitive, somatic, and affective symptom factors. However, there is considerable inconsistency in the item composition of factors across the proposed structures, and many items are assigned to different symptom domains in different models. The presence of many different factor structures hinders the interpretation and comparability of results across studies. Adding to this issue, Model 1 is made up of three factors (negative attitude, performance difficulty, and somatic elements) that do not easily map onto the cognitive, somatic, or affective factors included in the majority of other models. Model 1 is also the model most commonly used in studies examining the BDI–II among Chinese and East Asian groups (e.g., Byrne, Stewart, Kennard, & Lee, 2007; Whisman, Juss, Whiteford, & Gelhorn, 2013; Wu, 2010; Wu & Huang, 2014), making it particularly difficult to interpret the results of these studies within the broader BDI–II literature.

Notwithstanding the inconsistencies across these studies, an emerging consensus supports a bifactor structure for the BDI–II (Al-Turkait & Ohaeri, 2010; Brouwer et al., 2013; Osman, Barrios, Gutierrez, Williams, & Bailey, 2008; Quilty et al., 2010; Ward, 2006). In a bifactor model, every item is specified to load onto a general factor made up of all scale items, and items may have an additional loading on a subfactor composed of a subset of items representing a narrower construct. The subfactor loadings therefore capture only variance that is not already accounted for by the general factor. Second-order models are similar to bifactor models in that both represent a hierarchical structure; however, only bifactor models can separate the unique variance of the subfactors from the variance of the general construct. A bifactor approach therefore has the advantage of simultaneously capturing a general construct shared by a set of subfactors, as well as the unique effect of each subfactor over and above the general construct (Chen, Hayes, Carver, Laurenceau, & Zhang, 2012). Support for a bifactor model of depression follows a broader trend in psychopathology research, which suggests that bifactor models offer a valuable approach for analyzing clinical questionnaires that reflect a single overarching construct while also being multidimensional (see Brouwer et al., 2013; Reise,

3 The terms measurement equivalence and measurement invariance are generally seen as synonymous. Equivalence is more often used in conceptual discussions of this topic, including in the cultural literature, whereas invariance is favored in statistically focused discussions (Little, 1997). Following these norms, both terms are used in the current paper as appropriate.


Bonifay, & Haviland, 2013; Watters, Keefer, Kloosterman, Summerfeldt, & Parker, 2013).
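To make the variance-partitioning argument concrete, here is a minimal sketch assuming an orthogonal bifactor model with standardized loadings (general factor and subfactors uncorrelated, as is conventional for bifactor models). Under that assumption, each item's variance splits into a general-factor share, a subfactor share, and a unique share. The loadings are hypothetical, and explained common variance (ECV) is a commonly used bifactor summary index that is not reported in this section.

```python
from typing import Sequence, Tuple


def item_variance_shares(general: float, specific: float) -> Tuple[float, float, float]:
    """Split a standardized item's variance in an orthogonal bifactor model.

    Returns (general share, subfactor share, unique share). An item that
    loads only on the general factor is passed specific=0.0.
    """
    g_share = general ** 2
    s_share = specific ** 2
    unique = 1.0 - g_share - s_share
    if unique < -1e-12:
        raise ValueError("loadings imply more than 100% explained variance")
    return g_share, s_share, max(unique, 0.0)


def explained_common_variance(general: Sequence[float], specific: Sequence[float]) -> float:
    """ECV: proportion of common (non-unique) variance due to the general factor."""
    g = sum(lam ** 2 for lam in general)
    s = sum(lam ** 2 for lam in specific)
    return g / (g + s)


if __name__ == "__main__":
    # Hypothetical loadings: items 1-2 load on the general factor and a
    # subfactor; item 3 loads on the general factor only.
    general_loadings = [0.6, 0.7, 0.5]
    specific_loadings = [0.3, 0.4, 0.0]
    for g, s in zip(general_loadings, specific_loadings):
        print(item_variance_shares(g, s))
    print(round(explained_common_variance(general_loadings, specific_loadings), 3))
```

Because the factors are uncorrelated, the shares add without cross-terms. This additivity is what allows a bifactor model to attribute variance separately to the general construct and to each subfactor, which a second-order model cannot do.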

Ward (2006) was the first to investigate a bifactor model of the BDI–II and proposed a model in which only some of the items were assigned to either a cognitive or a somatic subfactor (Model 6 in Table 1). Of interest, the items that did not load onto either subfactor in Ward’s model show the greatest inconsistency in terms of factor assignment across previous studies, as shown in Table 1. Further, when comparing the quality of different models, subsequent studies have found not only that Ward’s (2006) model outperforms first- and second-order versions of the same model (Al-Turkait & Ohaeri, 2010; Brouwer et al., 2013; Quilty et al., 2010) but also that it is superior to bifactor versions of other BDI–II models (Al-Turkait & Ohaeri, 2010). All of the loadings in Ward’s model tend to be strong and significant, whereas this is not the case with other bifactor BDI–II models. In other words, Ward’s model appears to be the most parsimonious bifactor BDI–II structure. However, no bifactor model of the BDI–II (including Ward’s model) has been tested for measurement equivalence across either culture or gender.
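Ward's specification can be written down as a loading pattern: every item loads on the general factor, and only some items also load on a cognitive or a somatic subfactor. The sketch below assumes the subfactor assignments read from the Model 6 column of Table 1, with blank cells taken to mean "general factor only". The item sets and the function name are illustrative; estimating the model itself would be done in SEM software.

```python
from typing import Dict, Set

# Assumed subfactor assignments, read from the Model 6 (Ward, 2006) column
# of Table 1. Items in neither set load on the general factor only.
COGNITIVE_ITEMS = frozenset({2, 3, 5, 6, 7, 8, 9, 14})
SOMATIC_ITEMS = frozenset({15, 16, 18, 19, 20})
N_ITEMS = 21


def ward_loading_pattern() -> Dict[int, Set[str]]:
    """Map each BDI-II item number to the factors it is free to load on."""
    pattern: Dict[int, Set[str]] = {}
    for item in range(1, N_ITEMS + 1):
        factors = {"general"}  # every item loads on the general factor
        if item in COGNITIVE_ITEMS:
            factors.add("cognitive")
        elif item in SOMATIC_ITEMS:
            factors.add("somatic")
        pattern[item] = factors
    return pattern


if __name__ == "__main__":
    pattern = ward_loading_pattern()
    general_only = sorted(i for i, f in pattern.items() if f == {"general"})
    print("General factor only:", general_only)
```

Listing the general-only items makes the text's observation easy to check against Table 1: these are the items whose factor assignment varies most across Models 1 to 5.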

Measurement Equivalence

When making comparisons between groups distinguished by such characteristics as cultural background or gender, researchers routinely assume that the instrument(s) or test(s) they are using have “measurement equivalence,” such that they measure the same construct in the same way across groups (Chen, 2008; Little, 1997). An instrument or test can be said to show equivalence across groups when members of each group assign the same meanings to the instrument and its constituent items and when respondents who share the same level of the underlying construct obtain the same expected score regardless of group membership. Such equivalence permits more meaningful group comparisons and more readily interpretable results than are possible when measurement invariance is not examined (Meredith, 1993). Nevertheless, most cross-cultural studies do not examine measurement equivalence. Insufficiently powered sample sizes and a lack of appropriate statistical methods likely contribute to the failure to examine or establish measurement invariance.

Ongoing advances in structural equation modeling have led to robust statistical approaches for addressing measurement equivalence, often referred to as tests of measurement invariance (MI). Within the structural equation modeling framework, multigroup confirmatory factor analysis (MG-CFA) is generally accepted as the most common approach (Chen, 2008; Milfont & Fischer, 2010). MG-CFA can also be used to conduct latent group comparisons, testing for group differences in latent means, covariances, and variances. Such comparisons, known as tests of structural invariance, provide the advantage of controlling for measurement error (Chen, 2008; Little, 1997; Meredith, 1993).

Table 1
Overview of Previously Published BDI–II Factor Structures Derived and/or Tested in College or University Student Samples, Chinese-Heritage Samples, and Studies Testing Complex BDI–II Structures (i.e., Second-Order and Bifactor Models)

                              First-order oblique           Bifactor
Item                          1a,b  2a,b  3     4b    5b    6
1. Sadness                    NA    C     C     C     A
2. Pessimism                  NA    C     C     C     A     C
3. Past failure               NA    C     C     C     C     C
4. Loss of pleasure           PD    C     C     S     A
5. Guilty feelings            NA    C     C     C     C     C
6. Punishment feelings        NA    C     C     C     C     C
7. Self-dislike               NA    C     C     C     C     C
8. Self-criticism             NA    C     C     C     C     C
9. Suicidal thoughts          NA    C     C     C     A     C
10. Crying                    NA    C     C     S     S
11. Agitation                 PD    C     C     S     S
12. Loss of interest          PD    C     C     S     A
13. Indecisiveness            PD    C     S     C     C
14. Worthlessness             NA    C     C     C     C     C
15. Loss of energy            S     S     S     S     S     S
16. Changes in sleep          S     S     S     S     S     S
17. Irritability              PD    C     C     S     S
18. Changes in appetite       S     S     S     S     S     S
19. Concentration difficulty  PD    S     S     S     S     S
20. Tiredness                 S     S     S     S     S     S
21. Loss of interest in sex   S     C     C     S     S

Note. BDI–II = Beck Depression Inventory—II; A = affective; C = cognitive (or cognitive-affective: Models 2, 3, 4, 6); S = somatic (or somatic-elements: Model 1; somatic-affective: Models 2, 3; somatic-vegetative: Model 4); NA = negative affectivity; PD = performance difficulty. Model 1 = Al-Turkait & Ohaeri (2010)a,b; Byrne et al. (2007; with Item 21 deleted)a; Whisman et al. (2013)a; Wu (2010); Wu & Huang (2014). Model 2 = Al-Turkait & Ohaeri (2010)a,b; Osman et al. (2008)b; Storch et al. (2004); Whisman et al. (2000), modified from Beck et al. (1996; clinical sample). Model 3 = Beck et al. (1996; student sample). Model 4 = Brouwer et al. (2013)b; Dozois et al. (1998). Model 5 = Beck et al. (2002), as cited in Brouwer et al. (2013)b. Model 6 = Brouwer et al. (2013); Quilty et al. (2010); Ward (2006).
a Second-order model. b Bifactor model.

Four levels of measurement equivalence are generally discussed, with each level defined by a more restrictive set of requirements (e.g., Milfont & Fischer, 2010). The first is functional equivalence, which refers to whether or not the construct of interest exists in each group under study; this level of equivalence cannot in fact be tested statistically and is therefore not a focus of the current study.4 The next level is termed configural or structural equivalence; it is met when a construct is made up of the same number of factors, with the same items associated with each factor, in each group. If this level is not met, the assessment tool is not measuring the same construct across groups. When the factor loadings of all items are also equal across the groups, the next level of equivalence, metric or factor loading equivalence, is met. Equivalence at this level is required for meaningful comparison of predictive relationships across groups. Finally, scalar or intercept equivalence is met when individual items also show the same point of origin (i.e., intercept) across the groups; this level of equivalence is necessary for comparing group means (Chen, 2008; Little, 1997).
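The testable levels above can be summarized with the linear factor model that underlies MG-CFA. The notation below is a conventional single-factor sketch, not taken from the article: x_j^(g) is the response to item j in group g, τ an intercept, λ a loading, ξ the latent factor with mean κ^(g), and δ a residual assumed to have mean zero. In a bifactor model, each item would simply carry one additional loading-times-factor term for its subfactor.

```latex
\begin{aligned}
\text{Measurement model:}\quad & x_{j}^{(g)} = \tau_{j}^{(g)} + \lambda_{j}^{(g)}\,\xi^{(g)} + \delta_{j}^{(g)} \\
\text{Implied item mean:}\quad & \mathrm{E}\big[x_{j}^{(g)}\big] = \tau_{j}^{(g)} + \lambda_{j}^{(g)}\,\kappa^{(g)} \\
\text{Configural:}\quad & \text{same pattern of zero and nonzero } \lambda_{j}^{(g)} \text{ in every group} \\
\text{Metric:}\quad & \lambda_{j}^{(1)} = \lambda_{j}^{(2)} \quad \text{for all items } j \\
\text{Scalar:}\quad & \lambda_{j}^{(1)} = \lambda_{j}^{(2)} \ \text{and}\ \tau_{j}^{(1)} = \tau_{j}^{(2)} \quad \text{for all items } j
\end{aligned}
```

The implied item mean combines τ and κ, so an observed mean difference can be attributed to the latent mean κ only when the intercepts τ are equal across groups. This is why scalar equivalence is required before group means are compared.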

Chen (2008) demonstrated the potential consequences of assuming rather than testing for MI in cross-cultural research. In a series of simulation studies, Chen (2008) found that a lack of metric invariance could lead to artificial interaction effects between two constructs. Furthermore, a lack of metric and scalar invariance can lead to spurious group differences or to a failure to uncover actual group differences. These findings suggest that failing to test for MI likely contributes to the inconsistent pattern of results seen in various areas of the cross-cultural literature, including studies on cultural variation in depressive symptom reporting. Consideration of measurement issues is necessary to advance the field, because it helps to separate actual cultural variations from those driven by measurement artifacts. Testing for MI is also central to developing psychometrically sound assessment instruments that are appropriate for identifying depression among culturally diverse populations, such as those found on many North American college and university campuses.
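The two failure modes described above (spurious differences and masked differences) follow directly from the implied item mean τ + λκ. The sketch below is an illustration of that mechanism using hypothetical parameter values (chosen to be exactly representable in floating point); it is not a reproduction of Chen's (2008) simulations.

```python
def expected_item_mean(intercept: float, loading: float, latent_mean: float) -> float:
    """Model-implied item mean tau + lambda * kappa (residual mean assumed 0)."""
    return intercept + loading * latent_mean


LOADING = 0.75  # equal loadings in both groups, so metric invariance holds

# Scenario A: identical latent means (0.5), but the item intercept is 0.25
# higher in group 2, i.e., scalar invariance fails for this item.
a1 = expected_item_mean(intercept=1.0, loading=LOADING, latent_mean=0.5)
a2 = expected_item_mean(intercept=1.25, loading=LOADING, latent_mean=0.5)
spurious_gap = a2 - a1  # 0.25 observed difference with no latent difference

# Scenario B: group 2's latent mean is truly 0.5 higher, but a lower
# intercept offsets it exactly, so the observed item means are identical.
b1 = expected_item_mean(intercept=1.0, loading=LOADING, latent_mean=0.5)
b2 = expected_item_mean(intercept=0.625, loading=LOADING, latent_mean=1.0)
masked_gap = b2 - b1  # 0.0 observed difference despite a real latent difference

if __name__ == "__main__":
    print(f"Scenario A: observed gap = {spurious_gap}, latent gap = 0.0")
    print(f"Scenario B: observed gap = {masked_gap}, latent gap = 0.5")
```

In both scenarios, a comparison of raw item means would lead to the wrong conclusion about the latent construct. Testing scalar invariance is designed to detect this kind of problem before means are compared.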

Measurement Invariance Analyses of the BDI–II

To date, four studies have examined MI with the BDI–II. Wu and colleagues investigated MI across gender among Taiwanese adolescents and college students, using the Chinese version of the BDI–II (Wu, 2010; Wu & Huang, 2014). Byrne et al. (2007) conducted MI testing of the BDI–II among adolescents in Hong Kong and the United States, using the Chinese version of the BDI–II in the Hong Kong sample. Whisman et al. (2013) conducted the only other MI study of the BDI–II with a cross-cultural focus; they examined the BDI–II across gender and across ethno-cultural groups (White, Black, Asian, and Latino) among U.S. college student samples but did not examine gender within each ethno-cultural group.

All of these studies examined either a first-order or a second-order version of the three-factor structure that consists of negative attitude, performance difficulty, and somatic elements. A bifactor model was not considered in any of these studies. Nevertheless, all four studies showed strong invariance (i.e., scalar or intercept invariance; Meredith, 1993) under the common criterion that invariance is rejected only when the comparative fit index (CFI) decreases by more than .01.5 Taken together, these findings suggest that an important next step is to integrate the recent focus on bifactor models, MI testing, and group comparisons in depressive symptom reporting into a single cross-cultural study using the BDI–II.
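The ΔCFI rule referred to above, and the sample-size sensitivity of the chi-square difference test that motivates it, can be sketched as follows. All fit statistics are hypothetical. The .01 cutoff follows the Cheung and Rensvold (2002) convention cited in this article. The plain maximum-likelihood statistic shown here omits the Satorra–Bentler scaling used in some of the cited studies, and the (N − 1) multiplier is one common convention that varies across software.

```python
def cfi(chi2_model: float, df_model: float, chi2_null: float, df_null: float) -> float:
    """Comparative fit index from the fitted model's and the null model's chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def invariance_supported(cfi_looser: float, cfi_tighter: float, max_drop: float = 0.01) -> bool:
    """Retain the added equality constraints unless CFI drops by more than max_drop."""
    return (cfi_looser - cfi_tighter) <= max_drop


def ml_chi_square(f_min: float, n: int) -> float:
    """ML test statistic under the common (N - 1) * F_min convention (no scaling)."""
    return (n - 1) * f_min


if __name__ == "__main__":
    # Hypothetical fit statistics for nested multigroup models.
    chi2_null, df_null = 5500.0, 500
    fit = {
        "configural": cfi(860.0, 360, chi2_null, df_null),  # 0.900
        "metric": cfi(925.0, 400, chi2_null, df_null),      # 0.895
        "scalar": cfi(1000.0, 420, chi2_null, df_null),     # 0.884
    }
    for looser, tighter in [("configural", "metric"), ("metric", "scalar")]:
        drop = fit[looser] - fit[tighter]
        print(f"{looser} -> {tighter}: dCFI = {drop:.3f}, "
              f"supported = {invariance_supported(fit[looser], fit[tighter])}")

    # The same small misfit (an F_min difference of 0.01) gives a chi-square
    # difference that grows linearly with N.
    for n in (500, 5000):
        print(f"N = {n}: chi-square difference = {ml_chi_square(0.01, n):.2f}")
```

In this hypothetical sequence, metric invariance is retained (CFI drops by .005) but scalar invariance is not (CFI drops by .011). The last loop shows why the chi-square difference is treated as descriptive in large samples: an identical discrepancy yields a statistic roughly ten times larger when N is ten times larger.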

The Current Investigation

In the current investigation, we examine depressive symptom reporting on the BDI–II between Chinese-heritage (CH) and European-heritage (EH) student groups. First, we test a number of potential baseline models, including bifactor models, and then examine configural, metric, and scalar levels of MI across both culture and gender within culture, prior to making group comparisons. This is the first study to investigate a bifactor model in a Chinese-heritage sample and also the first to test MI of a bifactor model of the BDI–II across both culture and gender. Our objectives were to (a) determine an optimal factor structure for the BDI–II among CH and EH students in Canada; (b) examine the MI of the BDI–II in these two groups, across culture and gender; and (c) contingent upon the MI results, examine cultural and gender variation in depressive symptom reporting in these two groups using the BDI–II.

Method

Participants and Procedure

Participants were drawn from an archival database of students who completed a large questionnaire battery while enrolled in an introductory psychology class at a Canadian university in the province of Ontario; the university has an undergraduate enrollment of approximately 16,000 students. The battery is administered annually, and the current data were collected between 2005 and 2012. The battery consisted of a prescreening inventory that included a variety of self-report psychological measures, among them the BDI–II and a demographics questionnaire containing basic questions regarding age, gender, and ethno-cultural background. All participants provided written informed consent and received course credit for their participation; ethical approval was received from the university institutional review board.

To be included in the current study sample, participants must have provided complete data concerning their gender and ethno-cultural identity, as well as complete data on the BDI–II. Only students who indicated their ethno-cultural group membership as

4 Although the current study did not examine the functional level of equivalence, previous research suggests that the depression symptom content of “Western” instruments (such as the BDI–II) is comprehensible, endorsed, and even spontaneously reported as presenting problems by at least some respondents in both Chinese and Euro Canadian cultural contexts (e.g., Ryder et al., 2008).

5 Wu and Huang (2014) and Wu (2010) reported that several BDI–II items showed a lack of invariance at the scalar level, based on a significant change in the Satorra–Bentler chi-square statistic (ΔS-B χ²). However, others have suggested that ΔS-B χ² should be used only for descriptive purposes in large samples (such as those in Wu, 2010, and Wu & Huang, 2014), due to the chi-square statistic’s sensitivity to sample size (Cheung & Rensvold, 2002). Using the criteria recommended by Cheung and Rensvold (2002), described in the Analysis section, these studies support strong invariance.

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

5 CROSS-CULTURAL EXAMINATION OF THE BDI–II

either “White” or “East Asian” were included in the sample. Students who self-identified as White were included in the European-heritage group. Among those students who self-identified as East Asian, only those of Chinese heritage were included in the final sample. This inclusion criterion was established in order to achieve a more culturally homogeneous sample; also, the students of Chinese heritage represented a substantial majority within the East Asian group. Inclusion in the Chinese-heritage group was based on participants’ responses to an open-ended question inquiring about the country of origin of their family. Students who self-identified as East Asian and who referenced China, Taiwan, and/or Hong Kong in their response to this question were included in the CH group.

As the number of EH students was substantially larger than the number of CH students, EH participants were randomly selected to match the CH sample on gender, stratified by year of data collection. After implementation of the inclusion criteria described above and the subsequent matching procedure, the final sample consisted of 933 CH students (mean age = 18.14 years, SD = 1.25, range = 16 to 36 years) and 933 EH students (mean age = 18.21 years, SD = 1.72, range = 16 to 43 years); in each group, 68% were women. The percentage of participants by year of data collection ranged from 8% to 16% of the total sample. Within the CH group, 53.6% were born in mainland China, Taiwan, or Hong Kong; 42.2% were born in Canada; 3.1% were born elsewhere; and 1.1% did not provide a country of birth. As the data were drawn from an annual prescreening battery that includes different questionnaires from year to year, no additional sociodemographic variables were available for the entire sample.
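The gender-matching step can be pictured as stratified random sampling without replacement. The sketch below is illustrative only: the record layout `(participant_id, group, gender, year)`, the function name `match_eh_to_ch`, and the fixed seed are hypothetical and are not taken from the study’s procedure or software.

```python
import random
from collections import Counter, defaultdict


def match_eh_to_ch(records, seed=0):
    """Randomly select EH records so that every (year, gender) stratum
    contains as many EH participants as CH participants.

    records: iterable of hypothetical (participant_id, group, gender, year)
    tuples, where group is "CH" or "EH".
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Number of CH participants in each (year, gender) stratum.
    ch_counts = Counter((year, gender) for _, group, gender, year in records
                        if group == "CH")

    # Pool of EH candidates in each stratum.
    eh_pool = defaultdict(list)
    for rec in records:
        _, group, gender, year = rec
        if group == "EH":
            eh_pool[(year, gender)].append(rec)

    matched = []
    for stratum, n_needed in ch_counts.items():
        pool = eh_pool[stratum]
        if len(pool) < n_needed:
            raise ValueError(f"too few EH participants in stratum {stratum}")
        # Sample without replacement within the stratum.
        matched.extend(rng.sample(pool, n_needed))
    return matched


# Toy data: one year, 3 CH women and 2 CH men, with a larger EH pool.
records = (
    [(i, "CH", "F", 2005) for i in range(3)]
    + [(10 + i, "CH", "M", 2005) for i in range(2)]
    + [(20 + i, "EH", "F", 2005) for i in range(6)]
    + [(40 + i, "EH", "M", 2005) for i in range(5)]
)
matched = match_eh_to_ch(records, seed=1)
print(Counter((year, gender) for _, _, gender, year in matched))
```

Stratifying by year as well as gender keeps the matched EH sample from over-representing any single data-collection wave.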

Analysis

Sequential sets of analyses were conducted for each of three pairs of between-group comparisons: CH versus EH, CH men versus CH women, and EH men versus EH women. In the first set of analyses, competing factor structures of the BDI–II (see Table 1) were tested to establish the optimal BDI–II measurement model across groups. Given mounting evidence that a bifactor model best represents the BDI–II structure (e.g., Al-Turkait & Ohaeri, 2010; Brouwer et al., 2013; Osman et al., 2008; Quilty et al., 2010), first-order oblique (F-O) and second-order (S-O) hierarchical models were also tested as bifactor models, similar to the method of Al-Turkait and Ohaeri (2010). F-O and S-O models with the same item composition of lower order (i.e., F-O) factors are nested within the related bifactor structure (see Chen et al., 2012). Two models are nested if one can be obtained from the other by constraining some of its free parameters (e.g., fixing them to zero), so that the model structures are similar and differ only in the number of free parameters (see Kline, 2011). Thus, a nested comparison test can be used to determine whether there is a significant difference in model fit. As an S-O model with only two F-O factors would be underidentified (Kline, 2011), we did not test this model. To identify each model, we set all factor variances to one in order to establish consistency across the CFAs being compared within the same samples (Kline, 2011).
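As a rough check on these model descriptions, the degrees of freedom of a single-group CFA equal the number of unique variances and covariances, p(p + 1)/2 for p items, minus the number of free parameters. The helper below is a hypothetical sketch that assumes no mean structure, factor variances fixed to one, one residual variance per item, freely correlated first-order factors, and mutually orthogonal bifactor factors; under those assumptions the counts agree with the df column of Table 3 for the 21-item models.

```python
def cfa_df(n_items, n_loadings, n_factor_covariances=0):
    """Model degrees of freedom for a single-group CFA on a covariance matrix.

    Assumes no mean structure, all factor variances fixed to one, and one
    residual variance per item, so the free parameters are the loadings,
    the residual variances, and any freely estimated factor covariances.
    """
    n_moments = n_items * (n_items + 1) // 2  # unique variances and covariances
    n_free = n_loadings + n_items + n_factor_covariances
    return n_moments - n_free


# 21 BDI-II items; in the F-O models each item loads on one factor.
print(cfa_df(21, 21, n_factor_covariances=3))  # three correlated F-O factors: 186
print(cfa_df(21, 21, n_factor_covariances=1))  # two correlated F-O factors: 188
print(cfa_df(21, 21 + 21))                     # orthogonal bifactor, every item on a subfactor: 168
print(cfa_df(21, 21 + 8 + 5))                  # Ward (2006) bifactor, 8 cognitive + 5 somatic: 176
```

The 18-df gap between a three-factor F-O model and its bifactor counterpart (186 − 168) is the Δdf used for the nested comparisons in Table 3.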

In the second set of analyses, MI of the optimal BDI–II structure was tested through increasingly restrictive MG-CFAs, corresponding to configural (factor structure), metric (loadings), and scalar (intercept or item mean) levels of invariance. MI was considered weak if the configural and metric levels were invariant and strong if the configural, metric, and scalar levels were invariant (Meredith, 1993). Strong MI was required in order to conduct group comparisons that would not be inflated or attenuated due to measurement error (Chen, 2008; Little, 1997; Meredith, 1993). In order to identify the models for MI testing, the loading of the first item of each factor was set to one (after running analyses to confirm that these items showed invariant properties). To estimate the unique contribution of each factor to the optimal model, we calculated explained common variance for each cultural group; the explained common variance of a factor equals the sum of its squared factor loadings divided by the sum of all squared loadings in the model (see Brouwer et al., 2013).
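The explained common variance calculation described above can be written as a short function. This is a sketch only: the function name and the loadings in the example are hypothetical, chosen to make the arithmetic easy to follow, and are not the estimates shown in Figure 1.

```python
def explained_common_variance(loadings_by_factor):
    """Share of the common variance attributable to each factor.

    loadings_by_factor maps a factor name to the standardized loadings of the
    items on that factor. Each factor's share is its sum of squared loadings
    divided by the sum of squared loadings over all factors.
    """
    sums = {name: sum(lam * lam for lam in lams)
            for name, lams in loadings_by_factor.items()}
    total = sum(sums.values())
    return {name: s / total for name, s in sums.items()}


# Hypothetical loadings, chosen only to make the arithmetic easy to check.
example = {
    "general": [0.6, 0.6, 0.6, 0.6],
    "cognitive": [0.3, 0.3],
    "somatic": [0.4, 0.4],
}
for name, share in explained_common_variance(example).items():
    print(f"{name}: {share:.1%}")  # general: 74.2%, cognitive: 9.3%, somatic: 16.5%
```

With these made-up loadings, the general factor accounts for 1.44/1.94, or about 74%, of the common variance; the shares always sum to one.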

In the third set of analyses, latent and observed group comparisons were conducted. To unpack group differences across cultural group and gender, two extra pairs of group comparisons were added: CH men versus EH men and CH women versus EH women. Variance and covariance invariance analyses for latent comparisons were planned if weak MI was met (i.e., factor loading invariance; Meredith, 1993). A nested comparison test with one degree of freedom was used to indicate a significant difference in variance across groups. Mean comparisons were planned if strong invariance was met (i.e., scalar or intercept invariance; Meredith, 1993); the mean of the reference group would be constrained to equal zero while being freely estimated in the comparison group, providing a relative difference z statistic for the means across groups, as well as an effect size (Hancock, 2001). For observed analyses, weighted factor scores based on the optimal cultural-by-gender group model loadings were formed. We then ran between-groups analyses using t tests with a Bonferroni-adjusted alpha level of .003 (.05/18 ≈ .0028) and Levene’s test for homogeneity of variance. The patterns of significant results found in the analyses using observed variables versus those using latent variables were compared in order to explore whether the observed analyses’ failure to control for measurement error led to different results from those found using latent analyses.
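A minimal sketch of the two observed-variable checks, assuming the classic mean-centred form of Levene’s test (the median-centred Brown–Forsythe variant gives different values). The function name `levene_f` is hypothetical, and only the F statistic and its degrees of freedom are computed; a p value would come from the F(1, N − 2) distribution (for example, via `scipy.stats.f.sf`).

```python
from statistics import mean

# Bonferroni adjustment: family-wise alpha of .05 split across 18 t tests.
ALPHA = 0.05
N_TESTS = 18
bonferroni_alpha = ALPHA / N_TESTS  # about 0.0028, reported as .003


def levene_f(*groups):
    """Mean-centred Levene statistic for k groups.

    Runs a one-way ANOVA on absolute deviations from each group's mean.
    Returns (F, df_between, df_within).
    """
    # Absolute deviations of each observation from its own group mean.
    devs = []
    for g in groups:
        m = mean(g)
        devs.append([abs(x - m) for x in g])

    n_total = sum(len(d) for d in devs)
    grand_mean = mean([z for d in devs for z in d])
    group_means = [mean(d) for d in devs]

    ss_between = sum(len(d) * (gm - grand_mean) ** 2
                     for d, gm in zip(devs, group_means))
    ss_within = sum((z - gm) ** 2
                    for d, gm in zip(devs, group_means) for z in d)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: the second group is twice as spread out as the first.
f_stat, df1, df2 = levene_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(bonferroni_alpha, 4), round(f_stat, 4), df1, df2)  # 0.0028 2.0571 1 8
```

With two groups, df_between is 1 and df_within is N − 2, which is the F(1, 1864) form reported for the full CH versus EH comparison.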

All CFA models were tested with EQS 6.1 (Bentler, 2005) using maximum likelihood estimation; all other analyses were conducted with SPSS 16.0. Several indices were used to assess goodness of fit. Because the BDI–II scores were expected to violate normality assumptions due to the use of a nonclinical sample (i.e., positive skewness), a scaling correction using Satorra and Bentler’s (1994) method was applied and robust indices were specified. Three other fit indices were selected: root-mean-square error of approximation (RMSEA; Browne & Cudeck, 1993) and its 90% confidence interval (90% CI); standardized root-mean-square residual (SRMR; Hu & Bentler, 1999); and comparative fit index (CFI; Bentler, 1990). The quality of each CFA model was evaluated according to the following fit criteria: RMSEA ≤ .08, SRMR ≤ .10, and CFI ≥ .90 for acceptable fit; RMSEA ≤ .05, SRMR ≤ .08, and CFI ≥ .95 for good fit (Browne & Cudeck, 1993; Hu & Bentler, 1999). Because χ² statistics are sensitive to sample size (Kline, 2011), the Satorra–Bentler chi-square (S-B χ²) was used for descriptive purposes only, given the large sample. To compare the quality of nested models (i.e., F-O with bifactor and S-O with bifactor), we used a change (Δ) in CFI of .01 or more (Chen, 2007). To compare the quality of non-nested models, we used the Akaike information


6 DERE, WATTERS, YU, BAGBY, RYDER, AND HARKNESS

criterion (AIC; Akaike, 1987), for which a smaller value indicates the preferred model.
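The fit cutoffs and the AIC comparison above can be sketched together. Two assumptions are worth flagging: the cutoffs are treated as inclusive, and the AIC is taken in the chi-square form AIC = χ² − 2df computed from the S-B χ², which is consistent with the AIC column of Table 3 (for example, 425.46 − 2 × 186 = 53.46 for the Chinese-heritage Model 1 F-O). The function names are hypothetical, and the dictionary simply re-enters the Chinese-heritage bifactor rows of Table 3.

```python
def fit_category(rmsea, srmr, cfi):
    """Classify model fit using the cutoffs listed in the text (inclusive)."""
    if rmsea <= 0.05 and srmr <= 0.08 and cfi >= 0.95:
        return "good"
    if rmsea <= 0.08 and srmr <= 0.10 and cfi >= 0.90:
        return "acceptable"
    return "inadequate"


def chi_square_aic(chi_square, df):
    """AIC in the chi-square form chi^2 - 2*df; smaller values are preferred."""
    return chi_square - 2 * df


# Chinese-heritage bifactor rows from Table 3: (S-B chi^2, df, CFI, RMSEA, SRMR).
ch_bifactor = {
    "Model 1": (314.89, 168, 0.944, 0.031, 0.031),
    "Model 2": (328.04, 168, 0.939, 0.032, 0.032),
    "Model 3": (317.51, 168, 0.943, 0.031, 0.031),
    "Model 4": (327.72, 168, 0.939, 0.032, 0.032),
    "Model 5": (347.96, 168, 0.932, 0.034, 0.034),
    "Model 6": (341.97, 176, 0.937, 0.032, 0.033),
}

for name, (chi2, df, cfi, rmsea, srmr) in ch_bifactor.items():
    print(name, round(chi_square_aic(chi2, df), 2), fit_category(rmsea, srmr, cfi))

aics = {name: chi_square_aic(row[0], row[1]) for name, row in ch_bifactor.items()}
lowest_aic = min(aics, key=aics.get)
```

On AIC alone, Model 1 has the lowest value among these rows; as described in the Results, Model 6 was retained on the basis of its parameter estimates rather than AIC, since the bifactor models differed only minimally in AIC.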

Three statistics were used as indicators of invariance: ΔS-B χ² (Byrne, 2006), with invariance indicated if ΔS-B χ² was nonsignificant when evaluated against the critical value for the change in degrees of freedom (Δdf); ΔCFI; and ΔRMSEA. As ΔS-B χ² was included only for descriptive purposes, a significant change in model fit was determined based on an absolute ΔCFI of .01 or more and an absolute ΔRMSEA of .015 or more (Chen, 2007; Cheung & Rensvold, 2002). The assumption of MI at each level was accepted if ΔCFI and ΔRMSEA did not show significant change between increasingly restrictive MI models.
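The decision rules above can be sketched as follows. The absolute-value comparison for ΔCFI and ΔRMSEA is a simplifying assumption (the cutoffs are often stated directionally, as a decrease in CFI of .01 or more and an increase in RMSEA of .015 or more), the function name is hypothetical, and the critical χ² values are copied from the Table 4 note rather than computed.

```python
# Critical chi-square values (alpha = .05) for the df changes used in Table 4,
# copied from the table note rather than computed from a distribution.
CRITICAL_CHI2 = {1: 3.84, 21: 32.67, 30: 43.77}


def invariance_decision(delta_chi2, delta_df, delta_cfi, delta_rmsea,
                        cfi_cutoff=0.01, rmsea_cutoff=0.015):
    """Return (chi2_significant, invariance_supported).

    chi2_significant is descriptive only. Invariance is supported when the
    absolute changes in CFI and RMSEA both stay below their cutoffs.
    """
    chi2_significant = delta_chi2 > CRITICAL_CHI2[delta_df]
    invariance_supported = (abs(delta_cfi) < cfi_cutoff
                            and abs(delta_rmsea) < rmsea_cutoff)
    return chi2_significant, invariance_supported


# CH versus EH metric model from Table 4.
print(invariance_decision(50.52, 30, -0.004, 0.000))  # (True, True)
```

For this row, the χ² difference is significant but the CFI and RMSEA changes support invariance, which is the pattern the text describes for large samples.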

Results

Preliminary Analyses

Data were screened for outliers, and less than 1% of the sample was found to have standardized total BDI–II scores greater than 3.29 in absolute value. As this percentage was considered negligible given the large sample size, outliers were not deleted (Tabachnick & Fidell, 2001). As expected, and similar to other student samples (e.g., Wu, 2010; Wu & Huang, 2014), the distributions of several BDI–II items were positively skewed. Descriptive statistics (i.e., mean, standard deviation, item–total correlation, and t tests) by cultural group and gender within cultural group are displayed in Table 2. T tests showed that CH students scored significantly higher than EH students on four items (past failure, punishment feelings, indecisiveness, and worthlessness). EH women scored higher than EH men on two items (crying and tiredness), whereas there were no differences between CH men and women.
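The outlier screen can be sketched as a z-score check on total scores. The function name and the toy data below are hypothetical; `statistics.stdev` uses the sample (n − 1) standard deviation.

```python
from statistics import mean, stdev


def flag_outliers(total_scores, cutoff=3.29):
    """Indices of scores with |z| > cutoff, and the proportion flagged.

    z scores use the sample standard deviation (statistics.stdev, n - 1).
    """
    m = mean(total_scores)
    s = stdev(total_scores)
    flagged = [i for i, x in enumerate(total_scores) if abs(x - m) / s > cutoff]
    return flagged, len(flagged) / len(total_scores)


# Toy data: 19 scores of 0 and one score of 10 (z is about 4.25 for the last one).
flagged, proportion = flag_outliers([0] * 19 + [10])
print(flagged, proportion)  # [19] 0.05
```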

Model Comparisons of BDI–II Factor Structures

The goodness-of-fit indices for competing factor structures of the BDI–II are presented in Table 3. In every instance, the bifactor model had significantly better fit than its related F-O and S-O models (i.e., ΔCFI ≥ .01) and a lower AIC value than unrelated F-O and S-O models. Because all of the bifactor models showed adequate fit and differed minimally in AIC, we inspected the parameter estimates, which indicated that the most parsimonious structure was Model 6; this was the only model in which all loadings were significant and interpretable. All other models had multiple nonsignificant and/or negative loadings across culture and culture-by-gender groups; moreover, several of these models did not have adequate goodness-of-fit indices (analyses available upon request). As such, Model 6, corresponding to Ward’s (2006) bifactor structure with a general depression severity factor and two subfactors of cognitive (8 items) and somatic (5 items) domains, was chosen as the BDI–II structure for all further analyses. The standardized parameter estimates of this baseline model by cultural group are presented in Figure 1.6

Measurement Invariance Testing

The results of MI testing across pairs of comparison groups are presented in Table 4. Strong invariance was achieved across cultural groups and for gender within each cultural group, meeting the requirements for latent comparison testing for all pairs of comparisons. ΔCFI and ΔRMSEA remained below their cutoff values in all instances, supporting both weak (i.e., factor loading invariance) and strong (i.e., scalar or intercept invariance; Meredith, 1993) levels of MI across cultural group and gender within cultural group. Explained common variance values were also similar across the CH and EH groups, with the general depression factor explaining 82% and 80%, the cognitive factor explaining 12% and 11%, and the somatic factor explaining 7% and 9% of the common variance in the two groups, respectively.7

6 Loadings for culture by gender subgroups are available upon request.

Table 2
Descriptive Statistics for BDI–II Items by Cultural Group and Gender

Column groups (values shown as first group/second group): (A) Chinese heritage (n = 933)/European heritage (n = 933); (B) Chinese heritage: men (n = 298)/women (n = 635); (C) European heritage: men (n = 298)/women (n = 635).

Item | A: M | A: SD | A: r(i-t) | A: t(1864) | B: M | B: SD | B: r(i-t) | B: t(931) | C: M | C: SD | C: r(i-t) | C: t(931)
1. Sadness | .36/.32 | .58/.53 | .66/.59 | −1.38 | .36/.36 | .61/.57 | .67/.65 | 0.00 | .30/.34 | .54/.52 | .61/.58 | −1.13
2. Pessimism | .42/.38 | .59/.54 | .58/.59 | −1.51 | .38/.44 | .58/.60 | .67/.53 | −1.33 | .36/.39 | .53/.55 | .59/.59 | −0.96
3. Past failure | .42/.25 | .70/.55 | .62/.59 | −5.81* | .43/.42 | .70/.70 | .67/.59 | 0.22 | .24/.26 | .55/.55 | .64/.56 | −0.56
4. Loss of pleasure | .33/.27 | .55/.52 | .58/.58 | −2.50 | .33/.33 | .56/.55 | .59/.57 | −0.02 | .33/.24 | .55/.51 | .52/.63 | 2.40
5. Guilty feelings | .43/.42 | .61/.60 | .55/.51 | −0.27 | .47/.42 | .64/.60 | .59/.53 | 1.18 | .44/.42 | .58/.61 | .49/.52 | 0.53
6. Punishment feelings | .23/.14 | .59/.48 | .61/.43 | −3.69* | .23/.23 | .60/.58 | .71/.56 | 0.04 | .19/.12 | .56/.43 | .38/.48 | 1.99
7. Self-dislike | .49/.39 | .76/.65 | .73/.67 | −2.83 | .47/.49 | .77/.75 | .76/.71 | −0.53 | .33/.42 | .59/.67 | .67/.67 | −2.09
8. Self-criticism | .52/.55 | .73/.72 | .69/.59 | 0.90 | .51/.52 | .75/.72 | .74/.67 | −0.15 | .54/.55 | .75/.70 | .57/.61 | −0.25
9. Suicidal thoughts | .15/.11 | .44/.32 | .65/.51 | −2.52 | .16/.15 | .52/.40 | .73/.60 | 0.47 | .10/.11 | .32/.32 | .43/.55 | −0.58
10. Crying | .37/.34 | .76/.65 | .59/.55 | −0.76 | .29/.40 | .67/.79 | .60/.59 | −2.26 | .21/.40 | .58/.67 | .56/.54 | −4.24*
11. Agitation | .47/.49 | .66/.62 | .62/.53 | 0.69 | .44/.48 | .67/.65 | .63/.61 | −0.92 | .46/.50 | .63/.61 | .48/.56 | −0.76
12. Loss of interest | .29/.25 | .56/.52 | .59/.62 | −1.63 | .29/.29 | .56/.56 | .64/.57 | −0.16 | .24/.25 | .51/.52 | .56/.64 | −0.28
13. Indecisiveness | .44/.33 | .71/.59 | .62/.62 | −3.72* | .44/.44 | .70/.71 | .71/.59 | 0.14 | .34/.30 | .59/.59 | .65/.61 | −0.96
14. Worthlessness | .30/.20 | .64/.51 | .70/.62 | −3.76* | .27/.32 | .61/.65 | .71/.71 | −1.25 | .18/.21 | .50/.51 | .54/.65 | −0.88
15. Loss of energy | .56/.50 | .66/.59 | .70/.62 | −2.05 | .53/.57 | .70/.66 | .72/.68 | −0.84 | .44/.53 | .57/.60 | .62/.62 | −2.05
16. Changes in sleep | .84/.91 | .71/.73 | .48/.48 | 2.16 | .83/.84 | .68/.72 | .46/.49 | −0.18 | .86/.93 | .71/.74 | .49/.48 | −1.36
17. Irritability | .27/.25 | .55/.52 | .69/.60 | −0.56 | .26/.27 | .58/.54 | .77/.64 | −0.16 | .25/.25 | .53/.51 | .60/.60 | −0.01
18. Changes in appetite | .71/.72 | .70/.72 | .46/.47 | 0.52 | .71/.70 | .70/.69 | .39/.50 | 0.39 | .73/.72 | .71/.73 | .46/.48 | 0.17
19. Concentration difficulty | .54/.48 | .69/.65 | .59/.62 | −1.82 | .50/.55 | .66/.70 | .60/.59 | −1.02 | .41/.51 | .64/.66 | .67/.60 | −2.17
20. Tiredness | .64/.58 | .62/.60 | .62/.64 | −1.89 | .63/.64 | .64/.61 | .64/.60 | −0.31 | .48/.63 | .58/.61 | .64/.63 | −3.50*
21. Loss of interest in sex | .17/.15 | .52/.45 | .49/.41 | −0.77 | .16/.18 | .48/.53 | .46/.50 | −0.47 | .14/.16 | .40/.47 | .33/.44 | −0.53
BDI–II total | 8.93/8.04 | 8.13/6.82 | | −2.55 | 8.69/9.04 | 8.59/7.91 | | −0.63 | 7.53/8.28 | 6.59/6.91 | | −1.58

Note. BDI–II = Beck Depression Inventory–II; M = mean; SD = standard deviation; r(i-t) = item–total correlation coefficient; t(df) = t statistic (degrees of freedom).
* p < .003, Bonferroni correction.

Latent and Observed Group Comparisons

Variance invariance. Model fit and the results of latent variance invariance testing for Model 6 (Ward, 2006) are displayed in Table 4; the variance of all factors (i.e., general depression, cognitive, somatic) was equivalent for all five pairs of comparison groups. In contrast, Levene’s test for the homogeneity of variance demonstrated that several pairs of comparison groups violated this assumption. The CH group showed significantly more variance than the EH group on the general depression factor (Levene statistic), F(1, 1864) = 27.06, p < .001, and on the cognitive subfactor, F(1, 1864) = 57.29, p < .001. The same pattern of results occurred for CH versus EH women, F(1, 1268) = 11.73, p < .001, and F(1, 1268) = 50.73, p < .001, respectively, and for CH versus EH men, F(1, 594) = 17.12, p < .001, and F(1, 594) = 6.24, p = .013, respectively. Within the CH group, women showed significantly more variance on the cognitive subfactor than did men, F(1, 931) = 15.92, p < .001. Because the latent comparisons, which control for measurement error, showed no significant differences in variance across groups, these results suggest that the violations of variance homogeneity were largely due to measurement error that is not controlled for in observed analyses.

Mean comparisons. Fit indices and tests of latent mean differences are presented in Table 5. Descriptive statistics (i.e., mean, standard deviation, and coefficient alphas) for the BDI–II factors by cultural group and gender within cultural group are displayed in Table 6, with a summary of significant latent and observed results. Both latent and observed comparisons found that the CH group scored significantly higher on the cognitive factor than the EH group, z = −3.04, d = .24; t(1864) = −4.74, p < .001, d = .22, whereas no significant cultural group difference was found on the somatic factor. Further analyses showed that this difference was

7 Because several of the baseline models showed adequate fit across all fit indices (i.e., Model 1, F-O, S-O, and bifactor versions; bifactor versions of Models 2 through 6), we also ran supplementary analyses to test the MI of these models (see Tables S1 through S6 in the online supplemental materials). In all cases, evidence of strong MI (Meredith, 1993) was found across both culture and gender within culture. These results further suggest that the BDI–II shows particularly robust invariance.

Table 3
Fit Indices for BDI–II Factor Structures Outlined in Table 1 (N = 933 for Each Group)

Model | S-B χ² | df | CFI | RMSEA [90% CI] | SRMR | AIC
1. Chinese-heritage F-O | 425.46* | 186 | .909* | .037 [.032, .024] | .040 | 53.46
1a. Chinese-heritage S-O | 360.27* | 186 | .925* | .032 [.027, .037] | .038 | −11.73
1a. Chinese-heritage S-O (Item 21 deleted) | 388.98* | 167 | .915* | .038 [.033, .043] | .039 | 54.98
1. Chinese-heritage bifactor | 314.89 | 168 | .944 | .031 [.025, .036] | .031 | −21.11
1. European-heritage F-O | 372.13* | 186 | .920* | .033 [.028, .038] | .040 | 0.13
1a. European-heritage S-O | 360.25* | 186 | .925* | .032 [.027, .037] | .046 | −11.75
1a. European-heritage S-O (Item 21 deleted) | 336.19* | 167 | .928* | .033 [.028, .038] | .038 | 2.19
1. European-heritage bifactor | 307.90 | 168 | .940 | .030 [.025, .035] | .034 | −28.10
2. Chinese-heritage F-O | 497.10* | 188 | .882* | .042 [.037, .046] | .044 | 121.10
2. Chinese-heritage bifactor | 328.04 | 168 | .939 | .032 [.027, .037] | .032 | −7.96
2. European-heritage F-O | 432.01* | 188 | .895* | .037 [.033, .042] | .044 | 56.01
2. European-heritage bifactor | 320.02 | 168 | .935 | .031 [.026, .036] | .035 | −15.98
3. Chinese-heritage F-O | 488.25* | 188 | .886* | .041 [.037, .046] | .044 | 112.25
3. Chinese-heritage bifactor | 317.51 | 168 | .943 | .031 [.026, .036] | .031 | −18.49
3. European-heritage F-O | 445.41* | 188 | .890* | .038 [.034, .043] | .045 | 69.41
3. European-heritage bifactor | 317.90 | 168 | .936 | .031 [.026, .036] | .035 | −18.10
4. Chinese-heritage F-O | 493.18* | 188 | .884* | .042 [.037, .046] | .045 | 117.18
4. Chinese-heritage bifactor | 327.72 | 168 | .939 | .032 [.027, .037] | .032 | −8.28
4. European-heritage F-O | 455.53* | 188 | .885* | .039 [.034, .044] | .046 | 79.53
4. European-heritage bifactor | 308.46 | 168 | .940 | .030 [.025, .035] | .034 | −27.55
5. Chinese-heritage F-O | 501.01* | 186 | .880* | .043 [.038, .047] | .045 | 129.01
5. Chinese-heritage bifactor | 347.96 | 168 | .932 | .034 [.029, .039] | .034 | 11.96
5. European-heritage F-O | 431.54* | 186 | .895* | .038 [.033, .042] | .045 | 59.54
5. European-heritage bifactor | 318.03 | 168 | .936 | .031 [.026, .036] | .036 | −17.97
6. Chinese-heritage bifactor | 341.97 | 176 | .937 | .032 [.027, .037] | .033 | −10.03
6. European-heritage bifactor | 334.53 | 176 | .932 | .031 [.026, .036] | .036 | −17.48

Note. All S-B χ² values were significant at p < .001. BDI–II = Beck Depression Inventory–II; F-O = first-order model; S-O = second-order model; S-B = Satorra–Bentler adjusted χ² test statistic; df = degrees of freedom; CFI = comparative fit index; RMSEA = root-mean-square error of approximation; 90% CI = 90% confidence interval for RMSEA; SRMR = standardized root-mean-square residual; AIC = Akaike information criterion.
* Signifies significantly better fit for the bifactor model versus the related first-order and second-order models [i.e., the change (Δ) statistic exceeds the critical χ² value for S-B χ² (28.87 when Δdf = 18 and 31.41 when Δdf = 20) and ΔCFI ≥ .01].



due to CH women scoring higher than EH women on cognitive symptoms (z = −2.51, d = .20).

There were also several discrepant findings between the latent and observed analyses (see Table 6). Latent comparisons found that EH women scored higher than EH men on the somatic factor (z = 2.78, d = .031). Observed comparisons found that the CH group scored higher than the EH group on the general factor, t(1864) = −4.39, p < .001, d = .20. More specifically, the CH men scored higher than the EH men on the general factor, t(1268) = −4.74, p < .001, d = .27. The CH men scored lower than the CH women on the cognitive factor, t(931) = 3.42, p < .001, d = .24. Although these group differences were small in effect size, these results highlight the fact that observed analyses, which do not control for measurement error, can lead to different results from latent analyses, which do. In other words, significant group differences that are found using observed analyses (e.g., t tests, ANOVA), particularly those with small effect sizes, should be interpreted with caution, as such results could be artifacts of measurement error rather than true group differences.
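The observed effect sizes above are consistent with the usual conversion from an independent-samples t statistic, d = |t| × √(1/n₁ + 1/n₂). The helper name is hypothetical, and the conversion assumes a pooled-variance t test; for instance, t(1864) = −4.74 with 933 participants per group gives d ≈ .22, and t(1864) = −4.39 gives d ≈ .20.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Cohen's d for two independent groups, from a pooled-variance t statistic."""
    return abs(t) * sqrt(1 / n1 + 1 / n2)


print(round(cohens_d_from_t(-4.74, 933, 933), 2))  # 0.22
print(round(cohens_d_from_t(-4.39, 933, 933), 2))  # 0.2
```

Because d shrinks with √(1/n₁ + 1/n₂), a t statistic that clears a strict alpha in samples of this size can still correspond to a small effect.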

Discussion

In the current study we attempted to integrate several recent lines of research to provide a statistically rigorous examination of potential differences in depressive symptom reporting between Chinese- and European-heritage students using the BDI–II. Our investigation builds upon recent work in the areas of cultural variation in depressive symptoms, the use of a bifactor approach to the factor structure of the BDI–II, and the application of advanced statistical techniques in testing measurement invariance. We think that our findings not only contribute uniquely to each of these research areas but also have important methodological implications for cross-cultural psychopathology research more broadly.

We also think that the findings of the current investigation contribute to the ongoing debate regarding the optimal factor

[Figure 1: path diagram of the bifactor model, with all 21 BDI–II items loading on the General Depression factor and subsets of items loading on the Cognitive and Somatic factors.]

Figure 1. Standardized parameter estimates for Chinese-heritage (CH) and European-heritage (EH) Beck Depression Inventory–II baseline factor models (Ward, 2006). Numbers in parentheses are CH followed by EH loading estimates; all parameter estimates are significant at p < .05. Explained common variance percentages for CH and EH groups: general factor (82, 80); cognitive (12, 11); somatic (7, 9).



structure of the BDI–II. All bifactor models showed significantly improved fit (based on an increase in CFI of .01 or more) over their nested first- and second-order counterparts, across all culture and gender subgroups. This result fits with those of other recent studies (e.g., Al-Turkait & Ohaeri, 2010; Brouwer et al., 2013; Osman et al., 2008; Quilty et al., 2010) to provide further evidence for the superiority of a bifactor approach to the BDI–II. As this study was the first to test a bifactor model in a Chinese-heritage sample, our findings extend those of previous studies and add to a growing body of literature supporting the idea that depression is marked by a general factor of depression severity while simultaneously containing multiple components (e.g., Simms, Grös, Watson, & O’Hara, 2008): the general factor accounted for approximately 80% of the common variance in BDI–II scores. In terms of practical implications, this suggests that the calculation of a total BDI–II score is useful and valid, a finding of considerable relevance to researchers and clinicians alike (Brouwer et al., 2013; Quilty et al., 2010).

Statistically, a bifactor model allows for the investigation of dimensions of depression (e.g., cognitive and somatic) that contribute unique variance beyond the general severity factor, which other approaches, such as second-order models, do not permit (Chen et al., 2012). From a cross-cultural research perspective, such an approach is desirable in its ability to separate the variance associated with overall severity from that attributable to narrower symptom domains, which tend to be the focus of attention. As researchers seek to uncover those variables that contribute to cultural variation in depressive symptom reporting, stable and robust factor structures of the measures being used to assess depression are crucial.

With regard to the cross-cultural use of the BDI–II, we found strong measurement invariance (i.e., configural, metric, and scalar levels of invariance; Meredith, 1993) in our two groups, across both culture and gender. This finding is notable in terms of the strength of the invariance results and suggests that the BDI–II is a good choice as a self-report measure of depressive symptoms, at least among these two groups. It is important to note that the Chinese-heritage students in our study are likely to differ from other Chinese groups in North America and elsewhere, based on factors such as age, education level, and the level of English-language proficiency required to enter their university. Similarly, it is worth noting that all participants completed the BDI–II in English, though Byrne et al. (2007) also found support for strong invariance comparing the English- and Chinese-language versions of the BDI–II. The use of a non-help-seeking student sample in the current study limits the generalizability of these findings to samples with greater levels of depression severity or those in clinical settings. Although we cannot unequivocally recommend the use of the BDI–II in cross-cultural comparisons of nonstudent or treatment-seeking samples, our results, combined with the findings of Byrne et al. (2007) and Whisman et al. (2013), add to mounting evidence that the BDI–II shows robust measurement invariance.

Table 4
Fit Indices and Difference Statistics for Measurement Invariance Models (Configural, Metric, Scalar) by Cultural Group and Gender Within Cultural Group

Model | S-B χ² | df | CFI | RMSEA [90% CI] | SRMR | ΔS-B χ² | Δdf | ΔCFI | ΔRMSEA

Chinese heritage (n = 933), European heritage (n = 933)
Configural (structure) | 676.63 | 352 | .935 | .031 [.028, .035] | .035 | | | |
Metric (loadings) | 725.62 | 382 | .931 | .031 [.028, .034] | .048 | 50.52* | 30 | −.004 | .000
Scalar (intercepts) | 806.29 | 403 | .930 | .032 [.028, .035] | .048 | 112.27* | 21 | −.001 | .001
Variance depression | 820.29 | 404 | .928 | .032 [.029, .036] | .078 | 6.49* | 1 | −.002 | .000
Variance cognitive | 811.90 | 404 | .930 | .033 [.030, .036] | .051 | 3.92* | 1 | .000 | .001
Variance somatic | 808.10 | 404 | .930 | .032 [.028, .035] | .049 | 1.70 | 1 | .000 | .000

Chinese heritage: men (n = 298), women (n = 635)
Configural (structure) | 538.49 | 352 | .923 | .034 [.028, .039] | .042 | | | |
Metric (loadings) | 581.36 | 382 | .918 | .033 [.028, .039] | .063 | 43.32 | 30 | −.005 | −.001
Scalar (intercepts) | 605.57 | 403 | .915 | .033 [.027, .038] | .063 | 18.44 | 21 | −.003 | .000
Variance depression | 602.58 | 404 | .917 | .034 [.029, .039] | .076 | 0.73 | 1 | .003 | .001
Variance cognitive | 611.42 | 404 | .913 | .033 [.028, .038] | .062 | 5.06* | 1 | −.002 | .000
Variance somatic | 606.77 | 404 | .915 | .035 [.029, .040] | .064 | 1.23 | 1 | .000 | .002

European heritage: men (n = 298), women (n = 635)
Configural (structure) | 518.43 | 352 | .927 | .032 [.026, .038] | .045 | | | |
Metric (loadings) | 547.62 | 382 | .927 | .032 [.025, .036] | .054 | 31.86 | 30 | .000 | .000
Scalar (intercepts) | 601.95 | 403 | .926 | .033 [.027, .038] | .054 | 75.73* | 21 | −.001 | .002
Variance depression | 603.90 | 404 | .925 | .032 [.026, .037] | .057 | 1.74 | 1 | −.001 | .001
Variance cognitive | 604.00 | 404 | .925 | .032 [.026, .037] | .055 | 1.74 | 1 | −.001 | .001
Variance somatic | 605.40 | 404 | .925 | .032 [.026, .037] | .054 | 3.84 | 1 | −.001 | .001

Note. S-B χ² = Satorra–Bentler adjusted chi-square test statistic; df = degrees of freedom; CFI = comparative fit index; RMSEA = root-mean-square error of approximation; 90% CI = 90% confidence interval for RMSEA; SRMR = standardized root-mean-square residual; ΔS-B χ² = change in S-B χ² based on the Byrne (2006) equation (asterisk indicates a significant χ² value based on Δdf = change in df; critical values are 32.67 for Δdf = 21, 43.77 for Δdf = 30, and 3.84 for Δdf = 1); ΔCFI = change in CFI (asterisk indicates ≥ .01); ΔRMSEA = change in RMSEA (asterisk indicates ≥ .015); comparison analyses include (a) metric versus configural; (b) scalar versus metric; (c) variance versus metric model.



With regard to cultural variation in symptom reporting, the current results are not consistent with the ideas of either Chinese somatization or Western psychologization. Based on both the latent and observed group comparisons, Chinese-heritage students scored higher on the cognitive factor than did European-heritage students. Additional group comparisons revealed that this result was largely due to Chinese-heritage women scoring higher than European-heritage women on the cognitive factor, whereas no difference was found between men across cultural groups. This result echoes the earlier discussion regarding a lack of consistent support for the idea of Chinese somatization and discrepancies between clinical and nonclinical samples (e.g., Yen et al., 2000). This finding also fits with previous studies that have found a greater emphasis on cognitive symptoms among Chinese-heritage adolescents and young adults under certain circumstances (Chang, 2007; Stewart et al., 2002).

The cultural group difference on the cognitive factor of the BDI–II was the only result that was consistent across the latent and observed group comparisons. Discrepancies between the analyses using latent versus observed variables highlight the importance of controlling for measurement error, as failing to do so can lead to spurious results: apparent group differences that are artifacts of measurement error rather than true differences. A recommendation that emerges from these findings, therefore, is for researchers conducting group comparisons (cultural and otherwise) to make use of the MG-CFA framework when possible. When sample size constraints do not allow for this approach, researchers are urged to be wary of group differences with small effect sizes, which may be artifacts of measurement error.

Given the lack of additional variables in the current study that might help to explain our group comparison findings, any proposed explanation remains speculative. However, one issue that may be important to consider is the academic context in which the data were collected; all participants were first-year undergraduates attending a prestigious Canadian university with a highly competitive admission process. Other authors have previously highlighted the value that tends to be placed on academic success in traditional East Asian cultural contexts and its potential link to depressive symptomatology (Stewart et al., 2002; Young et al., 2010). It is possible that cognitive symptoms (e.g., past failure) would be especially salient among students for whom academic success is particularly important, who may also find any first-year academic challenges to be particularly distressing. This line of inquiry may be a fruitful avenue for future research.

As our results add to the pattern of discrepant findings on Chinese somatization and Western psychologization between clinical and nonclinical samples, it is worth briefly considering potential explanations for this discrepancy. A number of authors have pointed to help seeking and the patient role as important factors contributing to cultural differences in depressive symptom reporting between Chinese and Western groups (e.g., Chang, 2007; Ryder et al., 2008; Yen et al., 2000). Others have suggested that cultural variations in symptom reporting reflect culturally shaped differences in which symptoms are experienced as most salient and troubling during times of profound distress (Ryder & Chentsova-Dutton, 2012). The current data were collected in a classroom setting from students who were unlikely to be experiencing severe depressive symptoms, suggesting that neither help seeking nor experiences of profound distress would have played much of a role in their pattern of symptom reporting. Further work is required to identify more precisely the variables that contribute to cultural variations in depressive symptom reporting in both clinical and nonclinical samples.

Table 5
Tests for Latent Mean Differences by Cultural Group, Gender Within Cultural Group, and Gender Across Cultural Group (Women, N = 635, and Men, N = 298, for Chinese-Heritage and European-Heritage Groups)

Model and subscales | S-Bχ2 | CFI | RMSEA [90% CI] | SRMR | Difference estimate | z

Chinese heritage, European heritage; full sample (reference group = European heritage)
General depression | 783.87 | .932 | .032 [.028, .035] | .046 | −.033 | −1.85
Cognitive subscale | | | | | −.034 | −3.04*
Somatic subscale | | | | | −.001 | −0.05

Chinese heritage; men, women (reference group = men)
General depression | 599.27 | .916 | .034 [.029, .040] | .059 | .022 | 0.72
Cognitive subscale | | | | | −.009 | −0.44
Somatic subscale | | | | | .000 | 0.02

European heritage; men, women (reference group = men)
General depression | 609.13 | .918 | .033 [.028, .039] | .054 | .025 | 1.09
Cognitive subscale | | | | | −.000 | −0.17
Somatic subscale | | | | | .072 | 2.78*

Chinese heritage, European heritage; men (reference group = Chinese heritage)
General depression | 550.01 | .903 | .036 [.028, .043] | .068 | −.047 | −1.40
Cognitive subscale | | | | | −.013 | −1.10
Somatic subscale | | | | | −.045 | −1.61

Chinese heritage, European heritage; women (reference group = European heritage)
General depression | 660.78 | .932 | .032 [.027, .036] | .048 | −.033 | −1.54
Cognitive subscale | | | | | −.039 | −2.51*
Somatic subscale | | | | | .022 | 1.26

Note. S-Bχ2 = Satorra–Bentler adjusted chi-square test statistic; df = degrees of freedom; CFI = comparative fit index; RMSEA = root-mean-square error of approximation; 90% CI = 90% confidence interval for RMSEA; SRMR = standardized root-mean-square residual; reference group = group with latent mean constrained to be zero, where the difference estimate is in relation to the reference group and the z value represents the strength of this difference estimate.
* p < .05.
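The note to Table 5 flags estimates at p < .05 without listing the p values themselves. A minimal Python sketch, assuming each z is a Wald-type ratio (difference estimate divided by its standard error) referred to a standard normal distribution with a two-tailed test, recovers approximate p values from the tabled z values and checks which comparisons cross the .05 threshold. The dictionary TABLE5_Z and the helper two_tailed_p are illustrative names introduced here, not part of the authors' analysis, and because the tabled z values are rounded to two decimals the resulting p values are approximate.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z statistic under a standard normal reference."""
    # For standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical container: z values transcribed from Table 5,
# keyed by (comparison, factor).
TABLE5_Z = {
    ("Full sample, CH vs. EH", "General depression"): -1.85,
    ("Full sample, CH vs. EH", "Cognitive"): -3.04,
    ("Full sample, CH vs. EH", "Somatic"): -0.05,
    ("CH, men vs. women", "General depression"): 0.72,
    ("CH, men vs. women", "Cognitive"): -0.44,
    ("CH, men vs. women", "Somatic"): 0.02,
    ("EH, men vs. women", "General depression"): 1.09,
    ("EH, men vs. women", "Cognitive"): -0.17,
    ("EH, men vs. women", "Somatic"): 2.78,
    ("Men, CH vs. EH", "General depression"): -1.40,
    ("Men, CH vs. EH", "Cognitive"): -1.10,
    ("Men, CH vs. EH", "Somatic"): -1.61,
    ("Women, CH vs. EH", "General depression"): -1.54,
    ("Women, CH vs. EH", "Cognitive"): -2.51,
    ("Women, CH vs. EH", "Somatic"): 1.26,
}

ALPHA = 0.05  # two-tailed significance level used in the table note

significant = {key for key, z in TABLE5_Z.items() if two_tailed_p(z) < ALPHA}

for (comparison, factor), z in TABLE5_Z.items():
    flag = "*" if (comparison, factor) in significant else ""
    print(f"{comparison:<24} {factor:<20} z = {z:+.2f}  p = {two_tailed_p(z):.4f}{flag}")
```

Under this assumption, the three starred entries (the full-sample cognitive difference, the European-heritage gender difference on the somatic subscale, and the cognitive difference among women) are the only ones with p below .05, matching the asterisks in the table. Because the test is two-tailed, the sign of each z, which depends on the choice of reference group, does not affect the p value.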

Future work should also explore the potential role of acculturation and related processes in the cultural shaping of depressive symptom reporting. Acculturation is a particularly relevant construct when examining samples that include different generations of immigrants and migrants, such as international students. Other processes of cultural change, such as modernization, would also be an important area of future study. In particular, recent research on depression among Chinese-heritage samples highlights the need to examine historical forces to better understand apparent shifts in the prevalence and symptom presentation of depression in Chinese cultural contexts in recent decades (see, e.g., Ryder, Sun, Zhu, Yao, & Chentsova-Dutton, 2012). The influence of Western psychiatric models of depression, along with related norms concerning emotional expression and psychologization, has been discussed as an influential factor in mainland China (e.g., Lee & Kleinman, 2007); such forces are likely to be all the more influential for young adults of Chinese heritage who are born, raised, and/or studying in North America. Indeed, such factors may help to explain the lack of evidence for Chinese somatization in the current results.

As the proposed areas of future research suggest, an important limitation of the current study is the lack of more detailed sociodemographic information about our participants and of additional variables that could help to explain our findings more fully. Although the use of an archival database provided the large sample sizes required for our statistical analyses, it limited our ability to gather such variables. It might also be argued that comparing a Chinese-heritage sample of North American university undergraduates with European-heritage counterparts is not a particularly compelling cross-cultural comparison. However, previous research has demonstrated significant differences between Asian Americans and European Americans on a variety of fundamental emotion-related processes, including ones specific to depression (e.g., Chentsova-Dutton et al., 2007), suggesting that important differences can emerge from such comparisons. Furthermore, the study of North American undergraduates of diverse cultural backgrounds is crucial to ongoing efforts to improve the assessment and treatment of mental health concerns on university campuses.

Table 6
Descriptive Statistics for the BDI–II, Ward’s (2006) Bifactor Model, by Total Sample and Gender

Measure | CH (n = 933) | EH (n = 933) | CH men (n = 298) | CH women (n = 635) | EH men (n = 298) | EH women (n = 635)

BDI–II total (21 items)
M | 8.93^a | 8.04^a | 8.68 | 9.04 | 7.53 | 8.29
SD | 8.13^a | 6.83^a | 8.59^d | 7.91^e | 6.59^d | 6.91^e
Cronbach’s alpha | .91 | .89 | .93 | .91 | .88 | .89
MIC | .56 | .50 | .59 | .54 | .48 | .51

Cognitive (8 items)
M | 2.97^A | 2.45^A | 2.92 | 2.99^E | 2.37 | 2.49^E
SD | 3.62^a | 2.87^a | 3.87^d | 3.51^e | 2.78^d | 2.91^e
Cronbach’s alpha | .86 | .80 | .88 | .85 | .77 | .81
MIC | .61 | .52 | .66 | .58 | .48 | .54

Somatic (5 items)
M | 3.27 | 3.19 | 3.21 | 3.30^e | 2.93^C | 3.32^C,e
SD | 2.36 | 2.29 | 2.32 | 2.38 | 2.25 | 2.30
Cronbach’s alpha | .73 | .73 | .72 | .74 | .74 | .72
MIC | .50 | .49 | .48 | .51 | .51 | .48

Note. A superscript lowercase letter indicates a significant observed mean difference (based on the t-test statistic) or a violation of homogeneity of variance (based on Levene’s statistic); a superscript capital letter indicates a significant difference in both latent and observed means or variances. BDI–II = Beck Depression Inventory–II; CH = Chinese heritage; EH = European heritage; MIC = mean interitem correlation. Comparison groups: a = CH/EH; b = CH men/women; c = EH men/women; d = CH/EH men; e = CH/EH women.

References

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332. doi:10.1007/BF02294359

Al-Turkait, F. A., & Ohaeri, J. U. (2010). Dimensional and hierarchical models of depression using the Beck Depression Inventory-II in an Arab college student sample. BMC Psychiatry, 10, Article 60. doi:10.1186/1471-244X-10-60

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Bayram, N., & Bilgel, N. (2008). The prevalence and socio-demographic correlations of depression, anxiety and stress among a group of university students. Social Psychiatry and Psychiatric Epidemiology, 43, 667–672. doi:10.1007/s00127-008-0345-x

Beck, A. T., Steer, R. A., Ball, R., & Ranieri, W. F. (1996). Comparison of Beck Depression Inventories–IA and –II in psychiatric outpatients. Journal of Personality Assessment, 67, 588–597. doi:10.1207/s15327752jpa6703_13

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory–II. San Antonio, TX: Psychological Corporation.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246. doi:10.1037/0033-2909.107.2.238

Bentler, P. M. (2005). EQS 6.1 structural equations program manual. Los Angeles, CA: Multivariate Software.

Brouwer, D., Meijer, R. R., & Zevalkink, J. (2013). On the factor structure of the Beck Depression Inventory–II: G is the key. Psychological Assessment, 25, 136–145. doi:10.1037/a0029228

Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

Byrne, B. M. (2006). Structural equation modeling with EQS (2nd ed.). New York, NY: Psychology Press.

Byrne, B. M., Stewart, S. M., Kennard, B. D., & Lee, P. W. H. (2007). The Beck Depression Inventory-II: Testing for measurement equivalence and factor mean difference across Hong Kong and American adolescents. International Journal of Testing, 7, 293–309. doi:10.1080/15305050701438058

Chang, H. (2007). Depressive symptom manifestation and help-seeking among Chinese college students in Taiwan. International Journal of Psychology, 42, 200–206. doi:10.1080/00207590600878665

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504. doi:10.1080/10705510701301834

Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95, 1005–1018. doi:10.1037/a0013193

Chen, F. F., Hayes, A., Carver, C. S., Laurenceau, J.-P., & Zhang, Z. (2012). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, 80, 219–251. doi:10.1111/j.1467-6494.2011.00739.x

Chentsova-Dutton, Y. E., Chu, J. P., Tsai, J. L., Rottenberg, J., Gross, J., & Gotlib, I. H. (2007). Depression and emotional reactivity: Variation among Asian Americans and European Americans. Journal of Abnormal Psychology, 116, 776–785. doi:10.1037/0021-843X.116.4.776

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. doi:10.1207/S15328007SEM0902_5

Cuijpers, P., Beekman, A. T. F., & Reynolds, C. F., III. (2012). Preventing depression: A global priority. JAMA: Journal of the American Medical Association, 307, 1033–1034. doi:10.1001/jama.2012.271

Dere, J., Sun, J., Zhao, Y., Persson, T. J., Zhu, X., Yao, S., . . . Ryder, A. G. (2013). Beyond “somatization” and “psychologization”: Symptom-level variation in depressed Han Chinese and Euro-Canadian outpatients. Frontiers in Psychology, 4, Article 377. doi:10.3389/fpsyg.2013.00377

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory–II. Psychological Assessment, 10, 83–89. doi:10.1037/1040-3590.10.2.83

Eisenberg, D., Gollust, S. E., Golberstein, E., & Hefner, J. L. (2007). Prevalence and correlates of depression, anxiety, and suicidality among university students. American Journal of Orthopsychiatry, 77, 534–542. doi:10.1037/0002-9432.77.4.534

Hancock, G. R. (2001). Effect size, power, and sample size determination for structured means modeling and MIMIC approaches to between-groups hypothesis testing of means on a single latent construct. Psychometrika, 66, 373–388. doi:10.1007/BF02294440

Horn, L., Peter, K., Rooney, K., & Malizio, A. G. (2002). Profile of undergraduates in U.S. postsecondary institutions: 1999–2000. Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. doi:10.1080/10705519909540118

Ibrahim, A. K., Kelly, S. J., Adams, C. E., & Glazebrook, C. (2013). A systematic review of studies of depression prevalence in university students. Journal of Psychiatric Research, 47, 391–400. doi:10.1016/j.jpsychires.2012.11.015

Kessler, R. C., Berglund, P., Demler, O., Jin, R., Koretz, D., Merikangas, K. R., . . . Wang, P. S. (2003). The epidemiology of major depressive disorder: Results from the National Comorbidity Survey Replication (NCS-R). JAMA: Journal of the American Medical Association, 289, 3095–3105. doi:10.1001/jama.289.23.3095

Kirmayer, L. J. (2001). Cultural variations in the clinical presentation of depression and anxiety: Implications for diagnosis and treatment. Journal of Clinical Psychiatry, 62, 22–28.

Kleinman, A. (1982). Neurasthenia and depression: A study of somatization and culture in China. Culture, Medicine and Psychiatry, 6, 117–190.

Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press.

Lam, C. Y., Pepper, C. M., & Ryabchenko, K. A. (2004). Case identification of mood disorders in Asian American and Caucasian American college students. Psychiatric Quarterly, 75, 361–373. doi:10.1023/B:PSAQ.0000043511.13623.1b

Lee, S., & Kleinman, A. (2007). Are somatoform disorders changing over time? The case of neurasthenia in China. Psychosomatic Medicine, 69, 846–849. doi:10.1097/PSY.0b013e31815b0092

Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research, 32, 53–76. doi:10.1207/s15327906mbr3201_3

Mathers, C. D., & Loncar, D. (2006). Projections of global mortality and burden of disease from 2002 to 2030. PLoS Medicine, 3(11), e442. doi:10.1371/journal.pmed.0030442

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. doi:10.1007/BF02294825

Milfont, T. L., & Fischer, R. (2010). Testing measurement invariance across groups: Applications in cross-cultural research. International Journal of Psychological Research, 3, 111–130.

Osman, A., Barrios, F. X., Gutierrez, P. M., Williams, J. E., & Bailey, J. (2008). Psychometric properties of the Beck Depression Inventory–II in nonclinical adolescent samples. Journal of Clinical Psychology, 64, 83–102. doi:10.1002/jclp.20433

Parker, G., Chan, B., Tully, L., & Eisenbruch, M. (2005). Depression in the Chinese: The impact of acculturation. Psychological Medicine, 35, 1475–1483. doi:10.1017/S0033291705005623

Parker, G., Cheah, Y.-C., & Roy, K. (2001). Do the Chinese somatize depression? A cross-cultural study. Social Psychiatry and Psychiatric Epidemiology, 36, 287–293. doi:10.1007/s001270170046

Patten, S. B., Wang, J. L., Williams, J. V. A., Currie, S., Beck, C. A., Maxwell, C. J., & el-Guebaly, N. (2006). Descriptive epidemiology of major depression in Canada. Canadian Journal of Psychiatry/Revue canadienne de psychiatrie, 51, 84–90.

Quilty, L. C., Zhang, K. A., & Bagby, R. M. (2010). The latent symptom structure of the Beck Depression Inventory–II in outpatients with major depression. Psychological Assessment, 22, 603–608. doi:10.1037/a0019698

Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. doi:10.1177/014662167700100306


Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95, 129–140. doi:10.1080/00223891.2012.725437

Ryder, A. G., & Chentsova-Dutton, Y. E. (2012). Depression in cultural context: “Chinese somatization,” revisited. Psychiatric Clinics of North America, 35, 15–36. doi:10.1016/j.psc.2011.11.006

Ryder, A. G., Sun, J., Zhu, X., Yao, S., & Chentsova-Dutton, Y. E. (2012). Depression in China: Integrating developmental psychopathology and cultural-clinical psychology. Journal of Clinical Child and Adolescent Psychology, 41, 682–694. doi:10.1080/15374416.2012.710163

Ryder, A. G., Yang, J., Zhu, X., Yao, S., Yi, J., Heine, S. J., & Bagby, R. M. (2008). The cultural shaping of depression: Somatic symptoms in China, psychological symptoms in North America? Journal of Abnormal Psychology, 117, 300–313. doi:10.1037/0021-843X.117.2.300

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.

Simms, L. J., Grös, D. F., Watson, D., & O’Hara, M. W. (2008). Parsing the general and specific components of depression and anxiety with bifactor modeling. Depression and Anxiety, 25, E34–E46. doi:10.1002/da.20432

Stewart, S. M., Lewinsohn, P. M., Lee, P. W., Ho, L. M., Kennard, B., Hughes, C. W., & Emslie, G. J. (2002). Symptom patterns in depression and “subthreshold” depression among adolescents in Hong Kong and the United States. Journal of Cross-Cultural Psychology, 33, 559–576. doi:10.1177/0022022102238269

Storch, E. A., Roberti, J. W., & Roth, D. A. (2004). Factor structure, concurrent validity, and internal consistency of the Beck Depression Inventory–Second Edition in a sample of college students. Depression and Anxiety, 19, 187–189. doi:10.1002/da.20002

Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (5th ed.). Boston, MA: Allyn & Bacon.

Üstün, T. B., Ayuso-Mateos, J. L., Chatterji, S., Mathers, C., & Murray, C. J. L. (2004). Global burden of depressive disorders in the year 2000. British Journal of Psychiatry, 184, 386–392. doi:10.1192/bjp.184.5.386

van de Vijver, F. J. R., & Leung, K. (2000). Methodological issues in psychological research on culture. Journal of Cross-Cultural Psychology, 31, 33–51. doi:10.1177/0022022100031001004

Vanheule, S., Desmet, M., Groenvynck, H., Rosseel, Y., & Fontaine, J. (2008). The factor structure of the Beck Depression Inventory-II. Assessment, 15, 177–187. doi:10.1177/1073191107311261

Ward, L. C. (2006). Comparison of factor structure models for the Beck Depression Inventory–II. Psychological Assessment, 18, 81–88. doi:10.1037/1040-3590.18.1.81

Watters, C. A., Keefer, K. V., Kloosterman, P. H., Summerfeldt, L. J., & Parker, J. D. A. (2013). Examining the structure of the Internet Addiction Test in adolescents: A bifactor approach. Computers in Human Behavior, 29, 2294–2302. doi:10.1016/j.chb.2013.05.020

Whisman, M. A., Judd, C. M., Whiteford, N. T., & Gelhorn, H. L. (2013). Measurement invariance of the Beck Depression Inventory–Second Edition (BDI-II) across gender, race, and ethnicity in college students. Assessment, 20, 419–428. doi:10.1177/1073191112460273

Whisman, M. A., Perez, J. E., & Ramel, W. (2000). Factor structure of the Beck Depression Inventory–Second Edition (BDI-II) in a student sample. Journal of Clinical Psychology, 56, 545–551. doi:10.1002/(SICI)1097-4679(200004)56:4<545::AID-JCLP7>3.0.CO;2-U

Wiebe, J. S., & Penley, J. A. (2005). A psychometric comparison of the Beck Depression Inventory–II in English and Spanish. Psychological Assessment, 17, 481–485. doi:10.1037/1040-3590.17.4.481

Wu, P. (2010). Measurement invariance and latent mean differences of the Beck Depression Inventory II across gender groups. Journal of Psychoeducational Assessment, 28, 551–563. doi:10.1177/0734282909360772

Wu, P.-C., & Huang, T.-W. (2014). Gender-related invariance of the Beck Depression Inventory II for Taiwanese adolescent samples. Assessment, 21, 218–226. doi:10.1177/1073191112441243

Yen, S., Robins, C. J., & Lin, N. (2000). A cross-cultural comparison of depressive symptom manifestation: China and the United States. Journal of Consulting and Clinical Psychology, 68, 993–999. doi:10.1037/0022-006X.68.6.993

Young, C. B., Fang, D. Z., & Zisook, S. (2010). Depression in Asian-American and Caucasian undergraduate students. Journal of Affective Disorders, 125, 379–382. doi:10.1016/j.jad.2010.02.124

Zhou, X., Dere, J., Zhu, X., Yao, S., Chentsova-Dutton, Y. E., & Ryder, A. G. (2011). Anxiety symptom presentations in Han Chinese and Euro-Canadian outpatients: Is distress always somatized in China? Journal of Affective Disorders, 135, 111–114. doi:10.1016/j.jad.2011.06.049

Received October 9, 2013
Revision received July 23, 2014
Accepted August 1, 2014
