This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/authorsrights


Author's personal copy

Early Childhood Research Quarterly 28 (2013) 794– 807


The validity of the Devereux Early Childhood Assessment for culturally and linguistically diverse Head Start children

Rebecca J. Bulotsky-Shearer∗, Veronica A. Fernandez, Stefano Rainelli
University of Miami, Department of Psychology, FL, United States

ARTICLE INFO

Article history:
Received 24 September 2012
Received in revised form 9 July 2013
Accepted 20 July 2013

Keywords:
Social-emotional assessment
Construct validity
Preschool

ABSTRACT

The Devereux Early Childhood Assessment (DECA) is a social-emotional assessment widely used by early childhood educational programs to inform early identification and intervention efforts. However, its construct validity is not well-established in independent samples of children from low-income backgrounds. We examined the construct validity of the teacher report of the DECA using a series of confirmatory factor analyses, exploratory factor analyses, and the Rasch partial credit model in a large sample of culturally and linguistically diverse Head Start children (N = 5,197). Findings provided some evidence for consistency in the factor structure of the three Protective Factors subscales (Initiative, Self-Control, and Attachment); however, the factor structure of the Behavioral Concerns subscale was not replicated in our sample and demonstrated poor fit to these data. Findings suggested that the 10 items of the published Behavioral Concerns subscale did not comprise a unidimensional construct, but rather were better represented by two factors (externalizing and internalizing behavior). The use of the total Behavioral Concerns score as a screening tool to identify emotional and behavioral problems in diverse samples of preschool children from low-income backgrounds was not supported, especially for internalizing behavior. Implications for the consequential validity of the DECA for use as a screening tool in early childhood programs serving diverse populations of children, and directions for future research, are discussed.

© 2013 Elsevier Inc. All rights reserved.

There has been much national attention paid to promoting the social-emotional development of young children. Research suggests that social-emotional skills are essential to early school engagement and learning (Denham, 2006; Raver, 2002; Thompson & Raikes, 2007). In addition, social, emotional, and regulatory problems place children at risk for concurrent and long-term social and academic difficulties (Huffman, Mehlinger, & Kerivan, 2000; Raver, 2002). Unfortunately, epidemiological studies suggest that 8–22% of preschool children exhibit moderate to clinically significant emotional and behavioral problems (Brauner & Stephens, 2006; Campbell, 1995; Lavigne et al., 1996) and that this rate is elevated for children living in low-income households (Barbarin, 2007; Feil et al., 2005; National Center for Children in Poverty [NCCP], 2011).

Early childhood programs serving children from low-income backgrounds, such as Head Start, have the strategic opportunity to respond to the social-emotional needs of low-income children (U.S. Department of Health and Human Services, Improving Head Start for School Readiness Act, 2007). Logically, however, equitable and timely early identification efforts are contingent upon the availability of comprehensive, reliable, and valid assessment tools of children's social-emotional strengths and needs (Lopez, Tarullo, Forness, & Boyce, 2000; Yoshikawa & Zigler, 2000). Teachers are key informants and front-line workers with summative knowledge about children's behavior observed within the classroom environment (McDermott, 1993). Typically, teacher rating scales are the most efficient mechanisms within early childhood programs for screening the greatest number of children in need of intervention (Caselman & Self, 2008; McDermott, 1993). While these tools are cost-effective, our nation's teacher workforce and population of children from low-income families are culturally and linguistically diverse. Early childhood programs face the challenge of selecting social-emotional measures that can be completed reliably by trained staff and that have established psychometric properties for diverse preschool populations (Barbarin, 2007; Snow & Van Hemel, 2008). When choosing social-emotional assessment tools to inform early identification efforts, best practices in early childhood assessment underscore the need to establish construct validity, or the ability of the test's content to measure the intended underlying construct, when used with diverse groups (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999; Messick, 1989; Salvia & Ysseldyke, 1991).

∗ Corresponding author. E-mail address: [email protected] (R.J. Bulotsky-Shearer).

0885-2006/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.ecresq.2013.07.009


Unfortunately, a series of recent studies document the difficulty of establishing the construct validity of social-emotional assessments in diverse populations for which the measures were not originally developed (LeBoeuf, Fantuzzo, & Lopez, 2010). Given that social-emotional tools are likely to be "contextually, ecologically, and culturally dependent" (Snow & Van Hemel, 2008, p. 245), it may be especially inappropriate to use the norms and factor structure of such measures "for evaluating that child's current performance or for predicting future performances" if their psychometric properties have not been established in culturally, linguistically, and socio-economically diverse populations (Salvia & Ysseldyke, 1991, p. 18).

The Devereux Early Childhood Assessment (DECA; LeBuffe & Naglieri, 1999) is a published social-emotional assessment widely used by early childhood programs serving children from low-income backgrounds to inform early identification and intervention efforts. The DECA was developed to assess a comprehensive set of resilient social-emotional protective factors and behavioral concerns. The DECA is one of the most commonly used social-emotional screening tools nationally within programs such as Head Start to meet the assessment requirements of the federal Program Performance Standards (U.S. Department of Health and Human Services, 2007). However, to date, few empirical studies have examined the psychometric properties of the measure when used with culturally and linguistically diverse, low-income populations. Therefore, the purpose of the present study was to examine the construct validity of the DECA for an entire cohort of culturally and linguistically diverse Head Start children. Below, we summarize previous research on best practices in the development and use of social-emotional assessments with diverse early childhood populations and review extant studies with the DECA as they support the empirical need for the present study.

1. Establishing construct validity for measures used in diverse populations

Construct validity is the relationship between the content of a test, or scale, and the underlying construct it is intended to measure (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). The degree to which a test's content aligns with its intended construct helps us evaluate the "adequacy and appropriateness of inferences and actions based on [the] test scores..." (Messick, 1989, p. 13). If construct validity is not established for particular groups of children, then the inferences and actions based on the scores for the group may not be valid. For example, using a measure that has not been validated with culturally diverse children from low-income backgrounds may result in the under- or over-identification of certain groups of children that may have far-reaching consequences for one or more already vulnerable populations (Barbarin, 2007).

Recent studies have called into question the use of the scores derived from several commonly used early childhood measures that were not originally developed or standardized for use with diverse children from low-income backgrounds (Fantuzzo, Hightower, Grim, & Montes, 2002; Fantuzzo, Manz, & McDermott, 1998; Knight & Hill, 1998; Konold, Hamre, & Pianta, 2003; LeBoeuf et al., 2010; Raver, Gershoff, & Aber, 2007). For example, a series of studies examining the dimensionality of the Child Behavior Checklist for preschool-age children (CBCL; Achenbach & Rescorla, 2000) with low-income African-American and Latino parents reveal inconsistent factor structures and poor fit of the problem behavior scales across race and socioeconomic status, especially for internalizing problems (Gross et al., 2006; Konold et al., 2003; LeBoeuf et al., 2010). In a study examining the factor structure of the CBCL

across different ethnic and socioeconomic groups, Konold et al. (2003) were unable to replicate the factor structure across subgroups of minority children from low-income backgrounds. LeBoeuf et al. (2010) also were not able to resolve an interpretable factor structure when examining the CBCL in a national sample of children participating in the Comprehensive Child Development Program (Pierre, Layzer, Goodson, & Bernstein, 1997). As a result, the authors suggest the cautious use of the CBCL with culturally diverse children from low-income backgrounds. Clearly, failing to establish adequate construct validity when a measure is used with diverse groups of children has implications for the consequential validity of such measures to guide accurate and timely early identification efforts within early childhood programs.

2. The Devereux Early Childhood Assessment

The Devereux Early Childhood Assessment (DECA; LeBuffe & Naglieri, 1999) is a standardized rating scale of children's social-emotional adjustment developed for use by both teachers and parents of children two to six years of age. The DECA consists of 37 items; 27 of the items assess resilient behaviors and comprise the Protective Factors subscales. Ten items assess problematic behaviors and comprise the Behavioral Concerns subscale. Initial construct validity studies of the teacher rating form with a national sample of 2,000 children involved a series of exploratory factor analyses. This set of analyses examined the underlying factor structure of the 27 positive items separately from the Behavioral Concerns items. The most parsimonious factor solution resulted in three Protective Factors subscales: Initiative, Self-Control, and Attachment. In a separate national sample of 1,108 children, the ten problem behavior items were selected by an expert panel from an initial set of 77 problem behaviors based on their "psychometric properties and their representation of a wide range of challenging behaviors" (LeBuffe & Naglieri, 1999, p. 12). The ten items, however, were not subjected to exploratory factor analyses in the development of the measure.

Norm-referenced scores for the Protective Factors and Behavioral Concerns subscales were derived from these two separate standardization samples. Children in the two samples were included from 28 states and approximated the preschool population (based on the 1996 national census) in age, gender, race, ethnicity, and socioeconomic status across the four geographic regions of the U.S. (LeBuffe & Naglieri, 1999). The first normative sample, used to develop the Protective Factors subscales, included 69.4% White, 17.2% Black, and 10.7% children of Hispanic descent. Only 24.6% of this sample received subsidized day care or public assistance. In the second normative sample, used to develop the norms for the Behavioral Concerns subscale, 73.3% of children were White, 15.7% were Black, and 9.2% were of Hispanic ethnicity. Of this sample, 25% were receiving subsidized day care or public assistance. Internal consistency reliabilities of teacher ratings for all four subscales were high. In the first normative sample, internal consistencies for Initiative, Self-Control, and Attachment were .90, .90, and .85, respectively. In the second sample, internal consistency for Behavioral Concerns was .80.

Since the development of the DECA published scales, few studies have examined the psychometric properties of the DECA within samples of diverse children from low-income backgrounds. However, there have been recent studies examining the reliability and criterion-related validity of the DECA with culturally and linguistically diverse samples. Lien and Carlson (2009) found adequate internal consistency of the four DECA subscales when completed by teachers, and Crane, Mincic, and Winsler (2011) found high inter-rater agreement between parent and teacher raters. In addition, in Head Start samples, scores on the three Protective Factors subscales were positively associated with direct assessments of mathematics skills (Dobbs, Doctoroff, Fisher, & Arnold, 2006) and teacher-reported language and literacy skills (Maier, Vitiello, & Greenfield, 2012). Scores on the Behavioral Concerns subscale were negatively associated with direct assessments of mathematics skills (Dobbs et al., 2006) and teacher-reported mathematics, literacy, and approaches to learning (Escalon & Greenfield, 2009). In a series of longitudinal studies conducted by Winsler (2012), preschool teacher ratings on the Behavioral Concerns subscale were associated with lower standardized reading scores and overall grade point average in third grade.

These studies provide some evidence for the utility of the DECA published subscales in samples of preschool children from low-income backgrounds; however, to date few studies have examined whether the published factor structure of the DECA is replicated in independent preschool samples, and only one study has examined whether the underlying factor structure is appropriate in a low-income preschool sample. Jaberg, Dixon, and Weis (2009) examined the factor structure of the DECA (as reported by both parents and teachers) in a sample of 780 predominantly White (88%) 5-year-old kindergarten children from the Midwest. Internal consistencies (for teacher ratings) ranged from .83 to .92 and were similar to the values reported in the published manual. Following the procedures of the publishers, the authors conducted a series of exploratory factor analyses using only the 27 positive items comprising the Protective Factors subscales (the 10 Behavioral Concerns subscale items were not included). The resulting three-factor model was comparable to the three published scales. However, the EFA did not result in a parsimonious solution or one consistent with the published structure. In the initial varimax solution, 10 of the 27 items loaded saliently on more than one factor. In the final oblique solution, four items loaded saliently on more than one factor (items 6, 20, 24, and 32) and four items loaded on a different factor from the published scales (items 6, 7, 28, and 32). Percent agreement between the items that loaded on the published factor structure and on the new factor structure was 90% for the Initiative, 88.88% for the Self-Control, and 84.21% for the Attachment subscales. Table 1 presents a comparison between the published and derived structure in the Jaberg et al. (2009) study, as well as in the Lien and Carlson (2009) study described below.

To date, only one study has examined the factor structure of the DECA in a sample of preschool children from low-income backgrounds (Lien & Carlson, 2009). However, this study examined the factor structure of the parent report of the DECA only, in a sample of 1,208 Head Start children. Child demographic information was not available for their sample; however, the authors reported that the race of enrolled children in the previous year of the Head Start program was 43% White, 28.8% African American, and 10% Hispanic (Lien & Carlson, 2009). The authors note that child demographics in their sample underrepresented ethnic minorities as compared to national Head Start enrollment for that year (U.S. Department of Health and Human Services, 2004). Exploratory factor analyses with the 27 positive items resulted in a three-factor structure similar to the published structure; however, as in the Jaberg et al. (2009) study, several items either loaded on more than one factor or on different factors from the published scales (items 4, 16, 22, 32, and 37). Again, while this study provides support for the consistency of the three Protective Factors subscales in a low-income sample, only parent report was examined, the structure of the Behavioral Concerns subscale was not examined, and several items did not load on the Protective Factors subscales in the same way as in the published scales.

In summary, construct validity for the DECA has not been well-established in independent samples of culturally and linguistically diverse, low-income preschool children. The original standardization sample of the DECA underrepresented ethnic and linguistic minorities and children from low-income backgrounds. Jaberg et al. (2009) examined the factor structure in a predominantly White sample of kindergarten children, and while Lien and Carlson (2009) examined the factor structure of the DECA with a Head Start sample, the focus was on parent report only, not teacher report. In addition, these studies examined only the latent structure of the 27 items comprising the Protective Factors subscales; none has examined the latent structure of the 10 Behavioral Concerns items. Therefore, the generalizability of findings from all studies to date is limited for informing the use of the teacher report form of the DECA with culturally and linguistically diverse children served within Head Start programs.

3. Purpose

The purpose of the present study was to examine the construct validity of the DECA for a cohort of culturally and linguistically diverse children enrolled in an urban Head Start program, to determine whether the teacher-report form of the DECA was an appropriate tool to identify children's social-emotional strengths and needs. We had two research questions: (a) Is the published factor structure of the DECA appropriate for a diverse Head Start sample? (b) Is there an alternative factor structure that is more appropriate? We employed a series of confirmatory factor analyses, exploratory factor analyses, and item response theory analyses to examine these questions for our Head Start sample. We expected that we would have difficulty confirming the published structure for the Protective Factors subscales and the Behavioral Concerns subscale, given previous research documenting the difficulty of confirming the structure of published measures when they were not developed or standardized for use with low-income populations (Fantuzzo et al., 1998b; LeBoeuf et al., 2010). In addition, we expected that an alternative factor structure might be more appropriate for the Head Start sample.

4. Method

4.1. Participants

Children in this study were enrolled in a large, urban Head Start program in the Southeastern United States during the 2008–2009 academic year. Participants included 5,197 children who were enrolled in a total of 318 classrooms across 78 centers, for whom the DECA was completed in English by lead teachers in the fall of 2008. (In the overall sample, there were 860 children whose Head Start teachers were Spanish speaking and completed the DECA using the Spanish language form. Because the focus of this study was on the English form, these children were excluded from the analytic sample.)

Approximately 52% of the children were female, and their age at the beginning of the school year ranged from 33 to 59 months (M = 48.1, SD = 6.9). Children were predominantly African American and Hispanic (60.9% and 38.3%, respectively), with 0.8% identified as White/Non-Hispanic, Asian, Native Islander, or other ethnicity. In the 2008–2009 academic year, 98% of children in this Head Start program met the federal income requirement (less than $22,050 annually for a family of four) for eligibility for enrollment in Head Start.

Approximately 13% of children in the sample were suspected or identified as having one or more special needs (e.g., speech or language impairment, developmental delay, and emotional or behavioral disorders). Given that the DECA was used programmatically as a screening tool for all children in the program, to ensure generalizability of our study findings, all children with available DECA scores were included in our analyses, including children with disabilities. Prior to including children with disabilities in the analytic sample, a test of strict measurement invariance was conducted to ensure that the latent structure of the DECA was invariant across two groups of children: (a) children with a suspected or identified disability (n = 626) and (b) children with no suspected or identified disability (n = 4,571). The same factor structure was applied to both groups, constraining all of the unstandardized factor loadings and thresholds to be equal, using a multiple group comparison approach (Horn & McArdle, 1992). This constrained model resulted in adequate fit, CFI = 0.913, TLI = 0.918, RMSEA = 0.04, suggesting that the structure was comparable for children with and without disabilities and providing evidence that it was appropriate to include children with disabilities in our analyses.

Table 1
Comparison of the Protective Factors published and derived structures in independent samples.

Pattern of salient item loadings, by study: LeBuffe and Naglieri (1999); Jaberg et al. (2009); Lien and Carlson (2009); current study.

Initiative
  1: Makes adult smile  X
  2: Does things for himself/herself  x x x x
  3: Does a challenging task  x x x x
  7: Participates in make believe  x x
  12: Keeps trying  x x x x
  16: Different ways to solve  x x x x
  19: Tries new activities  x x x x
  20: Plays with children  x x x x
  22: Asks adult to play/read  X X
  24: Focuses attention on task  x x x x
  28: Acts optimistic  x x x
  32: Asks children to play  x x x
  36: Makes decisions  x x x x
  37: Interest in what others do  X

Self-Control
  4: Listens to/respects others  x x x x
  5: Controls anger  x x x x
  6: Responds positively/upset  X X
  13: Handles frustration well  x x x x
  16: Different ways to solve  X
  21: Shows patience  x x x x
  25: Shares  x x x x
  30: Accepts another choice  x x x x
  33: Cooperates with others  x x x x
  34: Calms down when upset  x x x x

Attachment
  1: Makes adults smile  x x x
  6: Responds positively/upset  x x
  7: Participates in make-believe  X X
  10: Shows affection to adults  x x x x
  17: Happy when parent returns  x x x x
  22: Asks adult to play/read  x x x
  28: Acts optimistic  X
  29: Trusts familiar adults  x x x x
  31: Seeks help when necessary  x x x x
  32: Asks children to play  X X X
  37: Interest in what others do  x x x x

Note. Items that are bolded (shown here as uppercase X) load on a different factor than that of the published structure.

Demographic records from the Head Start program indicated that teachers in the overall program were predominantly African American and White (49.7% and 35.8%, respectively), with 14.4% identified as Asian, biracial, or unspecified. Approximately 50% of all the teachers (regardless of race) were of Hispanic descent. With regard to the highest educational degree obtained, eight percent of teachers reported having a graduate degree, 44% a Bachelor's degree, 22% an Associate's degree, and 17% a Child Development Associate credential. Nine percent of teachers were enrolled in a Bachelor's degree program.

4.2. Measures

4.2.1. Classroom social-emotional adjustment
The Devereux Early Childhood Assessment (DECA; LeBuffe & Naglieri, 1999) is a 37-item standardized, norm-referenced teacher and parent rating scale. In the current study, the DECA was completed by Head Start lead teachers for all enrolled children within the first 45 days of school. Teachers are asked to use a five-point Likert scale (0 = never, 1 = rarely, 2 = occasionally, 3 = frequently, and 4 = very frequently) to rate how often they have observed each child display a behavior over the previous four-week period. The DECA is comprised of four subscales: three Protective Factors subscales (Initiative, Self-Control, and Attachment) and one Behavioral Concerns subscale. The Protective Factors subscales assess adaptive, prosocial, and "resilient" behaviors children display in the classroom. The Initiative scale is comprised of 11 items assessing children's ability to use independent thought and actions to meet their needs. Examples of items on the Initiative scale include "does things for himself/herself," "chooses tasks challenging for him/her," "tries or asks new things or activities," and "asks other children to play with him/her." The Self-Control scale, comprised of eight items, assesses children's ability to experience a range of feelings and express those using words and actions. Examples include "shows patience," "handles frustration well," and "controls anger." The Attachment scale, comprised of eight items, assesses the quality and strength of the relationship between the child and familiar adults. Examples include "shows affection for familiar adults," "acts happy when parent returns," and "seeks help from adults when necessary." The Behavioral Concerns subscale includes 10 items assessing problem behaviors exhibited by preschool children. Items include "destroys or damages property," "difficulty concentrating," "easily upset," and "temper tantrums."
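The rating scale and subscale composition described above can be summarized in a small data structure. The anchors and per-subscale item counts come from the text; item-to-subscale assignments are omitted here.

```python
# Sketch of the DECA structure as described in the text; the anchor labels
# and item counts come from the article, not from the published manual itself.
LIKERT_ANCHORS = {0: "never", 1: "rarely", 2: "occasionally",
                  3: "frequently", 4: "very frequently"}

SUBSCALE_ITEM_COUNTS = {
    "Initiative": 11,           # independent thought and action
    "Self-Control": 8,          # experiencing and expressing feelings
    "Attachment": 8,            # relationships with familiar adults
    "Behavioral Concerns": 10,  # problem behaviors
}
```

The 27 Protective Factors items plus the 10 Behavioral Concerns items give the 37-item total reported above.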

Author's personal copy

798 R.J. Bulotsky-Shearer et al. / Early Childhood Research Quarterly 28 (2013) 794– 807

The published technical manual reports high internal consistency reliabilities for teacher ratings on the Initiative, Self-Control, Attachment, and Behavioral Concerns subscales (.90, .90, .85, and .80, respectively; LeBuffe & Naglieri, 1999). Test-retest reliability coefficients (correlations among teacher ratings made 1–3 days apart) are reported as .80, .64, .55, and .68 for the Initiative, Self-Control, Attachment, and Behavioral Concerns subscales, respectively.
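Internal consistency of this kind is typically quantified with Cronbach's alpha. As an illustration (a minimal sketch, not the publisher's computation), alpha can be obtained from the item variances and the total-score variance:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of ratings.

    item_scores: list of rows, one per child, each a list of k item ratings.
    """
    k = len(item_scores[0])

    def variance(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

When items covary perfectly (every child gets the same rating on every item), alpha reaches 1.0; weakly related items pull it toward 0.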

4.3. Procedure

This study was part of a larger University-Head Start collaborative research project examining school readiness for Head Start children. The purpose of the larger project was to integrate two large administrative databases programmatically collected by the Head Start program. Prior to accessing these data, approval for the project was obtained from the University Institutional Review Board, the Head Start director, and the Head Start Parent Policy Council. Two administrative databases were used for the present study: (1) a child and family information database that included child demographic information (date of birth, gender, ethnicity, special needs status, center name, classroom assignment) and (2) a database including the teacher ratings on the DECA conducted within the first 45 days of school. Teachers were trained to complete the DECA after observing children's behavior in their classroom over a period of four weeks. Both data sets were collected and stored electronically by the program to meet the federal Head Start performance standards assessment requirement. Because there was no unique identifier across the databases, Microsoft Integrated Services was used to link children's records using a probabilistic matching algorithm based on combinations of children's first name, last name, date of birth, gender, and race/ethnicity. This linking strategy was based on a 95% confidence match. Records that had matches were separated into a master table for that particular assignment and then joined at the end of the process. Duplicate cases were identified within the process and removed. Once the databases were integrated, the data were de-identified to protect participant confidentiality.
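The linking logic described above can be sketched in simplified form. The actual probabilistic algorithm and its 95% confidence threshold belong to the tooling used by the program, so this deterministic composite-key version is only a hypothetical approximation of matching on the listed fields and dropping duplicates:

```python
def match_key(record):
    """Build a normalized composite key from the fields used for linkage."""
    fields = ("first_name", "last_name", "date_of_birth", "gender", "ethnicity")
    return tuple(str(record[f]).strip().lower() for f in fields)

def link_records(db_a, db_b):
    """Join two record lists on the composite key; duplicate keys are
    dropped rather than guessed at, mirroring the de-duplication step."""
    index = {}
    for rec in db_b:
        key = match_key(rec)
        # A key seen more than once is ambiguous: mark it unusable.
        index[key] = None if key in index else rec
    linked = []
    for rec in db_a:
        partner = index.get(match_key(rec))
        if partner is not None:
            linked.append((rec, partner))
    return linked
```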

4.4. Data analytic approach

4.4.1. Fit of the published structure in the Head Start sample
To answer the first research question, examining whether the published factor structure of the DECA was appropriate for the present sample, a series of confirmatory and exploratory factor analyses, as well as Item Response Theory (IRT) analyses, were conducted. First, confirmatory factor analysis (CFA) in a structural equation modeling (SEM) framework was employed in Mplus version 6.10 (Muthén & Muthén, 1998–2011) to test two separate models based on the published factor structure: (a) the fit of all 27 items loading on the three Protective Factors subscales (11 items on Initiative, eight on Self-Control, eight on Attachment) and (b) the fit of the 10 items loading on the Behavioral Concerns subscale. Second, a series of IRT analyses was conducted to examine more closely the fit of each set of individual items to the three Protective Factors subscales and the Behavioral Concerns subscale, based on the published structure.

IRT analyses were employed to supplement our CFA models because they have the advantage of providing item-level fit statistics and item parameter estimates that are sample independent, and they provide additional information about particular items that might be responsible for an overall poor model fit (Hambleton & Jones, 1993; Raju, Laffitte, & Byrne, 2002). IRT is used to model the relationship between a child's level of latent ability and the probability of endorsing an item (Lord, 1980). A nonlinear, monotonically increasing function is specified to model the relationship between latent ability and the probability of item response (known as an item response function), and this function allows researchers to identify how the probability of endorsing a particular item response changes as a function of latent ability. While a CFA framework is useful in assessing overall model fit to a set of data, IRT provides more information about specific items that may be the cause of model misfit (Osteen, 2010).

4.4.1.1. CFA analyses. For all CFA analyses the data were specified as categorical, and a robust weighted least squares (WLS) estimator, WLSMV, with mean- and variance-adjusted χ² was employed (Muthén, du Toit, & Spisic, 1997). Whereas DECA items were rated on a Likert-type scale (ranging from 0 to 4) that might at first appear to be continuous, inspection of the item frequency distributions in our sample suggested that for each item, the response distributions were functioning more like categorical scales. Several item distributions violated normality (e.g., high skewness and kurtosis values), especially the Behavioral Concerns items. Researchers have cautioned that when categorical item-level data are treated as continuous in factor analyses: (a) unstable or spurious factors may result, and (b) the χ² statistic and standard errors of the parameter estimates may be biased, especially as non-normality increases (Bernstein & Teng, 1989; Chou, Bentler, & Satorra, 1991; Finch, West, & MacKinnon, 1997; Kaplan, 2000; Mislevy, 1986). Therefore, given these recommendations and the nature of our data, we chose to use the WLSMV estimator in order to obtain a more reliable χ² statistic for model fit.
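The distributional inspection described above can be sketched with the standard (population) moment formulas. The example item distribution and the informal cut-off of skewness greater than 2 are illustrative assumptions, not values reported in this article:

```python
import math

def skewness(xs):
    """Population skewness: mean of standardized deviations cubed."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

def excess_kurtosis(xs):
    """Population kurtosis minus 3 (0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 4 for x in xs) / n - 3.0

# Hypothetical floor-effected item, like many Behavioral Concerns ratings:
# most children rated 0 ("never"), a few rated higher.
ratings = [0] * 80 + [1] * 12 + [2] * 5 + [3] * 2 + [4] * 1
skew = skewness(ratings)  # strongly positive -> treat item as categorical
```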

Since children's scores were nested within rater (classroom teacher), the Mplus syntax TYPE = COMPLEX was used to adjust the standard errors of the parameters, as would be done within a multilevel framework (Muthén & Muthén, 1998–2011). Approximate fit indices were used to assess the fit of the overall model to the data. Specifically, the Comparative Fit Index (CFI > .95; Bentler, 1990), an incremental fit index, and the root mean square error of approximation (RMSEA < .06; Steiger, 1990), a parsimony-corrected index, were evaluated for each model. Given the categorical nature of the data and the robust WLS estimator that was employed, the weighted root mean square residual (WRMR < 1; Yu & Muthén, 2002) was also considered. If the other fit indices were adequate, a model with a significant χ² test of model fit (p < .05) was considered acceptable given the complexity of the model and the large sample size (Bollen & Long, 1993).

4.4.1.2. IRT analyses. The WINSTEPS software (Linacre, 2012) was used for all calibration models to examine item-level fit statistics for each of the 37 DECA items as they loaded on the four subscales of the published structure. Several IRT models have been developed to estimate item and person parameters for a variety of test and item formats (De Ayala, 2009; Hambleton & Jones, 1993). In this study the Rasch Partial Credit Model (PCM; Masters, 1982; Wright & Masters, 1982) was selected because it is used when modeling data that have more than two ordered response categories (De Ayala, 2009). Given the ordered nature of the response categories, or "anchors," of the DECA (i.e., never, rarely, occasionally, frequently, and very frequently), the PCM was most appropriate.

IRT analyses were conducted in the same way for each DECA subscale. Each set of items was calibrated separately for each subscale, based on the published factor structure. For example, the 11 items that comprised the Initiative subscale of the published factor structure were first calibrated together. Then, the eight items from the Self-Control subscale were calibrated together, and so on for the remaining sets of items on the Attachment and Behavioral Concerns subscales. Item parameters were obtained, as well as item-level fit statistics indicating how well each of the items fit their respective subscale. A basic tenet of IRT is that the items from a measure are assessing a single, underlying construct (Embretson & Reise, 2000; Lord, 1980). Item-level fit statistics



provide information regarding the extent to which the items comprising each subscale fit this unidimensional assumption. Two item-level fit statistics were used: weighted (infit) and unweighted (outfit) mean square error values (Bond & Fox, 2001). These two statistics provide information concerning whether the discrepancies found in the item responses occur close to (i.e., infit) or far away from (i.e., outfit) the estimated parameter (De Ayala, 2009). Infit and outfit are chi-square-like statistics that are based on the squared standardized residual between what is observed and what would be expected on the basis of the model. Infit and outfit statistics range from 0 to infinity with an expectation of 1 (De Ayala, 2009).
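Using these definitions (mean squares of squared standardized residuals, with an expectation of 1), infit and outfit can be sketched as follows. This is a minimal illustration, not the WINSTEPS implementation; the residuals and information weights are hypothetical.

```python
def outfit_ms(z_resids):
    """Unweighted mean square (outfit): the plain average of squared
    standardized residuals, so it is dominated by unexpected responses
    far from the estimated parameter."""
    return sum(z * z for z in z_resids) / len(z_resids)

def infit_ms(z_resids, weights):
    """Information-weighted mean square (infit): each squared standardized
    residual is weighted by its model variance (information), so responses
    near the estimated parameter carry the most weight."""
    return sum(w * z * z for z, w in zip(z_resids, weights)) / sum(weights)

# Hypothetical standardized residuals and information weights:
z = [0.5, -1.2, 0.8, 1.5]
w = [0.9, 0.6, 0.8, 0.3]
observed_outfit = outfit_ms(z)
observed_infit = infit_ms(z, w)
```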

Although general guidelines are given for appropriate infit and outfit values, Smith, Schumacker, and Bush (1998) suggest correcting for Type I error rates with large sample sizes when interpreting the fit statistics. They suggest using 1 ± 2/√N for infit and 1 ± 6/√N for outfit when developing appropriate ranges. For our sample of 5,197 children, the infit and outfit ranges that would result in correct Type I error rates were 0.97–1.02 and 0.92–1.08, respectively. Therefore, we considered items with infit and outfit values that fell outside of these ranges an indication of poor fit of those items to their respective subscale and a possible violation of the assumption of unidimensionality (Chernyshenko, Stark, Chan, Drasgow, & Williams, 2001). Not only would poor infit or outfit values suggest a problematic item with respect to the fit of that item to its respective subscale (in this case, the DECA subscales of Initiative, Self-Control, Attachment, and Behavioral Concerns), they might also suggest a problem with the way in which the items of a scale represented the dimensionality of the underlying construct.
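The corrected ranges follow directly from the Smith, Schumacker, and Bush (1998) formulas; a minimal sketch:

```python
import math

def corrected_fit_ranges(n):
    """Sample-size-corrected infit and outfit ranges: 1 +/- 2/sqrt(N)
    for infit and 1 +/- 6/sqrt(N) for outfit (Smith, Schumacker, &
    Bush, 1998)."""
    root_n = math.sqrt(n)
    infit_range = (1 - 2 / root_n, 1 + 2 / root_n)
    outfit_range = (1 - 6 / root_n, 1 + 6 / root_n)
    return infit_range, outfit_range

# For the N = 5,197 sample used here:
infit_range, outfit_range = corrected_fit_ranges(5197)
```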

4.4.1.3. Alternate factor structure in the Head Start sample. For the second research question, a series of exploratory factor analyses was conducted to examine whether there might be an alternate factor structure in this sample of children. The sample was randomly split into two mutually exclusive subsamples: an index sample used for exploratory factor analyses (n = 2,598) and a reserve sample used for confirmatory factor analyses (n = 2,599). Prior to subjecting the index sample to exploratory factor analyses, MicroFACT 2.1 (Waller, 2001) was used to calculate polychoric item correlations (Olsson, 1979) and to smooth the matrix for nonsingularity and positive semidefiniteness. In the initial set of analyses, a series of common factor analyses examined whether a latent structure could be resolved when all 37 items were included together at once. Given that prior research has examined the DECA latent structure only for the 27 items on the Protective Factors subscales (and no research has used exploratory factor analysis to examine the latent structure of the ten Behavioral Concerns items), a second set of exploratory factor analyses then examined the latent structure separately for the 27 items on the published Protective Factors subscales and the 10 items on the Behavioral Concerns subscale. Retained factors were rotated using orthogonal (varimax, equamax) rotations. The final orthogonal solution was then subjected to a series of oblique (promax) rotations at variable levels of power.

The most parsimonious factor solution was evaluated against multiple criteria: (a) it satisfied the constraints of tests for the number of factors [e.g., Cattell's scree test (Cattell, 1966), the minimum average partial test (Velicer, 1976), and parallel analysis (Buja & Eyuboglu, 1992; Horn, 1965)]; (b) it retained at least four items per factor with salient loadings, where loadings > .40 are considered salient (Gorsuch, 1983); (c) it yielded high internal consistency for each factor, with alpha coefficients > .70; (d) it held simple structure (mutually exclusive assignment of items to factors, with the maximum number of items retained); (e) it yielded the highest hyperplane count (Gorsuch, 1983); and (f) it comported with the empirical literature.
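Of the retention tests named in criterion (a), parallel analysis is straightforward to sketch: observed eigenvalues are compared against eigenvalues from random data of the same dimensions, and factors are retained while the observed values are larger. The sketch below uses Pearson correlations on simulated continuous data for brevity; the study itself factored polychoric correlations, and the six-item example is hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis (sketch): count how many observed
    correlation-matrix eigenvalues exceed the mean eigenvalues of
    random data with the same number of rows and columns."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        rand = rng.standard_normal((n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False)))[::-1]
    return int(np.sum(obs > sims.mean(axis=0)))

# Hypothetical data: six items generated from two independent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
items = np.hstack([
    factors[:, [0]] + 0.5 * rng.standard_normal((500, 3)),
    factors[:, [1]] + 0.5 * rng.standard_normal((500, 3)),
])
n_factors = parallel_analysis(items)
```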

Once a new structure was derived via exploratory factor analysis, the fit of the newly derived structure in our sample was tested

Table 2
Descriptive statistics for all 37 DECA items for Head Start sample.

Item                                      M      SD     Skew    Kurtosis
1: Makes adults smile                     2.73   0.84   −0.25   −0.25
2: Does things for himself/herself        2.87   0.79   −0.38   −0.01
3: Does a challenging task                2.38   0.91   −0.18   −0.12
4: Listens to/respects others             2.79   0.82   −0.47   0.28
5: Controls anger                         2.71   0.84   −0.53   0.41
6: Responds positively when upset         2.88   0.76   −0.55   0.64
7: Participates in make-believe play      2.95   0.84   −0.72   0.76
8: Fails to show joy                      0.85   1.01   1.20    0.91
9: Touches others inappropriately         0.19   0.63   3.98    16.65
10: Shows affection to adults             3.01   0.83   −0.87   1.21
11: Temper tantrums                       0.68   0.96   1.43    1.53
12: Keeps trying when unsuccessful        2.27   0.83   −0.26   0.27
13: Handles frustration well              2.42   0.84   −0.50   0.41
14: No reaction to children/adults        0.83   0.95   1.07    0.61
15: Obscene gestures                      0.22   0.64   3.34    11.61
16: Different ways to solve problem       2.19   0.83   −0.21   0.25
17: Acts happy when parent returns        3.22   0.79   −1.10   1.76
18: Destroys or damages property          0.41   0.79   2.18    4.65
19: Tries new activities                  2.38   0.85   −0.31   0.26
20: Plays with children                   2.64   0.84   −0.42   0.29
21: Shows patience                        2.56   0.81   −0.33   0.24
22: Asks adult to play/read               2.26   0.98   −0.28   −0.25
23: Short attention span                  1.56   0.99   0.33    −0.24
24: Focuses attention on task/activity    2.44   0.80   −0.38   0.25
25: Shares                                2.73   0.79   −0.56   0.73
26: Fights with children                  0.90   0.97   1.00    0.59
27: Upset or cries easily                 1.16   1.00   0.74    0.16
28: Acts optimistic                       2.21   1.01   −0.32   −0.25
29: Trusts familiar adults                2.97   0.71   −0.47   0.60
30: Accepts another choice                2.71   0.76   −0.46   0.51
31: Seeks help when necessary             2.73   0.78   −0.40   0.37
32: Asks children to play                 2.74   0.83   −0.49   0.35
33: Cooperates with others                2.80   0.75   −0.44   0.56
34: Calms down when upset                 2.60   0.78   −0.34   0.35
35: Easily distracted                     1.70   0.97   0.30    −0.14
36: Makes decisions                       2.67   0.81   −0.38   0.27
37: Interest in what others do            2.92   0.76   −0.41   0.23

Note. N = 5,197. The range for all items was 0 to 4.

using CFA and IRT analyses. First, two separate CFAs were conducted in Mplus (one for the Protective Factors subscales and one for the Behavioral Concerns subscale). The relative fit of the newly derived factor structure was then compared to the published structure. Finally, a follow-up set of IRT analyses was conducted in the same manner described above to compare the relative fit of the individual items to their respective subscales based on the newly derived factor structure.

5. Results

Table 2 displays the descriptive statistics for all 37 items of the DECA. For all items in the sample, each response category was endorsed at least once. The most frequently endorsed item was Item 17, "Acts happy when parent returns," with a mean of 3.22 (SD = 0.79), and the least frequently endorsed item was Item 9, "Touches inappropriately," with a mean of 0.19 (SD = 0.63). Items 9 and 15 were skewed (values of 3.98 and 3.34, respectively) and kurtotic (values of 16.65 and 11.61, respectively), as would be expected for problem behavior items. Skew values > 3.0 and kurtosis values > 10.0 were considered non-normal (Kline, 2005), and this non-normality was expected given how rarely these behaviors were endorsed on the scale.
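The normality screen described above can be reproduced with simple moment estimators. A minimal sketch (population-moment formulas, which may differ slightly from the sample-corrected values a statistics package reports; the zero-inflated example item is hypothetical):

```python
def skew_kurtosis(xs):
    """Skewness and excess kurtosis via simple moment estimators."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4) - 3.0
    return skew, kurt

def non_normal(skew, kurt, skew_cut=3.0, kurt_cut=10.0):
    """Kline's (2005) screening rule used above: skew > 3.0 or
    kurtosis > 10.0 flags an item distribution as non-normal."""
    return abs(skew) > skew_cut or kurt > kurt_cut

# A zero-inflated item like Item 9 (mostly 'never', a few 'very frequently'):
rare_item = [0] * 95 + [4] * 5
rare_skew, rare_kurt = skew_kurtosis(rare_item)
```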

The percent of variance in outcomes attributable to differences between raters (classroom teachers) was substantial, confirming that applying the multilevel correction in our models was the most appropriate analytic approach (Raudenbush & Bryk, 2002). Intraclass correlations for each of the subscale scores based on the published factor structure were calculated from the unconditional



Table 3
Comparison of relative model fit indices for published and newly derived DECA factor structure.

Model                                               χ² (df)         CFI    TLI    RMSEA  WRMR   Comparison  ΔCFI   Δχ² (df)
Model 1: Protective Factors (published structure)   7005.76* (321)  0.84   0.82   0.06   3.39
Model 2: Protective Factors (newly derived)         3930.83* (321)  0.88   0.87   0.07   2.61   Model 1     −0.04
Model 3: Behavioral Concerns (published structure)  2209.36* (35)   0.76   0.70   0.11   4.28
Model 4: Behavioral Concerns (newly derived)        878.07* (34)    0.88   0.84   0.10   2.60   Model 3     −0.12  318.80 (1)

Note. n = 2,599 for the reserve sample. CFI = Comparative Fit Index; TLI = Tucker–Lewis Index; RMSEA = root mean square error of approximation; WRMR = weighted root mean square residual.

* p < .001.

multilevel models in Mplus. For the Initiative, Self-Control, Attachment, and Behavioral Concerns subscales, 30.3%, 32.0%, 42.6%, and 34.8%, respectively, of the variance in children's scores was attributable to differences between raters (classroom teachers).
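The intraclass correlations reported above come from the variance components of unconditional two-level models; the calculation itself is simply the between-rater share of total variance. A minimal sketch with hypothetical variance components (not the study's estimates):

```python
def intraclass_correlation(between_var, within_var):
    """ICC from an unconditional two-level model: the proportion of
    total score variance attributable to the level-2 units (here,
    classroom teachers as raters)."""
    return between_var / (between_var + within_var)

# Hypothetical variance components for illustration:
icc = intraclass_correlation(0.303, 0.697)
```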

5.1. Fit of the published structure in the Head Start sample

5.1.1. CFA findings
Table 3 lists relative fit indices for the published structure of the Protective Factors subscales and the Behavioral Concerns subscale in the Head Start sample. While all of the items for Model 1 (Protective Factors subscales) loaded significantly on their respective subscale (β = 0.53–0.86; SE = 0.01–0.02), fit indices suggested that the model fit was not adequate, χ²(321) = 7005.759, p < 0.001, CFI = 0.837, TLI = 0.822, RMSEA = 0.063, WRMR = 3.393. Similarly, all items for Model 3 (Behavioral Concerns subscale) loaded significantly on the subscale (β = 0.41–0.73; SE = 0.01–0.03). However, fit indices suggested that the model fit was not adequate, χ²(35) = 2209.358, p < 0.001, CFI = 0.763, TLI = 0.695, RMSEA = 0.109, WRMR = 4.276.

5.1.2. IRT findings
The first three columns of Table 4 (Online Supplemental Materials) present the infit and outfit item-level fit statistics for the published structure of the Protective Factors subscales under the PCM. The infit and outfit values for the Initiative subscale ranged from 0.84 to 1.19 and from 0.85 to 1.24, respectively. Only one item, out of 11, had an infit value that fell in the sample-size corrected infit range (0.97–1.02), and six items had outfit values that fell in the sample-size corrected outfit range (0.92–1.08). For the Attachment subscale, the infit and outfit values for the items ranged from 0.86 to 1.20 and from 0.85 to 1.24, respectively. Only one item, out of eight, had an infit value that fell in the sample-size corrected infit range, and two items had outfit values that fell in the corrected outfit range. For the Self-Control subscale, the infit and outfit values for the items ranged from 0.88 to 1.13 and from 0.87 to 1.14, respectively. Two items, out of eight, had infit values that fell in the sample-size corrected infit range, and four items had outfit values that fell in the corrected outfit range. While some of the item-fit values fell outside the cut-off ranges, these deviations were minimal, suggesting that overall most items were representing each of the unidimensional constructs of the three Protective Factors subscales (Initiative, Self-Control, and Attachment) reasonably well.

The first two columns of Table 5 (Online Supplemental Materials) present the infit and outfit item-level fit statistics for the published Behavioral Concerns subscale under the PCM. In this set of analyses, using the structure of the published scale, the 10 Behavioral Concerns items were calibrated together. The infit and outfit values for the Behavioral Concerns subscale ranged from 0.87 to 1.33 and from 0.89 to 1.41, respectively. No items had an infit value that fell within the sample-size corrected infit range, and only four of the ten items had outfit values that fell within the sample-size corrected outfit range. Most items on this subscale did not exhibit good fit to the Behavioral Concerns subscale. Two items (Item 8, "fails to show joy," and Item 14, "no reaction to children/adults") exhibited the greatest misfit of all items on this subscale (with infit and outfit values of 1.33 and 1.41, and 1.26 and 1.31, respectively). The magnitude of their deviation from the cut-off range, given the sample size, raised concerns and warranted a closer examination. Figs. 1 and 2 (Online Supplemental Materials) graphically depict the expected item-score curves for Items 8 and 14, showing the relatively extreme deviation of the observed item-score curve from the expected item-score curve. An expected item-score curve displays the expected value of the item score for individuals at each value of the target trait (i.e., the mean value of the item score for individuals at a particular value of the target trait) (Linacre, 2012). Figs. 1 and 2 depict the expected score on the DECA (e.g., 0 for "never," 1 for "rarely," etc.) relative to the level of latent Behavioral Concerns (negative values represent lower levels of Behavioral Concerns, while positive values represent higher levels). The red line in each figure represents how the item-score curve for Items 8 and 14 would look if the DECA data fit the PCM perfectly. The blue line represents the observed frequencies in our DECA sample, mapped onto the expected item-score curve. The item-score curves for both Items 8 and 14 contain observations that fall well outside the 95% confidence interval of what is expected under the PCM (demarcated by the black lines). These departures suggest that for these items there may be a violation of unidimensionality and that these items may be measuring a construct that is not consistent with the overall Behavioral Concerns latent construct.

5.2. Alternate factor structure in the Head Start sample

Given that the published factor structure for neither the Protective Factors subscales nor the Behavioral Concerns subscale adequately fit the data in our Head Start sample, a series of EFAs was performed with the index sample (n = 2,598) to examine the underlying factor structure. In the initial set of analyses, a series of EFAs examined whether a latent structure could be resolved when all 37 items were included. The 37 items were subjected to a series of common factor analyses for one- to five-factor models using squared multiple correlations as initial communality estimates (Snook & Gorsuch, 1989). The final orthogonal solution (from varimax and equamax rotations) was subjected to oblique (promax) rotations at increasing levels of power. In this first set of analyses, it was not possible to resolve a parsimonious structure that met the aforementioned criteria for factor retention (e.g., fewer than three items loaded on some factors, items loaded on multiple factors or loaded negatively, or factors did not represent a psychologically meaningful construct). Subsequently, two sets of EFAs were conducted separately for (a) the 27 items from the Protective Factors subscales and (b) the 10 items from the Behavioral Concerns subscale. Findings for this second set of EFAs are reported below.



Table 6
Newly derived DECA exploratory factor structure (Protective Factors subscales).

Item  Label                                                    Initiative  Self-Control  Attachment
19    Try or ask to try new things or activities               0.89        .             .
3     Choose to do a task that was challenging                 0.89        .             .
16    Try different ways to solve a problem                    0.88        .             .
12    Keep trying when unsuccessful (act persistent)           0.81        .             .
2     Do things for himself/herself                            0.62        .             .
36    Make decisions for himself/herself                       0.61        .             .
20    Start or organize play with other children               0.58        .             .
28    Say positive things about the future (act optimistic)    0.58        .             .
22    Ask adults to play with or read to him/her               0.55        .             .
24    Focus attention or concentrate on a task or activity     0.53        .             .
1     Act in a way that made adults smile or show interest     0.40        .             .
5     Control her/his anger                                    .           0.91          .
21    Show patience                                            .           0.85          .
13    Handle frustration well                                  .           0.81          .
4     Listen to or respect others                              .           0.75          .
34    Calm herself/himself down when upset                     .           0.74          .
30    Accept another choice when first choice is unavailable   .           0.63          .
6     Respond positively to adult comforting when upset        .           0.60          .
33    Cooperate with others                                    .           0.56          .
25    Share with other children                                .           0.52          .
17    Act happy or excited when parent/guardian returned       .           .             0.81
10    Show affection for familiar adults                       .           .             0.75
29    Trust familiar adults and believe what they say          .           .             0.62
31    Seek help from children/adults when necessary            .           .             0.53
32    Ask other children to play with him/her                  .           .             0.53
37    Show an interest in what children/adults are doing       .           .             0.43
7     Participate actively in make-believe play with others    .           .             0.41

Note. Final exploratory factor structure resulted in a 3-factor, prerotate = equamax, k = 4, promax oblique rotation (n = 2,598 for the index sample). Items that are bolded loaded appreciably on a different factor than that of the published structure.

5.3. Exploratory factor structure of the Protective Factors subscales

For the 27 items from the DECA's Protective Factors subscales, a three-factor promax solution (k = 4, with an equamax structure matrix serving as the initial orthogonal structure) produced the most useful and parsimonious solution that satisfied the central criteria for retention and had the highest hyperplane count (33.33%). The final structure was determined by examining the 27 items that loaded saliently on each of the factors (loadings > .35). Table 6 displays the 27 DECA items comprising the three subscales, with their promax structure loadings. The three-factor structure derived was comparable to the published structure for the three subscales (Initiative, Self-Control, and Attachment). However, five items loaded on different subscales than in the published structure. Two items (Items 32 and 7) loaded on the Attachment subscale (whereas in the published structure they loaded on the Initiative subscale). Two items (Items 1 and 22) loaded on the Initiative subscale (whereas in the published structure they loaded on the Attachment subscale). The factor structure for the Self-Control subscale was replicated (with the same eight items loading as on the published Self-Control subscale); however, one additional item (Item 6) loaded on Self-Control, whereas in the published structure this item loaded on the Attachment subscale. Cronbach's alphas for the three Protective Factors subscales (Initiative, Self-Control, and Attachment) were adequate (.91, .91, and .83, respectively). A comparison of the pattern of loadings from the newly derived structure for the three Protective Factors subscales with those of previous studies is also included in Table 1.

5.4. Exploratory factor structure of the Behavioral Concerns subscale

For the ten Behavioral Concerns items, a two-factor varimax solution resulted in the most parsimonious solution that satisfied all criteria for retention. Table 7 displays the two resulting factors and their varimax structure loadings. The first factor, Externalizing Behavior, was comprised of eight items related to overactive or outward behavioral needs (e.g., has temper tantrums, destroys or damages property, fights with other children, becomes upset

Table 7
Newly derived DECA exploratory factor structure (Behavioral Concerns subscales).

Item  Label                                                    Externalizing Behavior  Internalizing Behavior
18    Destroy or damage property                               0.78                    –
26    Fight with other children                                0.73                    –
15    Use obscene gestures or offensive language               0.71                    –
11    Have temper tantrums                                     0.67                    –
9     Touch children/adults inappropriately                    0.67                    –
27    Become upset or cry easily                               0.56                    –
23    Have a short attention span (difficulty concentrating)   0.55                    –
35    Get easily distracted                                    0.53                    –
14    Have no reaction to children/adults                      –                       0.82
8     Fail to show joy or gladness at a happy occasion         –                       0.68

Note. Final exploratory factor structure was a 2-factor, varimax orthogonal solution (n = 2,598 for the index sample).



easily). The second factor, Internalizing Behavior, was comprised of the two remaining items related to underactive or withdrawn behavioral needs (e.g., fails to show joy or gladness, has no reaction to children/adults). Cronbach's alpha was .80 for the newly derived Externalizing Behavior subscale and .64 for the newly derived Internalizing Behavior subscale. Given that the Internalizing Behavior subscale was comprised of only two items and its internal consistency value was lower than .70, there was little evidence that it could be considered a viable scale.

5.5. Relative fit of the newly derived factor structure in the Head Start sample

5.5.1. Protective Factors subscales (CFA)
Table 3 lists the relative fit indices for the newly derived structure of the Protective Factors subscales in the Head Start sample. All of the items for Model 2 (the newly derived Protective Factors subscales) loaded significantly on their respective subscale (β = 0.56–0.86; SE = 0.01–0.02). However, fit indices suggested that the model fit was still not adequate, χ²(321) = 3930.833, p < 0.001, CFI = 0.884, TLI = 0.873, RMSEA = 0.066, WRMR = 2.607. Despite the inadequate model fit of the newly derived structure, all of the relative fit indices improved when compared to the published factor structure (Model 1). Given that both the published (Model 1) and newly derived (Model 2) structures were three-factor models and included the same number of indicators, the models required the same number of parameters to be estimated and therefore were not nested. Consequently, the statistical significance of the change in model fit could not be compared via a χ² difference test (Δχ²; Kline, 2005). However, the relative improvement in model fit was compared by examining the change in the CFI (with ΔCFI ≤ −0.01 suggesting meaningful improvement; Cheung & Rensvold, 2002). The change in the relative fit of the models, ΔCFI = −0.04, with Model 2 having a higher CFI than Model 1, suggested that the newly derived factor structure (Model 2) better fit the data for our sample.
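The ΔCFI comparison for non-nested models reduces to a one-line rule; a minimal sketch of the Cheung and Rensvold (2002) heuristic as applied here:

```python
def cfi_improved_meaningfully(cfi_reference, cfi_alternative, threshold=-0.01):
    """The alternative model fits meaningfully better when
    delta CFI = cfi_reference - cfi_alternative <= -0.01
    (Cheung & Rensvold, 2002)."""
    return (cfi_reference - cfi_alternative) <= threshold

# Table 3 values: published Protective Factors (Model 1, CFI = 0.84)
# vs. newly derived (Model 2, CFI = 0.88); delta CFI = -0.04.
model2_better = cfi_improved_meaningfully(0.84, 0.88)
```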

5.5.2. Behavioral Concerns subscale (CFA)
Table 3 also lists relative fit indices for the newly derived structure of the Behavioral Concerns subscale in the Head Start sample. All of the items for Model 4 (the newly derived Behavioral Concerns subscale) loaded significantly on their respective subscale (β = 0.62–0.78; SE = 0.01–0.04). However, fit indices suggested that the model fit was still not adequate, χ²(34) = 878.074, p < 0.001, CFI = 0.882, TLI = 0.843, RMSEA = 0.098, WRMR = 2.601. The published one-factor (Model 3) and the newly derived two-factor (Model 4) Behavioral Concerns structures were nested models. Therefore, the relative fit of the newly derived structure compared to the published structure was examined using the DIFFTEST option in Mplus. Because the difference between the WLSMV chi-square values of two nested models is not itself distributed as chi-square, the DIFFTEST option in Mplus was necessary to conduct a corrected chi-square difference test (Muthén & Muthén, 1998–2011). The corrected Δχ² test between the two models (Model 3 and Model 4) was significant, Δχ²(1) = 318.802, p < 0.001. This indicated that constraining the parameters of the nested model significantly worsened the fit of the model, which suggested that the least restrictive model should be retained (Muthén & Muthén, 1998–2011). In other words, the constraints on Model 3 (the published factor structure with all 10 items loading onto one factor) significantly worsened the model fit when compared to Model 4 (the derived structure with two factors). Therefore the least restrictive, newly derived model should be retained. Furthermore, the change in the CFI, ΔCFI = −0.12, suggested a meaningful improvement (≤ −0.01; Cheung & Rensvold, 2002), with Model 4 having a higher CFI than Model 3. Together this suggested that the newly derived, two-factor model was a better fit to our data.
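The significance of a corrected 1-df difference statistic can be checked from the chi-square survival function, which has a closed form for one degree of freedom. A sketch (the corrected statistic itself comes from Mplus's DIFFTEST, not from this formula):

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x). Since a 1-df chi-square is Z**2,
    P = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Corrected difference test between Models 3 and 4 reported above:
p_value = chi2_sf_df1(318.802)
```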

5.5.3. IRT analyses
A follow-up set of IRT analyses examined the fit of the newly derived structure in the sample. As with the prior set of analyses examining item-level fit to the published structure, each of the 37 items was calibrated within its respective subscale based on the newly derived structure. In this set of analyses, 11 items were calibrated for the Initiative subscale, seven items were calibrated for the Attachment subscale, and nine items were calibrated for the Self-Control subscale. The 10 items that comprised the Behavioral Concerns subscale were divided into two separate calibrations: one for Externalizing Behavior, which consisted of eight items, and one for Internalizing Behavior, which consisted of two items. Figs. 3 and 4 (Online Supplemental Materials) graphically depict the comparison between the item loadings on the published and newly derived DECA factor structures for the Protective Factors subscales and the Behavioral Concerns scale.

5.5.4. Fit of newly derived Protective Factors (IRT)
Columns four and five in Table 4 (Online Supplemental Materials) present the infit and outfit item-level fit statistics for the newly derived Protective Factors subscales under the PCM. For the Initiative subscale, the infit and outfit values ranged from 0.81 to 1.24 and from 0.80 to 1.27, respectively. Only two items out of 11 had infit values that fell in the sample-size corrected infit range (0.97–1.02), and three items had outfit values that fell within the outfit cut-off range (0.92–1.08). For the newly derived Attachment subscale, the infit and outfit values ranged from 0.89 to 1.18 and from 0.89 to 1.22, respectively. No items had an infit value that fell in the sample-size corrected infit range, and only three items fell in the corrected outfit range. For the newly derived Self-Control subscale, the infit and outfit values ranged from 0.89 to 1.15 and from 0.88 to 1.16, respectively. One item, out of nine, had an infit value that fell in the sample-size corrected infit range, and six items had outfit values that fell in the sample-size corrected outfit range. While some of the item-fit values fell outside the cut-off ranges, these deviations were minimal, suggesting that overall most items were representing each of the unidimensional constructs of the newly derived three Protective Factors subscales (Initiative, Self-Control, and Attachment) reasonably well.

5.5.5. Fit of newly derived Behavioral Concerns subscales (IRT)
Columns four and five of Table 5 (Online Supplemental Materials) present the infit and outfit item-level fit statistics for the newly derived Behavioral Concerns subscales under the PCM. There were two separate calibrations for the Behavioral Concerns items: an Externalizing Behavior subscale and an Internalizing Behavior subscale. Overall, the two subscales demonstrated reasonable fit to the PCM. The Externalizing Behavior subscale had infit and outfit values that ranged from 0.90 to 1.18. Three items, out of eight, had infit values that fell in the sample-size corrected infit range, and five items had outfit values that fell in the sample-size corrected outfit range. The minimal deviation of the item-level fit statistics from the cut-off values for the items comprising the newly derived Externalizing Behavior subscale suggested that the eight items were representative of the underlying latent construct.

The two Internalizing Behavior items had infit values of 0.94 and 1.05 and outfit values of 0.86 and 0.95, respectively. Only one item had an outfit value that fell in the sample-size corrected outfit range. The minimal deviation of the item-level fit statistics from the cut-off values for the items comprising the newly derived



Internalizing Behavior subscale suggested that the two items were representative of the underlying latent construct. When comparing the one-factor published Behavioral Concerns structure to the two-factor newly derived structure, the two-factor structure fit the PCM better. Better item-level fit was obtained for Items 8 and 14 when these items were treated as comprising their own separate construct.

6. Discussion

The present study examined the factor structure of the DECA teacher report in an independent sample of culturally and linguistically diverse Head Start children. Using a rigorous series of exploratory, confirmatory, and IRT analyses, we examined both the Protective Factors and the Behavioral Concerns subscales. This was the first study to examine the structure of the Behavioral Concerns subscale through confirmatory or exploratory factor analysis. We found that the published structure was not adequate in the diverse Head Start sample, with the Behavioral Concerns subscale exhibiting the worst fit. In addition, we found that an alternate structure was more appropriate for our Head Start sample. The newly derived structure that best represented our data included a two-factor structure for Behavioral Concerns that distinguished between internalizing and externalizing behavior. However, only the Externalizing Behavior subscale was a viable scale. Findings from this initial study document the difficulty of confirming the structure of published scales in diverse low-income populations and suggest that further psychometric research is needed before the DECA is used in its current form to inform early identification efforts within early childhood programs serving children from low-income, ethnically diverse backgrounds.

6.1. Fit of the published structure in the Head Start sample

Supporting our hypotheses, the initial CFA results suggested inadequate fit of our data to the published structure for both the three Protective Factors subscales (Initiative, Self-Control, and Attachment) and the Behavioral Concerns subscale. While the fit of the Protective Factors subscales was not adequate, the Behavioral Concerns subscale exhibited the worst fit in our sample of diverse Head Start children. IRT findings suggested that for the published structure, items on the Protective Factors subscales exhibited reasonable fit to their respective subscales. Similar to our CFA findings, IRT analyses suggested more problems with the fit of the items on the Behavioral Concerns subscale; in particular, items 8 (“fails to show joy”) and 14 (“no reaction to children/adults”) demonstrated the most problematic fit. Overall, IRT fit statistics suggested a violation of the assumption that the ten items comprised a unidimensional construct. Likely, these two poorly fitting items did not represent the construct of Behavioral Concerns in the same way as the other eight items did.
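For context, the infit and outfit statistics discussed above are mean squares of standardized residuals under the partial credit model. A minimal sketch of how they are typically computed (the function name and toy numbers below are ours for illustration, not drawn from the article):

```python
import numpy as np

def item_fit(observed, expected, variance):
    """Rasch/PCM infit and outfit mean-square statistics for a single item.

    observed: raw item responses across persons
    expected: model-expected scores E[X] per person
    variance: model response variances Var(X) per person
    """
    z2 = (observed - expected) ** 2 / variance          # squared standardized residuals
    outfit = z2.mean()                                  # unweighted mean square
    infit = np.sum(variance * z2) / np.sum(variance)    # information-weighted mean square
    return infit, outfit

# Toy numbers, purely illustrative (not the study's data):
obs = np.array([0.0, 1.0, 2.0, 1.0, 3.0])
e_x = np.array([0.5, 1.2, 1.8, 1.1, 2.4])
v_x = np.array([0.4, 0.7, 0.8, 0.7, 0.6])
infit, outfit = item_fit(obs, e_x, v_x)
```

Values near 1.0 indicate good fit; the infit statistic down-weights responses from persons far from the item's difficulty, which is why the two can diverge.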

Together the initial CFA and IRT analyses suggested that at the subscale and at the item level, the published factor structure was not appropriate for our Head Start sample. This set of findings is supported by other recent studies that document the difficulty of replicating published structures in independent samples of diverse low-income populations, especially for social-emotional assessments (Gross et al., 2006; Konold et al., 2003; LeBoeuf et al., 2010). Given that our analyses did not provide strong evidence for the use of the published factor structure in our sample, and the potential implications of applying this structure programmatically for early identification and intervention efforts, we conducted another set of analyses to examine whether an alternative structure might be more appropriate in our sample.

6.2. Alternate factor structure in the Head Start sample

6.2.1. Protective factors subscales

Our second set of analyses examined whether an alternative factor structure of the DECA might be more appropriate in our Head Start sample. The exploratory factor analyses for the Protective Factors subscales resulted in a three-factor solution, which was comparable to the three published subscales (see Table 6). While the three-factor structure emerged as the best structure in our sample, five items loaded differently from the published subscales. This comports with previous studies, which found that several items inconsistently loaded or double-loaded, particularly on the Attachment and Initiative subscales of the published factor structure (Jaberg et al., 2009; Lien & Carlson, 2009). Three items on the published Initiative subscale were problematic across our study, Jaberg et al. (2009), and Lien and Carlson (2009): item 1 (makes adult smile), item 22 (asks adult to play/read), and item 37 (shows interest in what others do). In addition, three items on the published Attachment subscale were problematic across these studies and either double-loaded with the Initiative subscale or only loaded on the Initiative subscale: item 7 (participates in make-believe), item 28 (acts optimistic), and item 32 (asks children to play).

These findings suggest that perhaps the items on the Attachment and Initiative subscales are not capturing distinctive constructs, or the constructs as they are currently named. As criticized in a report by Bridges et al. (2004), it could be the case that in our sample the construct of attachment was not being assessed in the DECA as it is conceptualized in the broader developmental literature. In fact, Attachment items refer more generally to the relationship that children have with familiar adults (such as teachers) and parents, and in the recent revision of the DECA (DECA-P2; LeBuffe & Naglieri, 2012) the name of this subscale was changed to “Attachment/Relationships.” It should be acknowledged that these foundational construct validity concerns, rather than differences due to our sample, could in fact be driving the misfit of the published structure in our Head Start sample (and in other independent samples).

Findings from the CFA and IRT analyses suggested relative improvement in fit to our data when the newly derived factor structure for the Protective Factors subscales was applied. Although statistical comparisons could not be made, the relative improvement in CFA model fit indicated that the newly derived structure better represented these data for our diverse sample of Head Start children. IRT findings for the published and the newly derived Protective Factors subscales suggested that across the two factor structures, relative improvement in item-level fit to the PCM was mixed. The items comprising each of the three Protective Factors subscales under both factor structures fit the unidimensional assumption in relatively the same way; however, further measurement work is needed to identify a parsimonious structure in which items across subscales all fit the hypothesized construct well.

Taken together, this set of analyses for the Protective Factors subscales suggests that while there may be discrepancies among the pattern of item loadings across independent samples, three factors generally emerge. While further measurement work is warranted, the use of the Protective Factors subscales in diverse samples of preschool children may be supported. Our study sample represented a population of teachers and children who were predominantly of ethnic minority background (African American or Hispanic) and Spanish speaking, from an urban Head Start program. However, future studies should continue to examine whether the factor structure replicates in independent samples of other ethnically, linguistically, or socioeconomically different populations (e.g., those living in rural areas, those speaking other languages or predominantly English, or those served by early childhood special education programs). Also, it is critical to similarly examine the validity of parent reports on the DECA with diverse populations in future research.

6.2.2. Behavioral concerns subscale

Exploratory factor analyses of the 10 Behavioral Concerns items revealed that the most parsimonious solution was a two-factor structure comprised of an Externalizing and an Internalizing Behavior scale. Externalizing behavior included eight items such as “has temper tantrums” and “fights with other children,” and the Internalizing behavior factor included two items, “fails to show joy or gladness” and “has no reaction to children/adults.” When the relative fit of the newly derived two-factor solution was examined via CFA and IRT analyses, models resulted in fit statistics that were better than those generated from the published factor structure. In the CFA analyses, relative model fit showed improvement. In addition, the corrected Δχ² and ΔCFI tests suggested that constraining the model (having all 10 items load onto one factor as in the published structure) significantly worsened the model fit. In the IRT models, better item-level fit was obtained when the Externalizing and Internalizing behavior items were calibrated separately.
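The corrected Δχ² comparison for nested models is commonly computed with the Satorra-Bentler scaled difference formula; a sketch under that assumption (the function name and fit values below are invented for illustration, not the article's estimates):

```python
def scaled_chisq_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference for nested models.

    t0, df0, c0: reported (scaled) chi-square, degrees of freedom, and scaling
                 correction factor for the more constrained (e.g., one-factor) model
    t1, df1, c1: the same quantities for the less constrained (e.g., two-factor) model
    Returns the scaled difference statistic and its degrees of freedom.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd            # scaled difference statistic
    return trd, df0 - df1

# Hypothetical fit values (illustrative only):
trd, ddf = scaled_chisq_diff(120.0, 35, 1.3, 100.0, 34, 1.2)
```

The resulting statistic is referred to a chi-square distribution with `ddf` degrees of freedom; a significant value indicates that the added constraint worsens fit.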

This set of findings suggested that, in our sample, the Behavioral Concerns subscale was better represented by a two-factor model that distinguished between externalizing and internalizing problem behaviors. Evidence for the unidimensional structure of the DECA Behavioral Concerns subscale in our sample was not found. In addition, the two-factor structure found in our sample comports with distinctions made in the developmental literature as well as in several widely used early childhood assessments of problem behavior within the school context (Achenbach & Rescorla, 2000; Lutz, Fantuzzo, & McDermott, 2002; McDermott, 1993). Similar to the items that make up the Externalizing Behavior factor in our sample, externalizing behavior problems are typically defined in the literature as outward acts of aggression, disruption, tantrums, and overactivity (Campbell, 2002). Similar to the items that made up the Internalizing Behavior factor, internalizing behaviors are typically defined by shyness, flat affect, and social withdrawal in preschool children (Campbell, 2002; Qi & Kaiser, 2003).

6.3. Limitations and future directions

The present study extends previous research by examining the factor structure of the DECA in a culturally and linguistically diverse sample of preschool children from low-income backgrounds attending a municipal Head Start program; however, there are several limitations that must be acknowledged. First, while we found that a two-factor structure best represented the Behavioral Concerns subscale in our sample, the Internalizing factor was comprised of only two items, and internal consistency for this factor was lower than acceptable values (alpha = .64). The Externalizing scale did show adequate internal consistency, and therefore there is some initial evidence to suggest it could be used as a screener to identify children exhibiting externalizing problems. However, it is not our recommendation that either the externalizing or internalizing behavior scales derived in our study be used. Our findings instead suggest that caution be taken when using the overall Behavioral Concerns subscale score as a screening tool to identify the social-emotional needs of children, particularly those exhibiting internalizing behavior. Much more research is needed to examine whether there is criterion-related validity evidence for the use of the two newly derived scales in samples of preschool children from low-income backgrounds. Future studies also are needed to examine the overlap between these two factors and more comprehensive, multidimensional measures of problem behavior.
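Internal consistency values such as the alpha = .64 reported for the two-item Internalizing factor follow from the standard Cronbach's alpha formula; a sketch with invented item scores (not the study's data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an item-score matrix (rows = children, cols = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                           # number of items
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented ratings for a hypothetical two-item scale (illustrative only):
ratings = [[0, 1], [1, 1], [2, 3], [3, 2], [4, 4], [1, 0]]
alpha = cronbach_alpha(ratings)
```

Because alpha depends directly on the number of items, a two-item factor will tend to yield lower values than a longer scale with comparable inter-item correlations, which is one reason short factors are hard to support psychometrically.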

In addition, because in the original development of the DECA the Behavioral Concerns subscale was never derived via exploratory factor analyses, it is not clear whether in fact a two-factor structure might be more appropriate in the original normative sample as well. It would be important for future studies to apply a careful series of analyses in other samples to see whether in fact the improvement in fit with respect to the two-factor structure of the Behavioral Concerns subscale is specific to our low-income sample or whether it is found in other samples.

It should also be noted that we chose to include only scores from the English language form of the DECA. In the overall sample, 14% of Head Start teachers were Spanish speaking and completed the DECA using the Spanish language form. We excluded the Spanish scores in our set of analyses because we felt careful examination of the English form was needed as a first step. While beyond the scope of the present study, it is clear that careful research is needed to examine the measurement properties of the Spanish language form, since it is increasingly being used in Head Start programs nationally. No published factor analytic work to date has been conducted on the Spanish language form. In the manual, LeBuffe and Naglieri (2001) describe findings from a small study examining the comparability of English and Spanish DECA subscale scores. Paired sample t-tests were used to compare the DECA scores for a sample of Spanish-speaking bilingual parents (n = 44) and teachers (n = 48) who completed the DECA in English and Spanish on the same child. No significant differences in subscale means were found, and therefore the DECA manual suggests that the English norms be used to calculate normative scores for the Spanish language form (LeBuffe & Naglieri, 2001). An important next step would be to examine carefully the underlying factor structure of the Spanish form of the DECA to test whether in fact the latent structure of the two language forms is comparable and whether a different factor structure or set of norms is warranted.

In addition, given that there is a newly available revision of the DECA (Devereux Early Childhood Assessment for Preschoolers [DECA-P2; LeBuffe & Naglieri, 2012]), it is critical that future studies carefully examine the underlying latent structure of this measure. As many programs continue to use the original form of the DECA, our study findings are a contribution to the field. However, given that the publishers (a) made minimal changes to the Behavioral Concerns subscale (more substantive changes in wording or labeling were made to the Protective Factors subscales) and (b) took a similar approach to examining the latent dimensionality of the measure, it is likely that issues similar to those we identified may exist with the revised measure. Our greatest concern continues to be with the psychometric properties of the Behavioral Concerns subscale and its use as a behavioral screener for children exhibiting a range of emotional and behavioral needs. In the revision, while seven new items were added to (and eight original items removed from) the Protective Factors subscales, only one item was added to the Behavioral Concerns subscale, and it was an externalizing behavior item (LeBuffe & Naglieri, 2012).

Finally, we did not have teacher demographic information to link to individual children’s scores, because of the programmatic and archival nature of our data. While we were able to control statistically for the fact that children were nested within rater (Head Start teachers) by employing a multilevel approach in our analyses, we could not examine whether the misfit of the published structure in our sample was due to rater effects. Research suggests that patterns of endorsement on behavior scales may in part reflect characteristics of teachers (Hamre, Pianta, Downer, & Mashburn, 2007; Konold & Pianta, 2007; Mashburn, Hamre, Downer, & Pianta, 2006), but we were not able to examine these in our models. This is of particular concern as the intraclass correlations were extremely high, with 30.3% to 42.6% of variance attributed to differences between classroom teachers (raters). While recent studies have found comparable levels of teacher rater variance attributable to children’s scores, this high percentage of variance attributable to teacher rater is of concern. Screening tools to identify children’s social-emotional strengths and needs should reflect individual differences in children’s developmental skills, rather than systematic differences between teacher raters (Waterman, McDermott, Fantuzzo, & Gadsden, 2012). As suggested by Waterman et al. (2012) and by the National Research Council (Snow & Van Hemel, 2008), when choosing tools to be used for screening purposes, early childhood program administrators should carefully consider the amount of training, technical assistance, and observation time needed by teachers to accurately complete rating scales such as the DECA, thereby minimizing error variance attributed to rater. This is a growing area of research and future studies are warranted.
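The intraclass correlations discussed above represent the proportion of rating variance attributable to the teacher (classroom) level of a two-level model; once variance components are estimated, the computation is direct. A sketch with hypothetical values (not the study's estimates):

```python
def icc(between_var, within_var):
    """Intraclass correlation: share of total score variance at the rater level."""
    return between_var / (between_var + within_var)

# Hypothetical variance components (illustrative only): a between-teacher
# variance of 0.42 against a within-classroom (child-level) variance of 0.58
# implies that 42% of rating variance is attributable to the teacher rater.
example_icc = icc(0.42, 0.58)
```

ICCs in the 30-40% range imply that which teacher completed the form explains a large share of a child's score, which is why rater training matters for screening decisions.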

6.4. Implications for policy and practice

As early childhood programs are pressed to meet the mental health needs of the children they serve, the importance of carefully establishing the construct validity of measures before they are used to inform early identification and intervention efforts cannot be overstated. Comprehensive social-emotional assessments are needed that are validated for use within early childhood programs serving diverse families. While there are several widely available published measures for use within early childhood programs, it is critical that their psychometric properties are examined within the target population before they are used to inform decisions about individual children (Snow & Van Hemel, 2008). For example, there have been a series of measurement projects that have intentionally developed multidimensional measures to assess classroom behavior for diverse preschool populations from low-income backgrounds (Fantuzzo, McWayne, & Bulotsky, 2003). In these studies, careful steps were taken to ensure that the items and factor structure were appropriate for use within the population. In the development of the Adjustment Scales for Preschool Intervention (ASPI; Lutz et al., 2002), a measure of classroom problem behavior, the Penn Interactive Peer Play Scale (PIPPS; Fantuzzo, Coolahan, Mendez, McDermott, & Sutton-Smith, 1998), and the Preschool Learning Behaviors Scale (PLBS; McDermott, Leigh, & Perry, 2002), a series of careful development and validation studies were undertaken to ensure that the derived scales were reliable and valid for use within diverse Head Start populations.

Given that children from low-income backgrounds served within publicly funded programs, such as Head Start, face elevated risks to their future social and academic success, it is critical that our nation’s early childhood programs are equipped with the tools needed to identify those in greatest need. Only when comprehensive, validated social-emotional assessments are used can problem behaviors that interfere with children’s engagement in learning and the development of social relationships within the classroom be detected early enough to support positive school adjustment. Our study findings suggest that use of the total score as a comprehensive screening tool to identify Behavioral Concerns in diverse samples of preschool children from low-income backgrounds is not supported, especially when used to identify children exhibiting internalizing behavioral needs. It is not our position that scores from our newly derived factor structure should be used by early childhood programs, but rather that future work is critically needed, in particular to revise the Behavioral Concerns subscale. This is the case if the Behavioral Concerns subscale is to be used as a comprehensive screening tool programmatically to identify children exhibiting a range of emotional and behavioral needs. In our analyses, while some of the items shifted around on the Protective Factors subscales, the shifting and level of item misfit were not as serious as what we found in our analyses of the 10 items on the Behavioral Concerns subscale. If programs are to use the scores from the DECA to identify children exhibiting both externalizing and internalizing problem behavior, it is likely that children with internalizing problems will be missed.

This is a concern because the Behavioral Concerns subscale of the DECA is currently widely used within programs serving low-income families as a tool to identify children with mental health needs in need of intervention (Chain, Dopp, Smith, Woodland, & LeBuffe, 2010). Research documents programmatic challenges to identifying and referring children with emotional and behavioral needs served within early childhood programs, such as Head Start (Fantuzzo et al., 1999; Feil et al., 2005; Lopez et al., 2000). Even when validated measurement tools are used, children exhibiting internalizing problem behavior are typically under-identified and under-referred for intervention services (Fantuzzo, Bulotsky, McDermott, Mosca, & Lutz, 2003; Lopez et al., 2000). In these early childhood studies, preschool teachers report concerns about children exhibiting externalizing behavior, and children exhibiting aggressive, oppositional, or inattentive behavior are typically the children who are referred and identified for early intervention services (Fantuzzo et al., 2003a). While it could be the case that preschool teachers are better reporters of externalizing behavior because it is more visible and disruptive to classroom routines, our current study of the DECA Behavioral Concerns subscale suggests that externalizing behavior items outnumber the internalizing items eight to two. More internalizing items may be needed to assist preschool programs in identifying children displaying internalizing behavioral needs, who might otherwise not be identified by teachers in the classroom. Again, this is of concern, as research documents the academic and social vulnerability of children who exhibit internalizing problem behavior early in their development (Bulotsky-Shearer, Bell, & Dominguez, 2012; Fantuzzo et al., 2003a).

If the Behavioral Concerns subscale of the DECA is to be used programmatically, further development of the scale is recommended to include additional internalizing behavior items. This would allow for a more comprehensive screening tool to identify both externalizing and internalizing problem behavior within early childhood programs serving diverse populations. It is incumbent on the research community to partner with early childhood practitioners and families to ensure that measurement tools used within programs serving low-income communities fit the needs of the child and the program (Fantuzzo et al., 2003b).

Author note

This research project was funded by a Provost’s Research Award to the first author from the University of Miami (2009–2010). A very special thank you to the Miami-Dade County Human Services Action Agency Head Start/Early Head Start Program for their collaboration in this project.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ecresq.2013.07.009.

References

Achenbach, T. M., & Rescorla, L. A. (2000). Manual for ASEBA preschool forms & profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Barbarin, O. (2007). Mental health screening of preschool children: Validity and reliability of ABLE. American Journal of Orthopsychiatry, 77, 402–418.

Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.

Bernstein, I. H., & Teng, G. (1989). Factoring items and factoring scales are different: Spurious evidence for multidimensionality due to item categorization. Psychological Bulletin, 105, 467–477.

Bollen, K. A., & Long, J. S. (1993). Testing structural equation models. Newbury Park, CA: Sage Publications.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.

Brauner, C. B., & Stephens, C. B. (2006). Estimating the prevalence of early childhood serious emotional/behavioral disorders: Challenges and recommendations. Public Health Reports, 121, 303–310.

Bridges, L. J., Berry, D. J., Johnson, R., Calkins, J., Margie, N. G., Cochran, S. W., & Brady-Smith, C. (2004). Early childhood measures profiles. Washington, DC: Child Trends.

Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4), 509–540.

Bulotsky-Shearer, R. J., Bell, E., & Dominguez, X. (2012). Latent profiles of problem behavior within learning, peer and teacher contexts: Identifying subgroups of children at academic risk across the preschool year. Journal of School Psychology, 50, 775–789.

Campbell, S. B. (1995). Behavior problems in preschool children: A review of recent research. Journal of Child Psychology & Psychiatry & Allied Disciplines, 36, 113–149.

Campbell, S. B. (2002). Behavior problems in preschool children: Clinical and developmental issues. New York, NY: Guilford Press.

Caselman, T. D., & Self, P. A. (2008). Assessment instruments for measuring young children’s social-emotional behavioral development. Children & Schools, 30, 103–115.

Cattell, R. B. (1966). Handbook of multivariate experimental psychology. Chicago, IL: Rand McNally.

Chain, J., Dopp, A., Smith, G., Woodland, S., & LeBuffe, P. (2010). Devereux Early Childhood Assessment literature review. Villanova, PA: Devereux Center for Resilient Children. Retrieved from http://www.devereux.org/site/DocServer/Research Compilation 7 29 2010 1 .pdf (Unpublished research report)

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.

Chernyshenko, O. S., Stark, S., Chan, K., Drasgow, F., & Williams, B. (2001). Fitting item response theory models to two personality inventories: Issues and insights. Multivariate Behavioral Research, 36, 523–562.

Chou, C. P., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for nonnormal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347–357.

Crane, J., Mincic, M. S., & Winsler, A. (2011). Parent-teacher agreement and reliability on the Devereux Early Childhood Assessment (DECA) in English and Spanish for ethnically diverse children living in poverty. Early Education and Development, 22, 520–547.

De Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford.

Denham, S. A. (2006). Social-emotional competence as support for school readiness: What it is and how do we assess it? Early Education and Development, 17, 57–89.

Dobbs, J., Doctoroff, G. L., Fisher, P. H., & Arnold, D. H. (2006). The association between preschool children’s socio-emotional functioning and their mathematical skills. Journal of Applied Developmental Psychology, 27, 97–108.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.

Escalon, X. D., & Greenfield, D. (2009). Learning behaviors mediating the effects of behavior problems on academic outcomes. NHSA Dialog: A Research-to-Practice Journal for the Early Intervention Field, 12, 1–17.

Fantuzzo, J., Bulotsky, R. J., McDermott, P., Mosca, S., & Lutz, M. (2003). A multivariate analysis of emotional and behavioral adjustment and preschool educational outcomes. School Psychology Review, 32, 185–203.

Fantuzzo, J., Coolahan, K., Mendez, J., McDermott, P., & Sutton-Smith, B. (1998). Contextually-relevant validation of peer play constructs with African-American Head Start children: Penn Interactive Peer Play Scale. Early Childhood Research Quarterly, 13, 411–431.

Fantuzzo, J., Hightower, D., Grim, S., & Montes, G. (2002). Generalization of the Child Observation Record: A validity study for diverse samples of urban, low-income preschool children. Early Childhood Research Quarterly, 17, 106–125.

Fantuzzo, J., Manz, P. H., & McDermott, P. (1998). Preschool version of the social skills rating system: An empirical analysis of its use with low-income children. Journal of School Psychology, 36, 199–214.

Fantuzzo, J., McWayne, C., & Bulotsky, R. (2003). Forging strategic partnerships to advance mental health science and practice for vulnerable children. School Psychology Review, 32, 17–37.

Fantuzzo, J. W., Stoltzfus, J., Lutz, M. N., Hamlet, H., Balraj, V., Turner, C., & Mosca, S. (1999). An evaluation of the special needs referral process for low-income preschool children with emotional and behavioral problems. Early Childhood Research Quarterly, 14(4), 465–482.

Feil, E. G., Small, J. W., Forness, S. R., Serna, L. A., Kaiser, A. P., Hancock, T. B., & Lopez, M. L. (2005). Using different measures, informants, and clinical cut-off points to estimate prevalence of emotional or behavioral disorders in preschoolers: Effects on age, gender, and ethnicity. Behavioral Disorders, 30, 375–391.

Finch, J. F., West, S. G., & MacKinnon, D. P. (1997). Effects of sample size and nonnormality on the estimation of mediated effects in latent variable models. Structural Equation Modeling: A Multidisciplinary Journal, 4, 87–107.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Gross, D., Fogg, L., Young, M., Ridge, A., Cowell, J. M., Richardson, R., & Sivan, A. (2006). The equivalence of the Child Behavior Checklist/1½–5 across parent race/ethnicity, income level, and language. Psychological Assessment, 18, 313–323.

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 38–47.

Hamre, B. K., Pianta, R. C., Downer, J. T., & Mashburn, A. J. (2007). Teachers’ perceptions of conflict with young students: Looking beyond problem behaviors. Social Development, 17, 115–136.

Horn, J. L. (1965). An empirical comparison of methods for estimating factor scores. Educational and Psychological Measurement, 25, 313–322.

Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide in measurement invariance in aging research. Experimental Aging Research, 18, 117–144.

Huffman, L. C., Mehlinger, S. L., & Kerivan, A. S. (2000). Off to a good start: Research on the risk factors for academic and behavioral problems at the beginning of school. Washington, DC: National Institutes of Mental Health.

Jaberg, P. E., Dixon, D. J., & Weis, G. M. (2009). Replication evidence in support of the psychometric properties of the Devereux Early Childhood Assessment. Canadian Journal of School Psychology, 24, 158–166.

Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks, CA: Sage.

Kline, R. B. (2005). Principles and practice of structural equation modeling. New York, NY: Guilford Press.

Knight, G. P., & Hill, N. E. (1998). Measurement equivalence in research involving minority adolescents. In V. C. McLoyd, & L. Steinberg (Eds.), Studying minority adolescents: Conceptual, methodological, and theoretical issues (pp. 183–210). Mahwah, NJ: Erlbaum.

Konold, T. R., Hamre, B. K., & Pianta, R. C. (2003). Measuring problem behaviors in young children. Behavioral Disorders, 28, 111–123.

Konold, T. R., & Pianta, R. C. (2007). The influence of informants on ratings of children’s behavioral functioning. Journal of Psychoeducational Assessment, 25, 222–236.

Lavigne, J. V., Gibbons, R. D., Christoffel, K. K., Arend, R., Rosenbaum, D., Binns, H., & Isaacs, C. (1996). Prevalence rates and correlates of psychiatric disorders among preschool children. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 204–214.

LeBoeuf, W. A., Fantuzzo, J. W., & Lopez, M. L. (2010). Measurement and population miss-fits: A case study on the importance of using appropriate measures to evaluate early childhood interventions. Applied Developmental Science, 14, 45–53.

LeBuffe, P. A., & Naglieri, J. A. (1999). Devereux Early Childhood Assessment (DECA). Lewisville, NC: Kaplan Press.

LeBuffe, P. A., & Naglieri, J. A. (2001). Devereux Early Childhood Assessment – Spanish version (DECA-Sp). Lewisville, NC: Kaplan Press.

LeBuffe, P. A., & Naglieri, J. A. (2012). Devereux Early Childhood Assessment for Preschoolers (DECA-P2) (2nd ed.). Lewisville, NC: Kaplan Press.

Lien, M. T., & Carlson, J. S. (2009). Psychometric properties of the Devereux EarlyChildhood Assessment in a Head Start sample. Journal of PsychoeducationalAssessment, 27, 386–396.

Linacre, J. M. (2012). WINSTEPS (Version 3.74.0) [Computer software]. Chicago, IL:Winsteps.com.

Lopez, M. L., Tarullo, L. B., Forness, S. R., & Boyce, C. A. (2000). Early identification andintervention: Head Start’s response to mental health challenges. Early Educationand Development, 11, 265–282.

Lord, F. M. (1980). Applications of item response theory to practical testing problems.Hillsdale, NJ: Erlbaum.

Lutz, M. N., Fantuzzo, J., & McDermott, P. (2002). Multidimensional assessment ofemotional and behavioral adjustment problems of low-income preschool chil-dren: Development and initial validation. Early Childhood Research Quarterly, 17,338–355.

Maier, M. F., Vitiello, V. E., & Greenfield, D. B. (2012). A multilevel model of child-and classroom-level psychosocial factors that support language and literacyresilience of children in Head Start. Early Childhood Research Quarterly, 27,104–114.

Mashburn, A. J., Hamre, B. K., Downer, J. T., & Pianta, R. C. (2006). Teacher and classroom characteristics associated with teachers’ ratings of prekindergartners’ relationships and behaviors. Journal of Psychoeducational Assessment, 24, 367–380.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.

McDermott, P. A. (1993). National standardization of uniform multisituational measures of child and adolescent behavior pathology. Psychological Assessment, 5, 413–424.

McDermott, P. A., Leigh, N. M., & Perry, M. A. (2002). Development and validation of the Preschool Learning Behaviors Scale. Psychology in the Schools, 39(4), 353–365.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–105). New York, NY: American Council on Education & Macmillan.

Mislevy, R. J. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11, 3–31.

Muthén, B., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript. Retrieved from http://gseis.ucla.edu/faculty/muthen/articles/Article 075.pdf

Muthén, L. K., & Muthén, B. O. (1998–2011). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.

National Center for Children in Poverty (NCCP). (2011). Who are America’s poor children? The official story. New York, NY: Author. Retrieved from http://www.nccp.org/topics/childpoverty.html

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.

Osteen, P. (2010). An introduction to using multidimensional item response theory to assess latent factor structures. Journal of the Society for Social Work and Research, 1, 66–82.

Qi, C. H., & Kaiser, A. P. (2003). Behavior problems of preschool children from low-income families: Review of the literature. Topics in Early Childhood Special Education, 23, 188–216.

Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517–529.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Raver, C. C. (2002). Emotions matter: Making the case for the role of young children’s emotional development for early school readiness. Social Policy Report, 16, 1–20.

Raver, C. C., Gershoff, E. T., & Aber, J. L. (2007). Testing equivalence of mediating models of income, parenting, and school readiness for White, Black, and Hispanic children in a national sample. Child Development, 78, 96–115.

Salvia, J., & Ysseldyke, J. E. (1991). Assessment (5th ed.). Boston, MA: Houghton Mifflin.

Smith, R. M., Schumacker, R. E., & Bush, M. J. (1998). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2, 66–78.

Snook, S. C., & Gorsuch, R. L. (1989). Component analysis versus common factor analysis: A Monte Carlo study. Psychological Bulletin, 106, 148–154.

Snow, C. E., & Van Hemel, S. B. (2008). Early childhood assessment: Why, what, and how. Washington, DC: The National Academies Press.

Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180.

St. Pierre, R. G., Layzer, J. I., Goodson, B. D., & Bernstein, L. (1997). National impact evaluation of the Comprehensive Child Development Program: Final report. Cambridge, MA: Abt Associates.

Thompson, R. A., & Raikes, H. A. (2007). The social and emotional foundations of school readiness. In D. F. Perry, R. K. Kaufman, & J. Knitzer (Eds.), Social and emotional health in early childhood: Building bridges between services and systems (pp. 13–36). Baltimore, MD: Brookes.

U.S. Department of Health and Human Services. (2004). Head Start program fact sheet. Washington, DC: Administration of Children, Youth, and Families.

U.S. Department of Health and Human Services. (2007). Improving Head Start for School Readiness Act of 2007, H.R. 1429. Washington, DC: U.S. Government Printing Office. Retrieved from http://www.govtrack.us/congress/billtext.xpd?bill=h110-1429

Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321–327.

Waller, N. G. (2001). MicroFACT 2.1: A microcomputer factor analysis program for ordered polytomous data and mainframe size problems [Computer software]. St. Paul, MN: Assessment Systems.

Winsler, A. (2012). Childcare type, stability, school readiness, kindergarten retention, English proficiency, and early school trajectories for low-income, immigrant, and/or dual language learners: The Miami school readiness project. Paper presented at the meeting of the Early Learning Coalition of Miami-Dade/Monroe County Board, Miami, FL. Retrieved from http://www.elcmdm.org/about us/Board/minutes/presentations/ELC2012Presentation.pdf

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: MESA Press.

Yoshikawa, H., & Zigler, E. (2000). Mental health in Head Start: New directions for the twenty-first century. Early Education & Development, 11, 247–264.

Yu, C. Y., & Muthén, B. O. (2002). Evaluation of model fit indices for latent variable models with categorical and continuous outcomes (Technical report). Los Angeles, CA: University of California, Los Angeles, Graduate School of Education and Information Studies.