Journal of School Psychology

45 (2007) 423–438

Exploratory and confirmatory factor analyses of the DIAL-3: What does this "developmental screener" really measure?

Jason L. Anthony ⁎, Mike A. Assel, Jeffrey M. Williams

University of Texas Health Science Center at Houston, Division of Developmental Pediatrics, 7000 Fannin Street, Suite 2377, Houston, TX, 77030, United States

Received 16 May 2006; received in revised form 23 January 2007; accepted 12 February 2007

Abstract

To examine the convergent and discriminant validity of the scales on the Developmental Indicators for the Assessment of Learning—Third Edition [DIAL-3; Mardell-Czudnowski, C., and Goldenberg, D.S. (1998). Developmental indicators for the assessment of learning—third edition. Circle Pines, MN: American Guidance Service, Inc.], exploratory and confirmatory factor analyses were performed on randomly selected subsamples of 2012 children who attended Head Start. Exploratory factor analysis yielded three factors, labeled Verbal Ability, Nonverbal Ability, and Achievement, which collectively accounted for 56% of the variance in children's performances. Confirmatory factor analysis evaluated this empirically-derived model and the conceptually-derived model of the authors of the DIAL-3 in a separate subsample of children. Although neither model explained the data extremely well, the empirically-derived model characterized children's performances better than the conceptually-derived model, e.g., CFIs = .90 and .85, RMSEAs = .07 and .10, respectively. The discussion highlights an alternative conceptualization of the DIAL-3, potential uses of the factor scores, ideas for consideration during the next revision of the DIAL-3, and the need for additional validity research.
© 2007 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

Keywords: Developmental screening; Preschool; Assessment

⁎ Corresponding author. E-mail address: [email protected] (J.L. Anthony).

0022-4405/$ - see front matter © 2007 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.jsp.2007.02.003


The Developmental Indicators for the Assessment of Learning — Third Edition (DIAL-3; Mardell-Czudnowski & Goldenberg, 1998) is a widely used screener for identifying young children who are at risk for school failure. One of the primary principles that guided development of the DIAL-3 and its predecessors, i.e., the DIAL (Mardell & Goldenberg, 1975) and DIAL-R (Mardell-Czudnowski & Goldenberg, 1990), was to ensure there were scorable components of the screener that mapped onto all five of the developmental domains specified in the Individuals with Disabilities Education Act (IDEA, 1997). Thus, the scales of the screening measure were preconceived and rationally derived. Specifically, the DIAL-3 and its predecessors were designed to screen for developmental delays in motor abilities, conceptual knowledge, linguistic competence, psychosocial functioning, and self-help skills. The former three areas of development are assessed via direct assessment of children, and the latter two areas are assessed via parent questionnaire. Given the authors' multidimensional conceptual framework of child development and developmental disabilities, they suggest that the separate area scores, or scales, are likely to be more useful for predicting school success and school failure than the overall score (Mardell-Czudnowski & Goldenberg, 1998).

Initial item development for all versions of the DIAL was driven by content analysis and logical mapping of test items onto preconceived scales. The most recent item development and restandardization on a nationally representative sample took place in the mid 1990s. Retained, modified, and new items were evaluated by a panel of content experts and through the use of item response theory (IRT). Some subtests and items were dropped in accord with recommendations of content experts and testers in pilot phases and in accord with Rasch scaling that identified items with low discrimination. However, the IRT analyses were performed within area scores, or scales, and the factor analyses that helped establish unidimensionality were conducted on the six or seven subtests within a given scale. In other words, no factor analyses of all subtests were conducted to evaluate whether the subtests empirically clustered into the three rationally derived scales. This limitation leaves the door open for two negative implications. One, it is possible that some DIAL-3 subtests reflect an unintended developmental domain as much as or even more than the domain they were intended to measure. Two, it also remains possible that some of the three area scores actually reflect the same developmental domain. Both possibilities threaten the validity of the scores that clinicians obtain and weaken the utility of the measure for differential diagnosis.

Although the technical manual of the DIAL-3 details a number of reliability and validity studies, the manual includes little evidence concerning the discriminant validity of the three scale scores obtained through direct assessment (i.e., Motor, Language, and Concepts). Instead, the distinctiveness of these scales is implied and assumed. However, careful review of the validity studies that were conducted with subsamples of the standardization sample leads one to question the discriminant validity of the Language and Concepts scales in particular. For example, in a study of seventy-six 3-, 4-, and 5-year-old children, the DIAL-3 Concepts score correlated more highly with the Language score of the Early Screening Profiles (ESP; Harrison et al., 1990) than with the Verbal Concepts score of the ESP. Also, DIAL-3 Language had identical correlations with ESP Language, ESP Verbal Concepts, and ESP Visual Discrimination, rs = .51. In another study of seventy-one 3-, 4-, and 5-year-old children, DIAL-3 Language was found to correlate more highly with the Cognitive and Social scales of the Battelle Developmental Inventory Screening Test (BDIST; Newborg, Stock, Wnek, Guidubaldi, & Svinicki, 1984) than with the Receptive Language scale or the Expressive Language scale of the BDIST. Finally, in a study of fifty 3-, 4-, and 5-year-old children, DIAL-3 Concepts and DIAL-3 Language had nearly identical relations with all of the scales of the Differential Ability Scales (Elliott, 1990). All of these findings provide reason to question the clustering of DIAL-3 subtests into the three rationally derived scales of Language, Concepts, and Motor.

The purpose of the present study was to evaluate one aspect of the validity of the three area scores obtained through direct assessment with the DIAL-3. Specifically, we were interested in testing the extent to which the rationally derived scales reflected the empirical clustering of DIAL-3 subtests. Exploratory and confirmatory factor analyses were performed on data collected from a sample of young children that was more sizable than the standardization sample. The practical goal of this research was either to empirically support the commonplace and recommended usage of the DIAL-3 area scores or to provide users of the DIAL-3 (e.g., school psychologists, educational diagnosticians, special educators) with a more appropriate and potentially more diagnostically efficacious and/or predictive set of scales.

Method

Participants

Data were gathered over two consecutive years as part of the annual screening process of a Head Start agency. This agency coordinates both center-based Head Start services and public-school-housed Head Start services and includes 24 sites in or around a large Texas city. Of nearly 3500 children who received Head Start services and who were screened during the two-year period, 2012 met inclusion criteria for this study. Specifically, participants had to be native speakers of English and between 36 and 59 months of age. Over 1350 children were excluded because their native language was Spanish and they were screened with a different instrument.

The 2012 participants attended classrooms in which instruction was conducted in English. All participants came from economically disadvantaged families. Fifty-one percent of participants were male; 48% were female. Participants ranged in age from 36 to 59 months (X = 49, SD = 6). The sample was 62% African American, 34% Hispanic American, 2% Caucasian, 0.5% Asian American, and 1% other or mixed ethnicity. Participants had low-average to average language skills and average motor skills, according to the DIAL-3 scale scores, Xs = 92 and 100; SDs = 12 and 14, respectively.

Measures

The DIAL-3 includes a variety of age-appropriate manipulatives and tasks. The 21 subtests that form the three scales include naming picture vocabulary items; solving verbal problems; providing personal information; articulating common objects; identifying shapes, colors, letters, and body parts; understanding relative positions and measurement concepts; counting; building with blocks; copying line drawings; cutting; finger play; and gross motor activities. Each subtest included in data analyses consisted of multiple items (range = 4–26 items); the single-item Writing Name subtest was excluded from analyses for reasons described below.

Although the test yields standard scores for three scales and an overall standard score, most of the psychometric research to date concerns the test as a whole. For example, the test as a whole has good internal consistency, with alphas ranging from .85 to .90 for 3- to 5-year-olds. One-month test–retest reliabilities for the DIAL-3 total score are also good, e.g., rs = .84–.88. The validity of the DIAL-3 total score is supported by moderate correlations with total scores from other developmental screeners like the Bracken Screening Test (Bracken, 1984), Brigance Preschool Screen (Brigance, 1985), Battelle Screening Test (Newborg et al., 1984), and Early Screening Profiles (Harrison et al., 1990), as detailed in the technical manual of the DIAL-3.
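For illustration, the internal-consistency coefficient cited above (Cronbach's alpha) can be computed from a respondent-by-item score matrix. The data below are entirely hypothetical and are not DIAL-3 scores:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores for 6 children on 4 items (illustration only)
scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
    [2, 3, 2, 2],
])
alpha = cronbach_alpha(scores)
```

Because these illustrative items are strongly intercorrelated, the resulting alpha is high, in the spirit of the .85–.90 range reported for the DIAL-3 total score.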

Design and procedures

Children were tested within 45 days of entering the Head Start program. Most children were tested at the beginning of the school year. Testing took place at children's preschools in a quiet location that was temporarily dedicated to testing. Testing of individual children required approximately 30 minutes and was typically completed in a single session. However, examiners were allowed to divide the testing into multiple sessions if need was established on a case-by-case basis, in accord with standard administration procedures.

Examiners were recruited and hired through a temporary hiring agency. Only applicants with a bachelor's degree were considered. Although there was a mix of degree types in the final examiner pool, the majority of individuals hired had obtained a bachelor's degree in one of the social sciences and were between the ages of 25 and 35. Examiners were carefully trained by the first author and representatives from the publisher of the DIAL-3 (i.e., American Guidance Service, Inc.). The initial phase of training with the publisher occurred during a full-day session and consisted of a variety of training modalities. Initially, examiners were presented with a didactic overview of each scale within the DIAL-3 (e.g., correct use of the record form, skills evaluated within a scale, item types, etc.). Following the didactic overview, individual items within each specific scale were demonstrated live and/or by the publisher's videotape in a large group session. Following this overview, examiners engaged in supervised practice within small-group settings while the first author and representatives from the publisher provided individualized feedback and instruction. Once examiners understood proper administration procedures for all items, they were instructed to practice with a partner until they were ready for formal certification procedures. Certification took place several days after the initial phase of training. The first author observed each examiner administer the entire DIAL-3. Examiners were deemed either certified and ready for field work or in need of continued practice and repetition of the certification process. Once examiners began working in schools, the first author conducted unannounced observations of examiners and provided individual feedback as necessary. Additionally, the entire assessment team continued to meet biweekly to discuss issues and concerns that might threaten the validity or integrity of the data (e.g., dealing with oppositional children, working in sites with limited space, etc.).
Because there was (a) great investment in professional training, (b) a high level of field oversight, (c) high testing demands on the young children secondary to Head Start requirements for an annual program evaluation and the Head Start National Reporting System, and (d) budgetary constraints, a decision was made to forgo systematic reliability testing.

Results

Preanalysis data inspection and transformation

Preanalysis data inspection included examination of missing data, outliers, and potential departures from normality and linearity. All three-year-olds were missing data for the phonological awareness subtest and the writing subtest, according to standard administration procedures. Additionally, 41% of four-year-olds performed at the floor on the phonological awareness subtest and 42% performed at the floor of the writing subtest. Also, 80% of the full sample was missing data on the Rapid Color Naming subtest because they did not know the names of colors well enough to perform the task. This missing data pattern was also in accord with standard administration. As a consequence of the large amount of missing data on the phonological awareness, writing, and rapid automatic naming (a.k.a. RAN) subtests and the nonrandom pattern of missingness on these subtests, these three variables were excluded from subsequent analysis. Further investigation revealed 21 cases missing data on one or more of the other DIAL-3 subtests. These 21 observations were excluded from analysis because the sample size was sufficiently large and because these data appeared to be missing at random. Thus, the final sample included 1991 children.

We used a criterion z-score of 3.4, equivalent to a probability of .001, to identify univariate outliers. One outlier on the Letters and Sounds subtest knew the names and sounds of each of the 7 letters that were tested. Six outliers had perfect scores on the Copying subtest. Two outliers performed poorly on the Naming Objects subtest. Nine outliers performed poorly on the Articulation subtest. Review of test packets led us to the conclusion that data from each of these outliers were valid. Therefore, outliers were retained in analyses.
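The z-score screening rule can be sketched as follows; the scores below are hypothetical rather than the study's data. Note that a sample must be reasonably large before any single case can exceed |z| = 3.4, since an extreme case inflates the sample standard deviation:

```python
import numpy as np

# Hypothetical raw scores on one subtest (illustration only):
# 29 typical performances plus one extreme score.
scores = np.array([4.0] * 10 + [5.0] * 10 + [6.0] * 9 + [26.0])

# Standardize with the sample mean and sample SD
z = (scores - scores.mean()) / scores.std(ddof=1)

# |z| > 3.4 corresponds to a two-tailed probability of roughly .001
outlier_idx = np.where(np.abs(z) > 3.4)[0]
```

Here only the final, extreme score is flagged; whether such a case is retained should rest on substantive review, as the authors did with the test packets.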

Inspection of histograms and normality statistics indicated some minor floor effects and some minor ceiling effects on a few subtests. None of the variables included in analyses had substantial skewness or kurtosis. Although transformation of variables may have improved the overall fits of the models tested, untransformed data were analyzed because they reflected the nature of the performances of this population of children and because untransformed data are used in calculation of the standard scores that are generally reported for each scale and the measure as a whole. Bivariate scatterplots were suggestive of small, positive linear relations among the DIAL-3 subtests.
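A minimal sketch of this kind of normality screening, using simulated scores with a mild floor effect (not DIAL-3 data); the |skewness| < 2 and |excess kurtosis| < 7 cutoffs used here are one common heuristic, not thresholds stated in the article:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated subtest scores with a mild floor at zero (illustration only)
scores = np.clip(rng.normal(loc=10, scale=4, size=1000), 0, None)

skewness = stats.skew(scores)
excess_kurtosis = stats.kurtosis(scores)   # Fisher definition: normal -> 0

acceptable = abs(skewness) < 2 and abs(excess_kurtosis) < 7
```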

Exploratory factor analysis of DIAL-3

With the aim of exploring how the DIAL-3 subtests empirically cluster, we performed exploratory factor analysis of data from a randomly selected 50% of the participants. Factorability of these data was evidenced by reliable correlations that ranged from .15 to .65, ps < .001, and values for Kaiser's Measure of Sampling Adequacy that ranged from .80 to .95. Exploratory factor analysis using principal axis factoring yielded three factors with eigenvalues greater than 1.0. Factor 1 accounted for 42.5% of the variance. Factor 2 accounted for an additional 8.5% of the variance, and Factor 3 accounted for an additional 6% of the variance. The scree plot was consistent with a two- or three-factor solution.
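The eigenvalue-based retention logic above (Kaiser's greater-than-1.0 criterion plus proportion of variance) can be sketched with a small hypothetical correlation matrix; neither the matrix nor the resulting values correspond to the DIAL-3 data:

```python
import numpy as np

# Hypothetical correlation matrix among five subtests: variables 1-3 cluster,
# variables 4-5 cluster, with weak cross-cluster correlations (illustration only)
R = np.array([
    [1.00, 0.55, 0.50, 0.10, 0.10],
    [0.55, 1.00, 0.45, 0.10, 0.10],
    [0.50, 0.45, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.70],
    [0.10, 0.10, 0.10, 0.70, 1.00],
])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending
var_explained = eigvals / R.shape[0]             # proportion of total variance
n_factors = int((eigvals > 1.0).sum())           # Kaiser criterion
```

With this matrix, two eigenvalues exceed 1.0, matching the two-cluster structure built into R. (Principal axis factoring, as used in the article, additionally replaces the diagonal with communality estimates; this sketch uses the unreduced correlation matrix for simplicity.)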

Examination of the three-factor solution revealed that Factor 1 had unique associations with naming Actions, naming Body Parts, naming Objects, providing Personal Information, Solving (verbal) Problems, understanding relative Positions, understanding basic Concepts, identifying Colors, and Articulation (see pattern matrix in Table 1). Given the task demands of these subtests, Factor 1 clearly reflected children's verbal abilities and was labeled Verbal Ability accordingly. Factor 2 had unique associations with Copying line drawings, Cutting, playing with Thumbs and Fingers, Building with cubes, identifying Shapes, Catching, Counting, identifying Colors, and gross motor coordination (i.e., Jump, Hop, and Skip). Factor 2 reflected children's abilities to integrate spatial, visual, and motor information, and so it was labeled Nonverbal Ability. Factor 3 had unique associations with Letter knowledge, Counting knowledge, and Color knowledge, thereby reflecting children's scholastic achievement, and so it was labeled Achievement.

Following oblique rotation, the three factors remained moderately correlated, rs = .34–.60, and none included marker variables (see pattern and structure matrices in Table 1). Factor 3 was relatively less well identified than the other two factors because it had unique associations with only three variables, two of these three variables loaded on

Table 1
Factor loadings of DIAL-3 subtests on Verbal Ability, Nonverbal Ability, and Achievement factors

                         Pattern matrix                Structure matrix
DIAL-3 subtest           Verbal  Nonverbal  Achieve.   Verbal  Nonverbal  Achieve.
Naming actions            .82     .05        .05        .78     .43        .21
Naming body parts         .82     .20        .19        .76     .37        .39
Naming objects            .80     .10        .16        .81     .52        .15
Personal information      .63     .05        .00        .66     .43        .23
Problem solving           .61     .05        .01        .64     .42        .23
Positions                 .60     .08        .15        .70     .50        .38
Basic concepts            .50     .29        .11        .71     .63        .39
Articulation              .36     .13        .03        .45     .36        .21
Color knowledge           .31     .30        .31        .60     .61        .53
Copying                   .12     .73        .25        .40     .75        .49
Cutting                   .03     .67        .06        .45     .71        .33
Thumbs and fingers        .11     .59        .07        .45     .63        .20
Building                  .07     .55        .11        .30     .55        .30
Shapes                    .23     .41        .23        .56     .65        .47
Jump, hop, and skip       .29     .39        .02        .52     .56        .23
Catching                  .09     .34        .12        .27     .35        .05
Letter knowledge          .23     .13        .63        .52     .51        .76
Counting                  .20     .32        .47        .54     .61        .65

Note. n = 824. Loadings of .30 and above are in bold type. The pattern matrix reflects partial correlations of observed variables with factors after controlling for shared variance among the factors. The structure matrix reflects zero-order correlations of observed variables with factors. Verbal Ability, Nonverbal Ability, and Achievement factors were significantly intercorrelated after oblique rotation (rs = .34–.60).
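The relation between the pattern and structure matrices described in the table note is a simple matrix identity: under an oblique rotation, the structure matrix equals the pattern matrix postmultiplied by the factor correlation matrix. A minimal sketch with hypothetical loadings (not the Table 1 values):

```python
import numpy as np

# Hypothetical pattern loadings for 4 variables on 2 oblique factors
P = np.array([
    [0.80, 0.05],
    [0.70, 0.10],
    [0.10, 0.75],
    [0.05, 0.65],
])

# Hypothetical factor correlation matrix (factors correlate .50)
Phi = np.array([
    [1.00, 0.50],
    [0.50, 1.00],
])

# Structure matrix: zero-order correlations of variables with factors
S = P @ Phi
```

Because the factors correlate, every structure loading is at least as large in magnitude as the pure pattern loading would suggest, which is why the Table 1 structure loadings exceed the corresponding pattern loadings.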


other factors, and Factor 3 only explained 6% of the variance in children's performances beyond Factors 1 and 2. Nonetheless, Factor 3 was retained because it (a) had an eigenvalue greater than 1.0, (b) was consistent with the scree plot, (c) was interpretable, and (d) had the potential to improve the prediction of children's academic success and failure beyond what is predicted by children's general verbal and nonverbal abilities.

Confirmatory factor analysis of DIAL-3

Because exploratory factor analysis capitalizes on chance relations among variables in a given sample, we performed confirmatory factor analysis (CFA) on the second randomly selected half of the participants. CFA was used to evaluate two a priori models. First, we tested the utility of the model derived by exploratory techniques, which served as a cross-validation of that model on a different sample of children from the same population. Second, we tested the utility of the model asserted by the authors of the DIAL-3. Finally, we compared the utility of the two different models, one being empirically derived and the other being conceptually derived. CFA was conducted using Robust Maximum Likelihood estimation. This estimation method produces a chi-square statistic, standard errors of estimates, and standard fit indices that are adjusted to the extent of the nonnormality in the raw data.

Our empirically-derived model was evaluated and cross-validated by Model 1, which specified three intercorrelated factors. The first factor, Verbal Ability, was indexed by Actions, Body Parts, Objects, Personal Information, Solving Problems, Positions, Concepts, and Articulation. The second factor, Nonverbal Ability, was indexed by Copying, Cutting, Thumbs and Fingers, Building, Shapes, Catching, and Jump, Hop, and Skip. The third factor, Achievement, was indexed by Letter Knowledge, Counting, and Colors. Despite equivalent correlations with all three factors and equivalent partial correlations with all three factors in the EFA, Color knowledge exclusively indexed Achievement in the CFA because it helped empirically and conceptually identify the Achievement factor. None of the observed variables were allowed to crossload in the CFA, but the factors were allowed to covary. Fig. 1 completely illustrates Model 1.

The empirically-derived model yielded a significant Satorra–Bentler scaled chi-square statistic of 903 with 132 degrees of freedom. Because chi-square values are inflated by large sample sizes, standardized fit indices were also examined. Accordingly, Model 1 yielded a Comparative Fit Index (CFI) of .90, a Tucker–Lewis Index (TLI) of .88, and a Root Mean Square Error of Approximation (RMSEA) of .08 (see Table 2). These values meet criteria for a good-fitting model only when liberal rules of thumb are employed (Bentler & Bonett, 1980). More conservative criteria demand CFIs and TLIs equal to or greater than .95 and RMSEAs equal to or less than .06 to earn status as a good-fitting model (Hu & Bentler, 1999). The magnitude of most factor loadings ranged from good to excellent; the one exception was the factor loading for Catching, which was poor (see Fig. 1). All factor loadings were reliable, ps < .001.
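For readers unfamiliar with these indices, the following sketch computes CFI, TLI, and RMSEA from chi-square values using their standard (non-robust) definitions. The Model 1 chi-square, degrees of freedom, and sample size are taken from the article, but the baseline (independence-model) chi-square was not reported, so the value used here is purely hypothetical:

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """CFI, TLI, and RMSEA from model and baseline chi-square statistics."""
    d_m = max(chi2_m - df_m, 0.0)          # model noncentrality
    d_b = max(chi2_b - df_b, d_m)          # baseline noncentrality
    cfi = 1.0 - d_m / d_b
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))
    return cfi, tli, rmsea

# chi2_b = 7900 on 153 df is a hypothetical baseline chosen for illustration
cfi, tli, rmsea = fit_indices(chi2_m=903, df_m=132, chi2_b=7900, df_b=153, n=1015)
```

With this illustrative baseline, the function returns values close to the reported CFI of .90 and TLI of .88; the article's robust estimates would differ somewhat.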

The conceptually-derived model that is employed in common practice was evaluated by Model 2, which also specified three intercorrelated factors. The first factor, Language, was indexed by Actions, Objects, Personal Information, Solving Problems, Letter Knowledge, and Articulation. The second factor, Concepts, was indexed by Body Parts, Colors,

Fig. 1. Confirmatory factor analysis of empirically-derived Model 1 in children attending Head Start. Factor variances were fixed to 1.0 to identify and standardize the model. All estimated parameters were significant at p < .001. n = 1015.


Positions, Concepts, Counting, and Shapes. The third factor, Motor, was indexed by Catching, Copying, Cutting, Thumbs and Fingers, Building, and Jump, Hop, and Skip. As with Model 1, none of the observed variables were allowed to crossload and the factors were allowed to covary. Model 2 yielded a significant Satorra–Bentler scaled chi-square statistic of 1295 with 132 degrees of freedom, a CFI of .85, a TLI of .82, and a RMSEA of .10. In short, the conceptually-derived model poorly characterized children's performances by all current standards. As such, there is no reason to detail the estimates of Model 2.

Table 2
Fit statistics for empirically-derived and conceptually-derived models, respectively

Model   S-B χ2   df    AIC    CFI   TLI   RMSEA
1        903     132    639   .90   .88   .07
2       1371     132   1107   .85   .82   .10

Note. n = 1015. All model fits had ps < .001. S-B χ2 = Satorra–Bentler scaled chi-square; df = degrees of freedom; AIC = Akaike information criterion; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation.

The two models evaluated with CFA were not nested, so they could not be statistically compared using chi-square difference tests. However, chi-square difference tests would be too sensitive to be given much consideration in the present study anyway because of this study's large sample size. Instead, comparisons of the two models' standardized relative fits and standardized absolute fits were an appropriate alternative, and such comparisons were simplified by the fact that the two models had the same degrees of freedom. Table 2 shows that comparisons of the AICs, CFIs, TLIs, and RMSEAs from the two models all favored the empirically-derived model, i.e., Model 1, despite its marginally acceptable fit.
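The AIC values in Table 2 are consistent with the model-AIC convention AIC = χ² − 2df used by some structural equation modeling programs; a minimal sketch under that assumption:

```python
def model_aic(chi2: float, df: int) -> float:
    """Model AIC under the chi-square minus 2*df convention (assumed here)."""
    return chi2 - 2 * df

aic_model1 = model_aic(903, 132)    # empirically-derived model (Table 2)
aic_model2 = model_aic(1371, 132)   # conceptually-derived model (Table 2)

# Lower AIC indicates the preferred model for non-nested comparisons
preferred = min((aic_model1, "Model 1"), (aic_model2, "Model 2"))[1]
```

Because both models have the same degrees of freedom, the AIC comparison here reduces to a direct comparison of the chi-square statistics, which is why the standardized indices told the same story.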

Multigroup confirmatory factor analysis testing measurement invariance across sex

To compare how well the two competing models generalize across sex, multigroup confirmatory factor analyses were performed with the cross-validation sample. Measurement invariance across sex was examined in a series of four models, which progressed from least restrictive to most restrictive. First, we tested a completely unconstrained, two-group model in which the factor loadings, factor correlations, and error variances were freely estimated and were allowed to be different for boys and girls. This model was considered the baseline multigroup model against which subsequent, more restrictive multigroup models were compared. The second multigroup model constrained corresponding factor loadings to the same values for boys and girls. The third model constrained both corresponding factor loadings and corresponding factor correlations to be equal for boys and girls. Finally, a completely constrained multigroup model additionally restricted corresponding error variances to equality across sex. In all models, the factor variances were fixed to 1.0. Twenty-six cases were removed due to missing information on sex, leaving a group of 511 boys and a group of 478 girls.

Results of the four multigroup models that examined measurement invariance of the empirically-derived model across sex are reported in Table 3. Although the AIC indicated that the completely unconstrained model fit the data best, there was very little difference among these values of absolute fit (AICs = 504 to 524). Moreover, measures of relative fit were virtually identical for all four of the multigroup models that evaluated measurement invariance of the empirically-derived model (TLIs = .88–.89, CFIs = .89–.90, and RMSEAs = .05). Therefore, the most stringent criteria for measurement invariance were met, and the completely constrained multigroup model was accepted in the interest of parsimony. These results indicate that the empirically-derived model fit equally well for boys and girls and that all 39 of the parameter estimates were essentially identical for boys and girls.

Table 3
Fit statistics for empirically-derived multigroup models testing measurement invariance across sex

Multigroup model constraints                                  S-B χ2   df    AIC   CFI   TLI   RMSEA
None                                                           1032    264   504   .90   .88   .05
Factor loadings                                                1088    282   524   .89   .88   .05
Factor loadings and factor correlations                        1090    285   520   .89   .89   .05
Factor loadings, factor correlations, and error variances      1129    303   523   .89   .88   .05

Note. n = 989. All model fits had ps < .001. S-B χ2 = Satorra–Bentler scaled chi-square; df = degrees of freedom; AIC = Akaike information criterion; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation.

Evaluation of measurement invariance of the conceptually-derived model across sex proceeded quite differently. The unconstrained multigroup model converged, but the statistical software flagged one of the parameter estimates as needing further investigation. Examination of the estimates revealed a software-imposed correlation of 1.0 between the Concepts factor and the Language factor for girls only. The corresponding correlation for boys was .91. To verify whether these estimates were reliable, a second multigroup model that explicitly constrained this correlation to 1.0 in the female group was conducted. This multigroup model converged with no estimation problems and yielded the same parameter estimates as the completely unconstrained multigroup model. A third multigroup model evaluated the plausibility that the correlations between Concepts and Language might actually be the same for boys and girls and that the observed difference between these correlations may have been within the range attributable to error of estimation or sampling error. This final multigroup model provided a noticeably poorer fit (e.g., TLI = .82 versus .85, CFI = .85 versus .87). In summary, these results indicate that the factor structure implied by the conceptually-derived model is reliably different for boys and girls. Moreover, the conceptually-derived model actually implies different conceptual models for boys and girls in that the Concepts and Language factors reflect different abilities in boys but reflect a single ability in girls.

Discussion

The present study found limited evidence to support the validity of the scale structure of the DIAL-3. Although subtests from the Motor scale indeed empirically cluster together, subtests from the Language and Concepts scales cluster in such a fashion that they appear to measure constructs different from those intended by the authors of the DIAL-3 (Mardell-Czudnowski & Goldenberg, 1998). Specifically, preschool children's performances on the latter subtests reflect their verbal abilities and scholastic achievement. Although the distinction between language and verbal ability and the distinction between concepts and achievement are subtle and may appear trivial, this is not the case from a psychometric point of view or from a practitioner's point of view, because many of the subtests on the DIAL-3 scales are misspecified. That is, some of the subtests on the Language scale actually reflect children's achievement, and some of the subtests on the Concepts scale actually reflect children's verbal ability.

We used exploratory factor analysis of DIAL-3 subtest scores to identify three factors that collectively accounted for 56% of the variance in preschool children's performances. These factors were labeled Verbal Ability, Nonverbal Ability, and Achievement for reasons described below. In a separate sample of children, we used confirmatory factor analysis to evaluate this empirically-derived model and the conceptually-derived model of the authors of the DIAL-3. Although neither model explained the data extremely well, the empirically-derived model characterized children's performances noticeably better than the conceptually-derived model. The empirically-derived model was also empirically and conceptually more parsimonious. The empirically-derived model met strict criteria for measurement invariance. This finding demonstrates that not only do boys and girls evince the same pattern of relations among subtests, but they evince quantitatively identical relations among the subtests and underlying abilities measured. In contrast, when the conceptually-derived model of the authors of the DIAL-3 was used to explain preschoolers' performances, it yielded different empirical and conceptual models for boys and girls. In fact, the authors' model characterized boys' performances as attributable to three separate abilities but characterized girls' performances as attributable to only two, contrary to expectation.
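
The 56% figure reported for the exploratory solution is, for an orthogonal solution, the sum of squared factor loadings divided by the number of variables. A minimal sketch with an invented loading matrix (these are illustrative values, not the DIAL-3 estimates):

```python
import numpy as np

# Hypothetical orthogonal loading matrix: 6 subtests by 3 factors.
# Values are illustrative only, not DIAL-3 estimates.
loadings = np.array([
    [0.80, 0.10, 0.05],
    [0.75, 0.15, 0.10],
    [0.10, 0.70, 0.05],
    [0.05, 0.65, 0.20],
    [0.10, 0.05, 0.72],
    [0.15, 0.10, 0.68],
])

communalities = (loadings ** 2).sum(axis=1)   # variance explained per subtest
var_by_factor = (loadings ** 2).sum(axis=0)   # sum of squared loadings per factor
total_explained = var_by_factor.sum() / loadings.shape[0]
print(round(total_explained, 2))              # prints 0.54
```

Low communalities, as discussed later for the Articulation and Catching subtests, correspond to rows of this matrix whose squared loadings sum to a small value.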

From our review of the literature, this appears to be the first independent, large-scale study to examine the convergent and discriminant validity of the subtests and scales of the English version of the DIAL-3. Interestingly, the three-factor structure identified in the present study parallels the factor structure of the Spanish version of the DIAL-3 (Anthony & Assel, in press). Parallel structure in the two versions of the DIAL-3 gives further credence to the robustness of the present findings and provides sound reason to be concerned with the commonplace practice of reporting the scale scores obtained from the DIAL-3 manual.

With any new conceptualization of a published measure, it is important that each derived factor is not only empirically supported but also conceptually plausible and consistent with expectations based on the factor structure of other well-researched measures. The item content and tasks of subtests of the Verbal Ability factor of the DIAL-3 clearly resemble those of other tests that purport to measure verbal reasoning abilities and language abilities in young children. Specifically, the Verbal Ability factor of the DIAL-3 includes the following subtests: Actions, Body Parts, Objects, Personal Information, Solving Problems, Positions, Concepts, and Articulation. Some of these subtests parallel subtests from the Stanford–Binet IV's Verbal Reasoning Area (Thorndike, Hagen, & Sattler, 1986), the Stanford–Binet V's Verbal Fluid Reasoning Domain (Roid, 2003), and the WPPSI-3's Verbal Comprehension factor (Wechsler, 2002). For example, items that inquire about children's knowledge of body parts and children's ability to describe how objects are used can be found on each of these verbal ability factors. Also, the Information and Problem Solving subtests of the DIAL-3's Verbal Ability factor resemble the Information and Comprehension subtests from the Verbal Comprehension factors on the WISC-IV (Wechsler, 2003) and WPPSI-3 (Wechsler, 2002). Drawing parallels between the DIAL-3 Verbal Ability subtests and well-regarded language tests, the Concepts and Positions subtests are similar to items on the Preschool Language Scale—3rd and 4th Editions (Zimmerman, Steiner, & Pond, 1992, 2002) and the Clinical Evaluation of Language Fundamentals—Preschool and its revision (Wiig, Secord, & Semel, 1992, 2004). One could also argue that the Objects subtest resembles the Expressive One-Word Picture Vocabulary Test (Brownell, 2000).
Thus, the Verbal Ability factor of the DIAL-3 essentially evaluates the verbal comprehension abilities that some intelligence theorists characterize as "crystallized intelligence" (Cattell, 1963; Gustafsson, 1984; Horn & Cattell, 1967).

The Nonverbal Ability factor of the DIAL-3 contains the following subtests: Copying, Cutting, Thumbs and Fingers, Building, Shapes, Catching, and Jump, Hop, and Skip. The Nonverbal Ability factor includes a number of tasks that are typically used in the evaluation of visual-spatial perceptual ability. For example, Copying closely resembles the Copying subtest from the Stanford–Binet IV (Thorndike et al., 1986) and Beery's test of visual-motor integration (Beery, Buktenica, & Beery, 2004). Also, the Building subtest is very similar to the Block Design subtests found in the Wechsler series and the Pattern Analysis subtests found in the Stanford–Binet series. Some of the subtests of the DIAL-3 Nonverbal Ability factor require motoric skills commonly assessed by standardized measures of motor ability (e.g., the Bayley Scales of Infant Development—second and third editions (Bayley, 1993, 2006), the Brigance Screens (Brigance, 1985, 1990), and the Early Screening Profiles (Harrison et al., 1990)). In sum, the Nonverbal Ability factor contains items that evaluate the visual-spatial problem-solving abilities that some intelligence theorists refer to as "fluid reasoning"; however, the Nonverbal Ability factor places a relatively heavy emphasis on gross and fine motor control.

The Achievement factor of the English DIAL-3 includes Letters and Sounds, Counting, and Colors. The Achievement factor of the Spanish adaptation of the DIAL-3 additionally includes Shapes, and all four subtests load cleanly on the Spanish Achievement factor (see Anthony & Assel, in press). The item content of the Achievement factors of the English and Spanish DIAL-3 resembles that of other well-regarded measures of academic achievement, keeping in mind that few achievement tests exist for use with preschool-age children. For example, children are asked to provide the names and sounds of letters on the Woodcock–Johnson Tests of Achievement (Mather & Woodcock, 2001), the Preschool Comprehensive Test of Phonological and Print Processing (Lonigan, Wagner, & Rashotte, 2002), the Phonological Awareness Literacy Screening: PreK (Invernizzi, Sullivan, Meier, & Swank, 2004), and the Head Start National Reporting System. Items inquiring about children's knowledge of counting and knowledge of shapes can be found on the Test of Early Mathematics Ability—3rd Edition (Ginsburg & Baroody, 2003), the Woodcock–Johnson Tests of Achievement (Mather & Woodcock, 2001), and the Head Start National Reporting System. The skills measured by the Achievement factor are routinely the focus of published preschool curricula and researcher-developed preschool intervention projects. Accordingly, the Achievement factor reflects children's crystallized intelligence, or acquisition of this curricular material. One exciting implication related to this "new" factor is that it may improve the DIAL-3's predictive utility because it comprises the foundational school-readiness skills that tend to be the best early predictors of children's literacy (Adams, 1990; Stevenson & Newman, 1986) and mathematics achievement (Fuson, 1988; Gelman & Meck, 1983; Wynn, 1992).
A second exciting possibility is that children's scores on the Achievement factor, as well as the other factors, could prove useful to teachers when they plan instruction. This would be a significant change in educational practice, given that children's scores are typically used at an organizational level to identify children who require additional assessment rather than used by teachers to directly inform instruction.

The main findings of the present study have particularly important implications for assessment specialists and other users of the DIAL-3, of whom there are many. The delineation of an empirically supported, replicated, and conceptually cohesive factor model that differs from the scale structure of the measure should make users somewhat skeptical of the meaning of the publisher's scale scores. Children's performances on the DIAL-3 may be better explained by modern theories of intelligence than by IDEA's (1997) multidimensional conceptualization of developmental disabilities. For example, the findings are consistent with the conceptualization of intelligence as including both crystallized and fluid abilities (Cattell, 1963; Gustafsson, 1984; Horn & Cattell, 1967) and with the conceptualization of intelligence as a hierarchical arrangement with a general intelligence factor at the apex and various more specialized abilities, like verbal and nonverbal abilities, arrayed below (Carroll, 1993). In any case, additional psychometric research is needed to validate the nature of the three general abilities or skill sets that are measured by the DIAL-3.

In addition to elucidating the factor structure of the DIAL-3, this study also revealed some important findings about individual subtests. First, the Articulation and Catching subtests poorly indexed their respective latent abilities. There are two possible explanations for such findings. One, systematic variation in children's performances on these subtests is caused by something other than the latent ability measured by the other subtests that load on the same factor. Two, these two subtests are unreliable, either under certain administration conditions or under all administration conditions. For example, the Articulation subtest could be (a) an unreliable measure altogether, (b) an unreliable measure when administered by examiners who are not formally trained in phonetics and speech disorders, or (c) a valid and reliable measure of children's speech even though it inadequately measures children's general verbal ability. Likewise, the Catching subtest may reliably reflect children's specific, latent catching ability—if there is such a thing—or it may reflect a lot of error variance introduced by examiners, such as how well examiners toss the bean bag to children. Further investigation of the reliability and validity of the Articulation and Catching subtests under standard administration procedures and under alternative administration procedures is probably warranted. Additionally, the authors of the DIAL-3 may want to consider weighting subtests according to their reliabilities, depending on the extent to which the authors desire to index the latent abilities measured by the DIAL-3 versus the specific abilities measured by the individual subtests.
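
One simple way to implement the reliability weighting suggested above is to weight each standardized subtest score in proportion to its reliability before forming a composite. A sketch under that assumption (the reliabilities and scores are invented; this is not the DIAL-3 scoring procedure):

```python
import numpy as np

# Invented reliabilities for three subtests loading on one factor.
reliabilities = np.array([0.85, 0.80, 0.60])

# Invented standardized (z) subtest scores for two children.
z_scores = np.array([
    [ 0.5, 1.0, -0.2],
    [-1.0, 0.3,  0.8],
])

weights = reliabilities / reliabilities.sum()  # weights sum to 1
composite = z_scores @ weights                 # reliability-weighted composite
print(np.round(composite, 3))
```

Under this scheme, a less reliable subtest (such as Catching in the analyses above) contributes proportionally less error variance to the composite than it would under equal weighting.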

This study also revealed important findings about the phonological processing subtests of the DIAL-3. First, the authors of the DIAL-3 should be commended for recently adding phonological processing measures, given the critical roles that phonological processing abilities play in literacy acquisition (Adams, 1990; Clarke, Hulme, & Snowling, 2005; Griffiths & Snowling, 2001; Wagner & Torgesen, 1987). However, the new phonological processing measures lack sensitivity in the lower range of ability. Specifically, the tasks used to measure phonological awareness and efficiency of phonological access demonstrated significant floor effects. These findings were not surprising in light of research on the development of children's phonological processing (Anthony & Francis, 2005; Anthony & Lonigan, 2004; Anthony et al., 2002; Anthony, Lonigan, Driscoll, Phillips, & Burgess, 2003). However, this same body of research has identified a number of valid and reliable tasks that measure preschoolers' phonological processing abilities. Examples of phonological awareness tasks that are sensitive to lower levels of ability include multiple-choice syllable blending, multiple-choice onset-rime blending, and free-response syllable blending, to name only a few (see Anthony et al., 2003; Lonigan, Burgess, & Anthony, 2000; Lonigan, Burgess, Anthony, & Barker, 1998). Similarly, rapid naming of pictures that illustrate common objects (e.g., car, tree, ball), as opposed to rapid naming of colors, which many preschoolers do not yet know, is a more developmentally appropriate and reliable means to measure preschoolers' efficiency of phonological access (Anthony et al., 2006; Wagner et al., 1987). In short, revisions to the phonological processing subtests are likely to improve the predictive utility of the DIAL-3.

One limitation of the present findings is that the factor analyses explained less than desirable amounts of covariation in children's performances. Such findings can be expected of a developmental screener. That is, developmental screeners are designed to provide cursory evaluations of a broad range of specific skills, rather than to provide thorough and precise evaluation of broad abilities, as in the case of intelligence tests. Therefore, it should come as no surprise that overarching general abilities did not account for large amounts of covariance among subtests that are supposed to measure various, specific skills. This design feature of developmental screeners raises the issue that future research is needed to determine whether DIAL-3 subtests with low communalities and low factor loadings reliably measure specific constructs of importance or whether they are simply unreliable measures of general abilities, as noted above. Also, the generalizability of our findings is limited to preschool-age, minority children from economically disadvantaged backgrounds. The unique demographic characteristics of the present sample, which differ substantially from those of the standardization sample, may partially explain why the present findings did not conform to the authors' presumed factor structure. However, the present results are certainly germane to assessment specialists, educators, and program directors of Head Start programs, Even Start programs, and Title 1 state-funded pre-kindergarten programs that routinely use the DIAL-3. Nonetheless, additional research is needed to replicate or falsify the present findings in a more heterogeneous, nationally representative sample. Finally, it will be important for future research to compare the predictive validities of the empirically-derived model and the conceptually-derived model in terms of how well they specifically predict verbal abilities, nonverbal abilities, various developmental domains, and academic outcomes.

In conclusion, children's performances on the DIAL-3 can be reduced to three skill sets that differ from the developmental domains that the authors of the screener intended to measure. Although we believe the present findings lead one to question whether the commonplace practice of reporting DIAL-3 scale scores is optimal, we do not believe the present findings call into question the utility of the DIAL-3 as a developmental screener. On the contrary, the underlying factors of the DIAL-3 differentiate abilities and competencies that (a) reflect areas of potential developmental delay, (b) may be the focus of early intervention, (c) indicate how much children have learned through formal and informal instruction, and (d) are likely to be predictive of later school success. Thus, there is no reason for the DIAL-3 to become any less relevant to psychologists, educators, program directors, or policy makers. Instead, if additional research confirms the present findings, then perhaps what should change is the nature of the scoring and reporting of children's performances, given that use of factor scores may improve identification of children at risk for school failure and may better inform instructional planning.

References

Adams, M. J. (1990). Learning to read: Thinking and learning about print. Cambridge, MA: MIT Press.
Anthony, J. L., & Assel, M. A. (in press). A first look at the validity of the Spanish version of the DIAL-3. Journal of Psychoeducational Assessment.
Anthony, J. L., & Francis, D. (2005). Development of phonological awareness. Current Directions in Psychological Science, 14, 255–259.
Anthony, J. L., & Lonigan, C. J. (2004). The nature of phonological sensitivity: Converging evidence from four studies of preschool and early grade-school children. Journal of Educational Psychology, 96, 43–55.
Anthony, J. L., Lonigan, C. J., Burgess, S. R., Driscoll, K., Phillips, B. M., & Bloomfield, B. G. (2002). Structure of preschool phonological sensitivity: Overlapping sensitivity to rhyme, words, syllables, and phonemes. Journal of Experimental Child Psychology, 82, 65–92.
Anthony, J. L., Lonigan, C. J., Driscoll, K., Phillips, B. M., & Burgess, S. R. (2003). Preschool phonological sensitivity: A quasi-parallel progression of word structure units and cognitive operations. Reading Research Quarterly, 38, 470–487.
Anthony, J. L., Williams, J. M., McDonald, R., Corbitt-Shindler, D., Carlson, C. D., & Francis, D. J. (2006). Phonological processing and emergent literacy in Spanish-speaking preschool children. Annals of Dyslexia, 56, 239–270.
Bayley, N. (1993). Bayley scales of infant development, second edition: Manual. New York: Psychological Corporation.
Bayley, N. (2006). Bayley scales of infant and toddler development, third edition: Administration manual. San Antonio, TX: Harcourt Assessment, Inc.
Beery, K. E., Buktenica, N. A., & Beery, N. A. (2004). The Beery–Buktenica developmental test of visual-motor integration, fifth edition. NCS Pearson, Inc.
Bentler, P., & Bonnett, D. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bracken, B. A. (1984). Bracken basic concept scale. San Antonio, TX: Psychological Corporation.
Brigance, A. H. (1985). Brigance preschool screen. North Billerica, MA: Curriculum Associates.
Brigance, A. H. (1990). Brigance early preschool screen. North Billerica, MA: Curriculum Associates.
Brownell, R. (2000). Expressive one-word picture vocabulary test, 3rd ed. Novato, CA: Academic Therapy Publications.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, England: Cambridge University Press.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Clarke, P., Hulme, C., & Snowling, M. (2005). Individual differences in RAN and reading: A response timing analysis. Journal of Research in Reading, 28, 73–86.
Elliott, C. D. (1990). DAS administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Fuson, K. C. (1988). Children's counting and concepts of number. New York: Springer-Verlag.
Gelman, R., & Meck, E. (1983). Preschoolers' counting: Principles before skill. Cognition, 13, 343–360.
Ginsburg, H. P., & Baroody, A. J. (2003). Test of early mathematics ability, third edition. Austin, TX: Pro-Ed, Inc.
Griffiths, Y. M., & Snowling, M. J. (2001). Auditory word identification and phonological skills in dyslexic and average readers. Applied Psycholinguistics, 22, 419–439.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.
Harrison, P. L., Kaufman, A. S., Kaufman, N. L., Bruininks, R. H., Rynders, J., Ilmer, S., et al. (1990). AGS early screening profiles. Journal of Psychoeducational Assessment, 13, 101–104.
Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Hu, L. T., & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
Individuals with Disabilities Education Act, Pub. L. No. 105–117, 111 Stat. 23–157 (1997).
Invernizzi, M., Sullivan, A., Meier, J., & Swank, L. (2004). Phonological awareness literacy screening: PreK teacher's manual. Richmond, VA: University of Virginia.
Lonigan, C. J., Burgess, S. R., & Anthony, J. L. (2000). Development of emergent literacy and early reading skills in preschool children: Evidence from a latent-variable longitudinal study. Developmental Psychology, 36, 596–613.
Lonigan, C. J., Burgess, S. R., Anthony, J. L., & Barker, T. A. (1998). Development of phonological awareness in two- to five-year-old children. Journal of Educational Psychology, 90, 294–311.
Lonigan, C., Wagner, J., & Rashotte, C. (2002). The preschool comprehensive test of phonological and print processing. Florida State University.
Mardell, C., & Goldenberg, D. (1975). Developmental indicators for the assessment of learning (DIAL). Edison, NJ: Childcraft Educational Corporation.
Mardell-Czudnowski, C., & Goldenberg, D. (1990). Developmental indicators for the assessment of learning—Revised edition. Circle Pines, MN: American Guidance Service, Inc.
Mardell-Czudnowski, C., & Goldenberg, D. S. (1998). Developmental indicators for the assessment of learning—Third edition. Circle Pines, MN: American Guidance Service, Inc.
Mather, N., & Woodcock, R. W. (2001). Examiner's manual. Woodcock–Johnson III tests of achievement. Itasca, IL: Riverside Publishing.
Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle developmental inventory. Chicago: Riverside.
Roid, G. H. (2003). Stanford–Binet intelligence scales, fifth edition, examiner's manual. Itasca, IL: Riverside.
Stevenson, H. W., & Newman, R. S. (1986). Long-term prediction of achievement and attitudes in mathematics and reading. Child Development, 57, 646–659.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). Guide for administering and scoring the Stanford–Binet intelligence scale, fourth edition. Chicago: Riverside Publishing.
Wagner, R. K., Balthazor, M., Hurley, S., Morgan, S., Rashotte, C. A., Shaner, R., et al. (1987). The nature of prereaders' phonological-processing abilities. Cognitive Development, 2, 355–373.
Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its causal role in the acquisition of reading skills. Psychological Bulletin, 101, 192–212.
Wechsler, D. (2002). WPPSI-III administration and scoring manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). Wechsler intelligence scale for children, fourth edition. San Antonio, TX: Harcourt Assessment, Inc.
Wiig, E., Secord, W., & Semel, E. (1992). Clinical evaluation of language fundamentals—Preschool. New York: The Psychological Corporation.
Wiig, E., Secord, W., & Semel, E. (2004). Clinical evaluation of language fundamentals—Preschool—Second edition. San Antonio, TX: Harcourt Assessment, Inc.
Wynn, K. (1992). Children's understanding of counting. Cognition, 36, 155–193.
Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (1992). Preschool language scale, third edition. San Antonio, TX: The Psychological Corporation.
Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (2002). Preschool language scale—Fourth edition. San Antonio, TX: The Psychological Corporation.