This article was downloaded by: [University of Colorado at Boulder Libraries]. On: 19 December 2012, At: 13:48. Publisher: Psychology Press. Informa Ltd, Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Cognition & Emotion. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/pcem20
Examining the structure of emotional intelligence at the item level: New perspectives, new conclusions
Andrew Maul (a)
(a) Unit for Quantitative Analysis in Education, University of Oslo, Oslo, Norway
Version of record first published: 26 Jul 2011.
To cite this article: Andrew Maul (2012): Examining the structure of emotional intelligence at the item level: New perspectives, new conclusions, Cognition & Emotion, 26:3, 503-520
To link to this article: http://dx.doi.org/10.1080/02699931.2011.588690
Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden.
The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.
Examining the structure of emotional intelligence at the item level: New perspectives, new conclusions
Andrew Maul
Unit for Quantitative Analysis in Education, University of Oslo, Oslo, Norway
Despite twenty years of research, many unknowns remain regarding the Mayer–Salovey (e.g., 1997) model of emotional intelligence (EI) and the validity of tests that have been designed to measure it. Evidence relevant to the internal structure of EI has come mainly from factor-analytic studies of the MSCEIT and the MEIS, utilising parcelled task scores rather than individual test items. This approach has several deficiencies: in addition to the loss of item-level information, it results in an insufficient number of observed variables per factor and an inability to separate structural sources of local item dependence (i.e., method variance) from construct-related variance. The present study (N = 707) employed multidimensional item response modelling to investigate the dimensional structure of the MSCEIT, at the item level, for the first time. It is shown that item format and the specific choice of task explain far more of the variance in response patterns than does the hypothesised dimensional structure of EI, to the point that there is no empirical reason to prefer a higher-dimensional model of EI over a unidimensional model. It is argued that the advantage of an item-level perspective can be fundamental, rather than merely incremental.
Keywords: Emotional intelligence; Multidimensional item response theory; Local dependence; Facets; Testlets.
Many unknowns remain about the psychological
construct of emotional intelligence (EI) as originally
defined by Mayer and Salovey (e.g., 1997). Empirical
evidence concerning the nature and structure of EI
and its relationship to other constructs and outcomes
has primarily come from the Mayer–Salovey–Caruso
Emotional Intelligence Test (MSCEIT V.2; Mayer,
Salovey, & Caruso, 2002), and its predecessor, the
Multifactor Emotional Intelligence Scale (MEIS;
Mayer, Caruso, & Salovey, 1999). One issue of
particular importance to construct validity is whether
the internal structure of tests designed to measure the
construct conforms to theory-based expectations;
empirical evidence concerning this issue has come
from factor-analytic studies of the MEIS (Ciarrochi,
Chan, & Caputi, 2000; Mayer et al., 1999; Roberts,
Zeidner, & Matthews, 2001) and the MSCEIT
(Day & Carroll, 2004; Gignac, 2005; Keele & Bell,
2008; Maul, 2011; Mayer, Salovey, Caruso, &
Sitarenios, 2003; Palmer, Gignac, Manocha, &
Correspondence should be addressed to: Andrew Maul, Unit for Quantitative Analysis in Education, University of Oslo,
Postboks 1099 Blindern, Oslo, 0317 Norway. E-mail: [email protected]
COGNITION AND EMOTION
2012, 26 (3), 503–520
© 2012 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business
http://www.psypress.com/cogemotion http://dx.doi.org/10.1080/02699931.2011.588690
Stough, 2005; Roberts et al., 2006; Rode et al., 2008; Rossen, Kranzler, & Algina, 2008), investigating whether the observed associations between parts of these tests conform to the expectations of the theory of EI. However, conclusions from these studies have been equivocal and at times difficult to interpret, and certain analytic choices may have contributed to this lack of clarity. In particular, these studies have used aggregated ("parcelled") task scores rather than individual items as observed variables; this approach leads to a significant loss of information (on the MSCEIT this reduces the number of observed variables from 122 to 8), an insufficient number of observed variables relative to the number of hypothesised factors, and an inability to determine to what extent variation in the responses to individual items is driven by the particular choice of task rather than the EI construct.
The present study sought to shed new light on the internal structure of the MSCEIT by employing item-level analyses. After a short introduction to the Mayer–Salovey EI model and the MSCEIT, past studies of this test's factor structure are briefly reviewed, with special attention to key analytic limitations of these studies. Next, a short conceptual overview of this study's analytic approach is given. Following this, results from a new study (N = 707) are presented in which sources of method-related variance in response patterns to the MSCEIT are explicitly modelled. It is concluded that when the MSCEIT is modelled at the item level, and task- and item-format-related local item dependence is taken into account, there is no reason to prefer higher-dimensional models to a unidimensional model of EI. Implications are discussed both for the field of EI research and for analytic choices made in the investigation of the structure of psychological variables.
A brief overview of the MSCEIT
Mayer and Salovey (1997; Mayer et al., 2002) articulated EI as a four-dimensional construct containing the abilities to: (1) accurately perceive and express emotions (Perceiving Emotions); (2) use emotions effectively in problem solving (Using Emotions); (3) understand the dynamic nature of emotions (Understanding Emotions); and (4) manage one's own and others' emotions (Managing Emotions). Since it was introduced in 2002, the MSCEIT has been the flagship test of this EI model. The MSCEIT contains 141 items (of which 122 are scored) divided into eight tasks, two of which are designed to measure each of the proposed four branches of EI. Each task contains between 9 and 30 items, and shares a common type of prompt and stimulus material and a common response format. In some cases, the tasks are comprised of several small groups of items with common prompts. For example, the Pictures task, which is designed to measure the Perceiving Emotions branch of EI, requires respondents to examine six photographs of abstract art or landscape scenes and rate the extent to which five different emotions (e.g., happiness, surprise, fear) are present in each picture on 5-point rating scales ranging from "definitely not present" to "definitely present".
MSCEIT test items are scored via a technique known as consensus-based scoring. Rather than having response options deliberately written to be more or less correct (i.e., to reflect lower or higher amounts of EI), each response is scored according to the degree to which it corresponds with the consensus response from a calibration group of respondents. In the general consensus approach, answers for the MSCEIT were determined on the basis of response patterns from a large (N = 5,000) standardisation sample from English-speaking countries (Mayer et al., 2003; Mayer, Roberts, & Barsade, 2008). For instance, when respondents are asked to rate the intensity of a particular emotion in a piece of artwork, if the five options relating to the emotion are endorsed by 10%, 20%, 20%, 40%, and 10% of the sample respectively, a respondent endorsing option 3 would be credited with a score of .20 while a respondent endorsing option 4 would be credited with a score of .40. Respondents' scores on each task, branch, and area are the average of their weighted scores for the items that make up that section of the test. The
expert consensus scoring method is similar, although the calibration group consisted of 21 volunteer members of the International Society for Research on Emotion (ISRE) at their conference in 2000. Scores derived from expert and consensus weights appear to correlate so highly (e.g., r = .96; Mayer et al., 2003) that they can be regarded as essentially a single scoring method.
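The consensus-scoring arithmetic described above can be made concrete with a short sketch. This is an illustration only, not the proprietary MHS procedure: the function names are this example's own, and the endorsement counts are the hypothetical figures from the worked example in the text.

```python
# Illustrative sketch of consensus-based scoring: each response option is
# weighted by the proportion of a calibration sample endorsing it, and a
# respondent's credit for an item is the weight of the option they chose.

def consensus_weights(endorsements):
    """Convert raw endorsement counts to proportion weights."""
    total = sum(endorsements)
    return [n / total for n in endorsements]

def score_response(weights, chosen_option):
    """Credit for an item is the consensus weight of the chosen option."""
    return weights[chosen_option]

def task_score(item_weights, responses):
    """Task score: the average of the weighted item scores."""
    scores = [score_response(w, r) for w, r in zip(item_weights, responses)]
    return sum(scores) / len(scores)

# The text's example: options endorsed by 10%, 20%, 20%, 40%, and 10%.
weights = consensus_weights([10, 20, 20, 40, 10])
print(score_response(weights, 2))  # option 3 (0-indexed 2) -> 0.2
print(score_response(weights, 3))  # option 4 -> 0.4
print(task_score([weights, weights], [3, 0]))  # average of two weighted scores
```

Note that under this scheme no option is "correct" in the usual sense; the maximum attainable item score is simply the weight of the most popular option.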
This scoring method has been one of the largest impediments to item-level analysis of the MSCEIT. One consequence of this method is that individual items are worth different amounts in the calculation of a total score; specifically, easy items are worth much more than difficult items. For example, in the case of a very easy item, the five possible response options might be endorsed by 90%, 5%, 3%, 1%, and 1% of the sample, and selecting an incorrect response would result in a loss of .85 or more in the final tally. On a more difficult item, in which the response options were endorsed by 25%, 21%, 20%, 19%, and 15% of the sample, selection of an incorrect option can result in a loss of only up to .10 in one's final score. Thus item difficulty is directly and negatively associated with item discrimination, with the most difficult items having the lowest discrimination and the lowest impact on total test score.1 The present study did not seek to explore the philosophical foundations of this approach to scoring, but merely to be consistent with the test authors' intentions while modelling the test at the item level.
Empirical investigations of the MSCEIT
Several studies have employed factor-analytic2 techniques to examine the internal structure of the MSCEIT and the extent to which it conforms to the expectations of the Mayer and Salovey model of EI. These studies share in common that they have proceeded by (1) taking the average of respondents' weighted scores on the items within each of the eight tasks to produce parcelled task scores, as described above, and then (2) submitting a correlation or covariance matrix of the task scores for factor analysis.
Results from these studies have been varied. Studies by Mayer et al. (2003) and Day and Carroll (2004) reported that the factor structure of the MSCEIT was well-described by a one-factor, "EI(g)", solution; a two-factor, "Experiential" and "Strategic", solution; and a four-factor solution, with two tasks each loading onto Perceiving, Using, Understanding, and Managing Emotions factors. However, Gignac (2005) and Palmer et al. (2005), through reanalysis of the Mayer et al. (2003) covariance matrices and through new data collection, found one- and two-factor models to be ill-fitting, and a four-factor model to be implausible due to a high (r = .90) correlation between the Using and Managing Emotions branches. They hypothesised that the Using Emotions factor did not "measure any unique construct-related variance, independent of a general factor and the other branches" (p. 299), which was supported by results from nested factor models. Palmer and colleagues also noted problems with using only two indicators per factor, identifying this as a "fundamental limitation" (p. 286).
Studies by Roberts et al. (2006), Rode et al. (2008), Rossen et al. (2008), Keele and Bell (2008), and Maul (2011) each used somewhat different sets of factor models but came to similar final conclusions, finding only partial support for the proposed factor structure of the MSCEIT and largely finding the Using Emotions branch non-identifiable. A meta-analysis by Fan, Jackson, Yang, Tang, and Zhang (2010) looked across factor-analytic studies of the MSCEIT and found that a three-factor model, with Perceiving and Using Emotions combining to form a single factor, was the model most consistently supported in the literature, but again
1 Arguably, the use of the words "easy" and "difficult", while consistent with modern measurement theory, may not be entirely appropriate here. Scoring items by the use of consensus weights implies that there may be items on which multiple answers are close to being equally good. What is termed a more "difficult" item in this paper may be alternatively seen as an item for which there is simply a lack of clear consensus, and therefore an item that does not discriminate as well between subjects.
2 The terms "factor" and "dimension" are taken here to be conceptually synonymous, albeit arising from distinct research traditions.
noted only partial support for the existence of a general EI factor.
Thus, only equivocal support exists for the structural integrity of the four-branch model of EI proposed by Mayer and Salovey, and the number of indicators used per factor has been a major analytic limitation. The actual construction of these indicator variables is another possible (and less-discussed) limitation. Each of these points is developed in turn.
The number of indicators per factor. Models with two indicators for each of multiple correlated factors are technically identified, but are more prone to problems such as failure to converge, unstable or inadmissible solutions, and interpretive difficulties (Bentler & Jamshidian, 1994; Gorsuch, 1997; Wothke, 1993). Palmer and colleagues (2005) and Wilhelm (2005) have noted this issue in the context of EI research and have suggested as a solution that the MSCEIT have more tasks. However, it is worth noting that an inadequate number of tasks for factor analysis is an analytic concern, while the number of different tasks used to measure a construct is a concern about appropriate domain coverage. Moreover, item-based approaches, such as are given in this study, can entirely remove the (analytic) lower limit on tasks, so long as there is a sufficient number of test items for each branch.
The use of averaged task scores as indicator variables. As noted earlier, each of these factor-analytic studies has used the averaged scores of items within tasks as indicator variables. This is potentially problematic for a number of reasons.
First, at a conceptual level, aggregating scores on groups of individual items can be seen as assuming part of the very thing that multidimensional analyses are meant to investigate. The use of parcels of items as indicator variables requires that the items in each parcel are unidimensional (that is, known to measure a single construct). Bandalos (2001) and Bandalos and Finney (2002) noted that summing or averaging scores on groups of items in this manner can mask or distort a multidimensional structure in such a way that, as Kline (2005) summarises, "a seriously misspecified model may nonetheless fit the data reasonably well" (p. 198), and further argued that such parcelling should not be part of any analysis aimed at testing the dimensional structure of a test.
Second, organising items into parcels loses a large amount of information. The number of distinct pieces of information about a respondent's emotional intelligence is reduced from 122 down to 8. This obviously contributes to the problem with an inadequate number of observed variables per factor noted previously. Further, such an approach will not be helpful in identifying specific items that may be unusual or problematic, which is often a primary purpose of test analysis.
Third, there appears to be considerable variation in the extent to which items within each task form relatively homogeneous groups, or, stated alternatively, there appears to be variation in the magnitude of the within-task dependence among item responses. The studies cited in the last section reported estimated internal consistency coefficients for MSCEIT tasks ranging from .80 down to the .30s.3 Averaging scores on each task together ignores this variability, treating all tasks as essentially internally homogeneous, and prohibits investigation of the degree to which variation in item responses can be explained by specific choice of task (rather than by the hypothesised dimensions of emotional intelligence).
Modelling the MSCEIT at the item level
The main difference between the analytic approach employed in the present study and those employed by prior studies is that the present study sought to model the relationship between individual MSCEIT item responses (rather than aggregated task scores) and the hypothesised dimensions of
3 A generalisability theory study by Føllesdal and Hagtvet (2009) provided evidence that many of these reported coefficients are likely to be overestimates of the actual internal consistencies of the tasks, since many tasks contain items that are further grouped into clusters around common prompts or stimuli, and the additional dependence imposed by these subclusters is not taken into account in commonly used estimates of internal consistency (e.g., split-half or Cronbach's alpha coefficients).
emotional intelligence. Technical details of the multidimensional item response theory (MIRT) models employed in this study and accompanying references are given in the appendix. Briefly, one can think of MIRT models as confirmatory factor-analytic (CFA) models with an added logit link function, which makes the models appropriate for ordinally- (rather than continuously-) scaled observed variables (and thus permits the use of individual test items as observed variables).
Modelling the MSCEIT at the item level opens up new analytic opportunities and challenges. The organisation of items into tasks violates the measurement modelling requirement of local independence; specifically, items within a given task are likely to share much more in common with one another than items sampled at random from the target domain of the EI construct, and, therefore, there is likely to be covariation in subjects' responses to these items in excess of what is expected given only that they are related to a common underlying construct.4
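The "CFA with a logit link" idea can be written out explicitly. The following is a sketch for a dichotomous item under a compensatory MIRT model with a task-level random effect in the style of Wang and Wilson (2005b); the notation here (a_i for the fixed item discrimination vector, delta_i for item difficulty, gamma_jt(i) for the random effect of the task containing item i for person j) is this sketch's own, and the paper's appendix (Equation 3) gives the exact polytomous form actually used:

```latex
\Pr\!\left(X_{ij}=1 \mid \boldsymbol{\theta}_j, \gamma_{j t(i)}\right)
  = \frac{\exp\!\left(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j
        + \gamma_{j t(i)} - \delta_i\right)}
         {1 + \exp\!\left(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j
        + \gamma_{j t(i)} - \delta_i\right)}
```

Dropping the gamma term recovers a standard MIRT model; retaining it lets within-task covariation be absorbed by the task effect rather than forcing the theta dimensions to account for it.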
It is well-established that ignoring local dependence can lead to overestimation of measurement precision (reflected, for example, by inflated reliability estimates; e.g., Sireci, Thissen, & Wainer, 1991). Local dependence may affect multidimensional investigations as well, especially when sources of local dependence exist primarily within hypothesised dimensions. Within the Perceiving Emotions branch, for example, there are four pictures of faces and six pictures of landscapes with five items each, for a total of 50 items; ignoring the local dependence caused by common task type and common prompt treats responses to these 50 items as independent observations conditional only on the Perceiving Emotions dimension, and thus risks misunderstanding these sources of local dependence as dependence due to a common ability, leading to a biased view of the test's dimensional structure.
Thus, as is described in the appendix, a multidimensional extension of Wang and Wilson's (2005b) random-effects facet model was employed to simultaneously model the dimensional structure of the test and the local dependence among items of a common task in Models 5 and onward, as described in the following section. Figures 1, 2, and 3, which correspond to Models 5, 6, and 7, visually represent how this was accomplished.
Introduction to the present study
The present study was designed to re-analyse the internal structure of the MSCEIT at the item level, modelling structural sources of inter-item dependence. In this way, the present study sought to overcome the two major methodological limitations of previous investigations of the dimensionality of the MSCEIT noted above. Additionally, this study provided an opportunity to examine the extent to which within-task variance could mask or distort assessments of the dimensionality of the MSCEIT.
METHOD
Participants
Subjects for this study were drawn from two secondary sources. Data collected by the author from another study (N = 241; Maul, in press) were combined with data collected by Roberts and colleagues in Australia (N = 517; personal communication, 2006), for a total sample size of N = 758. In the former case, subjects were drawn from the student population at the University of California at Berkeley and from the general community in Berkeley through flyers and website recruiting. Subjects (149 female, 73 male, 19 unreported) ranged in age from 18 to 71 years (M = 29.6, SD = 11.9). Of the 223 subjects who
4 In contrast to how the totality of EI is held to be comprised of and defined by its four component dimensions, the tasks are not regarded as representing the whole of the dimensions they target; rather, they can be regarded as sampled from a domain of tasks appropriate to that dimension, similar to how individual items are often regarded as sampled from a relevant domain of possible items.
reported their ethnicity, 44% were Caucasian American, 34% were Asian American, 11% were African American, 7% were Latin American, and 5% were other or mixed. A primary language other than English was reported by 20%. In terms of education, 5% reported having a high school diploma or less, 41% had completed high school but not (yet) received a 4-year degree, 38% had received a 4-year degree, and 17% had a graduate degree. The Roberts data (N = 517) were collected from undergraduate students at the University of Sydney, of whom 355 were female and 114 were male (with 48 unreported), and whose ages ranged from 18 to 59 years (M = 20.48, SD = 5.02).
Mayer et al. (2002) considered participants' responses to be invalid if responses to 10% or more of a given subscale's items were missing; 18 cases were excluded on these grounds. An additional 29 subjects were excluded due to a significant amount of missing data within a particular task. All further results are for the remaining 707 subjects. Missing data in the remaining cases were minimal (around 1%) and were left as missing in the maximum likelihood-based analyses described here.
Materials
All subjects completed the MSCEIT Research Version 2.0 in accordance with the guidelines for remote administration given by Mayer et al. (2002).
Scoring of the MSCEIT
Multi-Health Systems (MHS) owns and keeps proprietary the consensus and expert scoring keys for the MSCEIT, and does not provide item-level scoring services. Thus the suggestion of the test authors to develop local consensus scoring norms was followed.
Nineteen test items are regularly excluded by MHS in analyses of the MSCEIT (Mayer et al., 2002, p. 63). The first author of the MSCEIT provided a list of these items for the purpose of this study (John Mayer, personal communication, 8 April 2010). These items were excluded from analysis.
Subject response patterns were examined for each of the remaining 122 items and translated into a scoring key. First, the response distribution of the full sample was examined for the five possible responses to each item. The response receiving the smallest number of endorsements from the full sample was coded as "0". Next, the category receiving the next-highest number of endorsements was identified, and a difference-in-proportion test was conducted between the two adjacent categories, to determine if the proportion of subjects selecting the second least-popular category was significantly higher than the proportion of subjects selecting the lowest category. If the proportions of subjects in the two groups were significantly different at the α = .005 level,5 the second least-popular category was assigned a score of "1"; otherwise the category was also assigned a score of "0". This procedure commonly led to scoring categories together when the difference in proportion selecting the two adjacent categories was less than 7% of the total sample. However, a very small number of subjects in a given category could present problems with estimation. Thus, if fewer than ten subjects out of the full sample selected the least-popular category, both it and the second least-popular category were assigned a score of "0".
This procedure was then repeated, comparing the second least- and third least-popular categories, and so on, assigning a consecutive integer to each category. Thus items could have different numbers of categories: an item whose options were selected by 80%, 6%, 6%, 5%, and 3% of the
5 Given the very large number of comparisons (122 items with five categories each), a fairly stringent alpha level is appropriate; on the other hand, the strict Bonferroni level of .05/408 = .00012 would result in too many items that cannot be scored. It should also be noted that all analyses described in this paper were re-run with categories assigned using both more (.0001) and less (.01 and .05) stringent alpha levels. Although these models obviously had different numbers of model parameters and different values of these parameters, the ordering of fit of the models was identical to what is described in this paper.
sample, for example, would be scored dichotomously as [1, 0, 0, 0, 0], whereas an item whose options were endorsed by 40%, 30%, 15%, 5%, and 5% of the sample would be scored as [3, 2, 1, 0, 0]. There were two test items for which the five response options were selected nearly equally often; these could not be scored and were removed from further analysis. Thus this scoring procedure resulted in 120 items, of which 34 had two categories, 73 had three categories, and 13 had four categories.
As discussed previously, the scoring method of this test specifies different discriminations for each item based on the degree of consensus. Thus the contents of the a_i vector from Equation 3 in the appendix were manipulated to reflect this contention. The response pattern for each item was examined again. The relevant element of a_i was then fixed to reflect the average difference in popularity between the adjacent categories of the item. Thus the two examples in the last paragraph were given discrimination values of .74 and .12, respectively, which reflects the contention that the first item is more discriminating (and has a higher impact on total score) than the second item. This allowed MSCEIT items to be modelled in a manner congruent with the test design.
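The category-collapsing and discrimination-fixing steps just described can be sketched in code. Everything below is a hypothetical reconstruction for illustration only: the function names are invented, the use of a two-sided pooled two-proportion z-test is an assumption (the paper does not state the exact test or its sidedness), and reading "average difference in popularity" as the mean gap in endorsement proportion between adjacent score levels is one possible interpretation, chosen because it reproduces the two worked examples in the text (counts approximate the stated percentages of N = 707).

```python
import math

Z_CRIT = 2.807  # two-sided critical z for alpha = .005 (sidedness is an assumption)

def two_prop_z(n1, n2, total):
    """Pooled two-proportion z statistic for two categories of the same item."""
    p1, p2 = n1 / total, n2 / total
    pooled = (n1 + n2) / (2 * total)
    se = math.sqrt(2 * pooled * (1 - pooled) / total)
    return abs(p2 - p1) / se if se > 0 else 0.0

def build_key(counts, min_n=10, z_crit=Z_CRIT):
    """Assign integer scores to response categories, working from least to
    most popular and merging adjacent categories whose endorsement rates do
    not differ significantly (or where the less popular category has fewer
    than min_n respondents)."""
    total = sum(counts)
    order = sorted(range(len(counts)), key=lambda i: counts[i])  # least popular first
    score = 0
    key = {order[0]: 0}
    for prev, cur in zip(order, order[1:]):
        if counts[prev] >= min_n and two_prop_z(counts[prev], counts[cur], total) > z_crit:
            score += 1
        key[cur] = score
    return [key[i] for i in range(len(counts))]

def discrimination(counts, key):
    """Fixed discrimination for the a_i element: mean gap in endorsement
    proportion between adjacent score levels (one reading of 'average
    difference in popularity between the adjacent categories')."""
    total = sum(counts)
    top = {}  # highest endorsement proportion observed at each score level
    for c, s in zip(counts, key):
        top[s] = max(top.get(s, 0.0), c / total)
    levels = sorted(top)
    gaps = [top[b] - top[a] for a, b in zip(levels, levels[1:])]
    return sum(gaps) / len(gaps)

# Counts mirroring the text's two examples (percentages of N = 707):
easy = [566, 42, 42, 35, 22]    # ~80%, 6%, 6%, 5%, 3%
hard = [283, 212, 106, 35, 35]  # ~40%, 30%, 15%, 5%, 5%
print(build_key(easy))  # -> [1, 0, 0, 0, 0]
print(build_key(hard))  # -> [3, 2, 1, 0, 0]
print(round(discrimination(easy, build_key(easy)), 2))  # -> 0.74
print(round(discrimination(hard, build_key(hard)), 2))  # -> 0.12
```

Under these assumptions the sketch reproduces both the scoring keys and the discrimination values (.74 and .12) reported in the text for the two example items.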
As a cross-check on the fidelity of this translation of the scoring method to the official MSCEIT scoring method (Mayer et al., 2002), EAP person estimates generated from a simple unidimensional IRT model (Model 1, as described below) were correlated with scores derived in the traditional manner. The correlation between these sets of scores was r = .96, which is in the same order as the correlation between the expert and consensus scoring methods (r = .96; Mayer et al., 2003) and helps assuage concerns that the adaptation of the scoring method employed here could be substantively different from the official method.
Statistical analyses
Analyses were conducted using ACER ConQuest 2.0 (Wu, Adams, Wilson, & Haldane, 2007). Differences in the final value of −2 times the log-likelihood6 for nested models are compared using a likelihood ratio test; values of the Bayes Information Criterion (BIC; Raftery, 1993) are also presented. Two runs using a Monte Carlo approach to estimation were used for all models. On the first run, 400 nodes were used for integration and estimation was set to terminate when the maximum change in parameter estimates became less than .01. The parameter estimates from this run were then used as starting values in a second run, in which 4,000 nodes were used and estimation was set to terminate when the maximum change in parameters became less than .0001. The means of all random effects were constrained to be zero, although in some cases these random effects are centred around a (non-zero) estimated mean fixed effect of a facet.
Figure 1. Model 5.
6 The symbol ΔLL(n) can be read as "the difference in −2(log-likelihood) values of the two models being compared, with n more parameters in the more-complex model".
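The likelihood-ratio and BIC comparisons can be reproduced from the reported fit figures. The sketch below is illustrative: the helper names are this example's own, the p-value uses the closed-form chi-square survival function (valid only for even degrees of freedom), and the inputs are the −2 log-likelihoods and parameter counts reported for Models 1 and 2 in Table 1, with N = 707 taken as the sample size.

```python
import math

def bic(neg2_loglik, n_params, n_obs):
    """BIC computed directly from a -2*log-likelihood value, as reported
    by ConQuest: BIC = -2*LL + k*ln(N)."""
    return neg2_loglik + n_params * math.log(n_obs)

def lrt(neg2_ll_simple, neg2_ll_complex, df):
    """Likelihood-ratio test for nested models. The p-value uses the
    closed-form chi-square survival function, valid for even df only."""
    stat = neg2_ll_simple - neg2_ll_complex
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, p

# Model 1 (123,794; 220 parameters) vs. Model 2 (122,607; 222 parameters):
stat, p = lrt(123794, 122607, df=2)
print(stat)                          # -> 1187
print(round(bic(123794, 220, 707)))  # -> 125237, matching Table 1 for Model 1
```

That the BIC computed this way matches the tabled value suggests the 707 retained subjects served as the sample size in the penalty term.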
RESULTS
The first set of models fit to the data ignores local dependence among items, using standard one-, two-, and four-dimensional models, as well as a three-dimensional model described below. The second set duplicates these models but adds fixed and random effects for task format (i.e., facet). The third set additionally includes random effects for testlets. Figures 1, 2, and 3
Figure 2. Model 6.
Figure 3. Model 7.
correspond to Models 5, 6, and 7, respectively, described below, but also correspond to Models 1, 2, and 3 if one removes the random effects of tasks (the γ parameters).
Test-, area-, and branch-level models
The first three models closely replicate the spirit of the confirmatory factor models fit by Mayer et al. (2003), upon which the original empirical arguments concerning the dimensionality of the EI construct measured by the MSCEIT were built.
Model 1 treats the MSCEIT as a test of a single latent variable, here termed EI(g). Model 2 treats the MSCEIT as a test of two correlated dimensions, Experiential EI and Strategic EI. Model 3 treats the MSCEIT as a test of four correlated dimensions: Perceiving, Using, Understanding, and Managing Emotions.
An examination of item parameter-level fit statistics for these three models, as well as all other models reported in this paper, revealed no item parameters that appeared to severely misfit the model.7 Infit (weighted) meansquare statistics for the 219 item parameters in Model 1, for example, ranged from 0.94 to 1.14, with the exception of a single parameter with an infit meansquare of 1.28. In no model were there values below 0.75 or above 1.33. It must be noted, however, that item-parameter-level fit statistics are not necessarily sensitive to systematic misfit due to unmodelled dimensionality or sources of local dependence.
As can be seen in Table 1, there was successive improvement in overall model fit from Model 1 to Model 3 (Model 2 compared to Model 1: ΔLL(2) = 1,187, p < .001; Model 3 compared to Model 2: ΔLL(7) = 611, p < .001). Thus the null hypotheses that a unidimensional model described the data as well as a two-dimensional model, and that a two-dimensional model described the data as well as the four-dimensional model, were both rejected. This seems to provide initial support for the construct structure hypothesised by Mayer and colleagues.
In the two-dimensional model, the Experiential and Strategic areas correlated with one another at r = .48. The estimated correlations between the dimensions in the four-dimensional model are given in Table 2.
Model 4 represents the three-factor model preferred by Fan et al. (2010) in their meta-analysis of factor-analytic studies of the MSCEIT, with Perceiving and Using Emotions
Table 1. Likelihood and reliability for one-, two-, four-, and three-dimensional models

Model                                    Final likelihood;      Person-separation        Chi-square test vs.     BIC
                                         estimated parameters   (EAP) reliability        previous model
Model 1: Unidimensional (general EI)     123,794; 220           .863                     —                       125,237
Model 2: Two-dimensional (experiential   122,607; 222           .849; .779; r = .477     Change: 1,187; df: 2;   124,063
  and strategic EI)                                                                      p < .01
Model 3: Four-dimensional (perceiving,   121,996; 229           .853; .717; .777; .759   Change: 611; df: 7;     123,498
  using, understanding, and managing)                                                    p < .01
Model 4: Three-dimensional               122,473; 225           .841; .755; .813         —                       123,949
  (experiential, understanding, and
  managing)
7 There are no absolute standards for what constitutes a misfitting item. Adams and Khoo (1996) suggested a rule of thumb of flagging items with infit mean-squares less than 0.75 or greater than 1.33. Bond and Fox (2007) gave several rules of thumb for acceptable values in different situations, the most conservative of which (for high-stakes educational tests) is 0.8 to 1.2. Fit statistics are sensitive to sample size, but the present sample (n = 707) is not unreasonably large or small.
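As a check on the arithmetic in Table 1, the reported BIC values can be reproduced (to within rounding) if the tabled "final likelihood" values are read as deviances (−2 log-likelihood) and n = 707; a minimal sketch:

```python
from math import log

n = 707  # sample size reported in the paper

models = {  # model: (deviance, estimated parameters), from Table 1
    "Model 1": (123_794, 220),
    "Model 2": (122_607, 222),
    "Model 3": (121_996, 229),
    "Model 4": (122_473, 225),
}

def bic(deviance, n_params, n_obs):
    # BIC = deviance + (number of estimated parameters) x ln(sample size)
    return deviance + n_params * log(n_obs)

for name, (dev, k) in models.items():
    print(name, "BIC ~", round(bic(dev, k, n)))

# deviance-change (likelihood-ratio) test statistic for nested models
d1, k1 = models["Model 1"]
d2, k2 = models["Model 2"]
print("Model 2 vs. Model 1: Change:", d1 - d2, "df:", k2 - k1)
```

The reproduced change of 1,187 on 2 degrees of freedom matches the tabled comparison of Models 1 and 2.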
Table 2. Correlations between dimensions for Model 3
Perceiving Using Understanding Managing
Perceiving 1
Using .561 1
Understanding .325 .459 1
Managing .294 .583 .700 1
combining to form a single factor. This model also fits better than Model 2 (ΔLL(3) = 134, p < .001); however, Model 3 (four dimensions) fits better than the three-factor model (ΔLL(4) = 526, p < .001), indicating that the three-factor solution is not preferred in this study.
Thus, when local dependence due to structural features of the test is not taken into account, there appears to be support for the four-dimensional model of EI proposed by the MSCEIT's authors.
Test-, area-, and branch-level models with random effects of task
Including a fixed effect of each MSCEIT task reflects the idea that there might be, on average, more agreement among test takers about the correct answers to items on some tasks than others (that is, it will be on average easier to select a more-correct answer to items on some tasks than others). Including a random effect of each MSCEIT task reflects the idea that the extent to which items on particular tasks are more or less difficult interacts with the person.
As is presented in Table 3, a model with both random and fixed effects of task did not appear to fit as well as a model with only random effects. Furthermore, although the main effects of the dimensions were statistically significantly different from zero, their magnitude (ranging from −0.04 to 0.11 logits) was trivial compared to the range of difficulty of items (from approximately −3.8 logits to −0.8 logits). Thus, it did not appear that the inclusion of fixed effects of facets provided an advantage when random facet effects were included, and thus all further models include only random effects of task.
Models 5, 6, and 7 parallel Models 1, 2, and 3, respectively, with the addition of eight random effects of facets. Each item now loads onto both its primary construct factor and a specific methods factor. These models fit eight more parameters than their counterpart models above.
Model 5 fits significantly better than Model 1 (ΔLL(8) = 2,591, p < .001), Model 6 fits significantly better than Model 2 (ΔLL(8) = 1,511, p < .001), and Model 7 fits significantly better than Model 3 (ΔLL(8) = 911, p < .001), indicating that in all three cases we can reject the null hypothesis that the model without random effects of tasks fits the data as well as the model with these random effects. Thus, it appears that a significant amount of model variance is within task, even controlling for construct-related variance.
The differences in fit between Models 5, 6, and 7 (shown in Table 4) are telling. Model 6 fits the data better than Model 5 (ΔLL(2) = 107, p < .001), although this difference is trivial (a 0.09% reduction in deviance) compared to the difference between Models 1 and 2 (a 0.96% reduction in deviance; thus, this difference is over ten times larger when facets are not modelled). Model 7 does not fit the data better than Model 6 (ΔLL(7) = 11, p > .05), quite unlike the comparison of Models 3 and 2. Thus it appears that the successive better fit of higher-dimensional models observed in the comparisons of Models 1, 2, and 3 all but vanishes when facet-related variance is controlled.
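To make concrete how a random task effect induces local dependence, the following simulation uses a Rasch-type model with a person-by-task effect added to the latent trait. All difficulty and variance values are invented for illustration; the point is only that items sharing a task correlate more strongly than the primary dimension alone would imply.

```python
import math
import random

random.seed(0)

def p_correct(theta, gamma, delta=0.0):
    """Illustrative Rasch-type probability with a person-by-task random
    effect gamma added to the latent trait theta (item difficulty delta)."""
    return 1.0 / (1.0 + math.exp(-(theta + gamma - delta)))

# Simulate examinees answering two items from the SAME task (shared gamma)
# and a third item from a DIFFERENT task (independent gamma).
same_task, diff_task = [], []
for _ in range(5000):
    theta = random.gauss(0, 0.45)   # primary dimension
    g_a = random.gauss(0, 0.75)     # random effect for task A
    g_b = random.gauss(0, 0.75)     # random effect for task B
    x1 = random.random() < p_correct(theta, g_a)
    x2 = random.random() < p_correct(theta, g_a)  # same task as x1
    x3 = random.random() < p_correct(theta, g_b)  # different task
    same_task.append((x1, x2))
    diff_task.append((x1, x3))

def phi(pairs):
    # phi coefficient (Pearson correlation for two binary variables)
    n = len(pairs)
    p1 = sum(a for a, _ in pairs) / n
    p2 = sum(b for _, b in pairs) / n
    p12 = sum(a and b for a, b in pairs) / n
    return (p12 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

# items sharing a task co-vary beyond what theta alone implies
print(round(phi(same_task), 3), round(phi(diff_task), 3))
```

A model that ignores the shared gamma has nowhere to put that within-task covariance except into extra "dimensions", which is the pattern the comparisons above suggest.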
Table 3. Comparison of a model with fixed and random facet effects and a model with only random effects

Model                                          Final likelihood;      Person-separation   Chi-square test vs.   BIC
                                               estimated parameters   (EAP) reliability   previous model
Unidimensional (general EI) with both fixed    121,206; 236           .789                —                     122,754
  and random facet effects
Unidimensional (general EI) with only random   121,203; 228           .789                —                     122,698
  facet effects (Model 5)
The correlation between the two primary dimensions in Model 6 was estimated as r = .64 (compared to r = .48 for Model 2). Estimated correlations between the four dimensions in Model 7 are given in Table 5, and are also uniformly higher than those from Model 3. This provides further evidence that the distinction between the primary dimensions observed in Models 2 and 3 was in large part due to differences in testing method; when these differences are held constant, the primary dimensions appear far less distinct.

The variance of the primary dimension and the random task effects are shown in Table 6 for the unidimensional and four-dimensional models. The variances of the random facet effects are quite large compared to the variance of the primary dimensions, and in some cases considerably larger, especially in the case of the Faces task, and to a lesser extent the Pictures task. This again highlights the extent to which variance in response patterns is dominated by the specific type and format of the item, rather than the general EI construct or constructs. Why this seems to be especially true for the tasks involving pictures is not clear; an investigation of examinees' cognitive processes in responding to test items may be helpful in clarifying this issue.
Models for testlet- and facet-based item interdependence for the perceiving emotions branch
As discussed in an earlier section, there are (at least) two potential sources of local item dependence on the MSCEIT: the eight item formats ("facets") and twenty-five groups of three to five items related to a common prompt ("testlets"). So far only the former has been modelled, and thus these analyses still fail to account for local item dependence due to the bundling of items under specific prompts.
Unfortunately, the program ConQuest can handle only fifteen random effects in a given model, and thus modelling random effects for all testlets is beyond the limits of this program (and, to the author's knowledge, all other item-response modelling programs). However, a more constrained analysis is possible, which may shed light on the extent to which item testlets further distort estimation of model features.
The Perceiving Emotions branch of the MSCEIT was chosen for this demonstration, because it has the largest number of testlets (four pictures of faces and six pictures of landscapes or art), as well as the biggest testlets (five items associated with each picture). As is shown in Table 7, a unidimensional model fit just to the Perceiving Emotions items provides an estimated
Table 5. Correlations between dimensions for Model 7
Perceiving Using Understanding Managing
Perceiving 1
Using .778 1
Understanding .376 .687 1
Managing .324 .684 .810 1
Table 4. Likelihood and reliability for one-, two-, and four-dimensional models with random facet effects

Model                                          Final likelihood;      Person-separation        Chi-square test vs.   BIC
                                               estimated parameters   (EAP) reliability        previous model
Model 5: Unidimensional (general EI) plus      121,203; 228           .810                     —                     122,698
  eight orthogonal task dimensions
Model 6: Two-dimensional (experiential and     121,096; 230           .764; .799; r = .642     Change: 107; df: 2;   122,604
  strategic EI) plus eight orthogonal task                                                     p < .01
  dimensions
Model 7: Four-dimensional (perceiving, using,  121,085; 237           .702; .740; .735; .761   Change: 11; df: 7;    122,642
  understanding, and managing EI) plus                                                         p = .14
  eight orthogonal task dimensions
person-separation reliability8 of .85. A model with ten additional random effects for the item testlets fits significantly better than the unidimensional model (ΔLL(10) = 757, p < .001) and has an estimated reliability of .84. Adding in two additional random effects for the facet of task (faces versus artwork/landscapes) results in still better fit (ΔLL(2) = 417, p < .001) and a significant drop in the estimated reliability, to .61.

Results were similar for the other branches of the MSCEIT. A unidimensional model fit to the items on the Using Emotions branch provided an estimated reliability of .68. This was reduced to .64 when random effects of testlets were included, and further to .49 when random effects of the two facets were additionally included. For the Managing Emotions items, a unidimensional model yielded an estimated reliability of .78, which was reduced only to .77 when testlets were modelled and to .72 when the facets were additionally modelled. There were no testlets on the Understanding Emotions branch.

These findings suggest that, in the case of the MSCEIT, ignoring local dependence due to task results in a much greater overestimation of measurement precision than ignoring local dependence due to testlets. It is logical that this finding may extend to the investigation of dimensionality as well: in other words, properly
Table 7. Likelihood and reliability for perceiving emotions models

Model                                        Final likelihood;      Person-separation   Chi-square test vs.          BIC
                                             estimated parameters   (EAP) reliability   simpler model
Model P1: Unidimensional                     41,432; 78             .850                —                            41,945
Model P2: Unidimensional plus ten random     40,675; 88             .844                vs. Model P1: Change: 757;   41,254
  testlet effects                                                                       df: 10; p < .001
Model P3: Unidimensional plus two random     40,258; 90             .606                vs. Model P2: Change: 417;   40,850
  facet effects and ten random testlet                                                  df: 2; p < .001
  effects
Table 6. Variance estimates of the theta and random facet variables under the random-effects facet models and standard models

                     Unidimensional   Unidimensional    Four-          Four-dimensional
                                      plus facets       dimensional    plus facets
Theta 1              .203             .191              .452           .410
Theta 2              —                —                 .179           .130
Theta 3              —                —                 .281           .235
Theta 4              —                —                 .374           .340
Facet 1: Faces       —                .559              —              .385
Facet 2: Facil       —                .168              —              .135
Facet 3: Changes     —                .110              —              .028
Facet 4: Mgmnt       —                .210              —              .051
Facet 5: Pictures    —                .355              —              .250
Facet 6: Sensat      —                .121              —              .083
Facet 7: Blends      —                .172              —              .100
Facet 8: Rltshps     —                .298              —              .077
8 Person-separation reliability in IRT is directly analogous to familiar internal consistency measures in true score theory (i.e., Cronbach's alpha/KR-20/KR-21) and can be interpreted on the same scale. Details of its computation are given in many sources (see, for example, Wilson, 2005, pp. 146–147).
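The footnote's definition can be sketched numerically. This is one common form of the EAP person-separation coefficient (true-score variance over total variance); the exact computation in ConQuest may differ in detail, and the numbers below are invented for illustration.

```python
def eap_reliability(eap_estimates, posterior_sds):
    """One common form: variance of the EAP point estimates divided by
    that variance plus the mean posterior (error) variance."""
    n = len(eap_estimates)
    mean = sum(eap_estimates) / n
    var_eap = sum((x - mean) ** 2 for x in eap_estimates) / n
    mean_error_var = sum(s ** 2 for s in posterior_sds) / n
    return var_eap / (var_eap + mean_error_var)

# Larger posterior uncertainty (e.g., after variance is absorbed by
# random facet effects) drives the estimate down:
thetas = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.6, 1.1, 1.4]  # toy EAP estimates
print(round(eap_reliability(thetas, [0.3] * 8), 2))  # -> 0.87
print(round(eap_reliability(thetas, [0.7] * 8), 2))  # -> 0.56
```

This mirrors the pattern reported above: partitioning response variance into additional random effects leaves less variance attributable to the person, and the reliability estimate falls accordingly.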
modelling local dependence due to testlets would probably not affect the model-fit comparisons or estimated dimension correlations of multidimensional models nearly as much as did accounting for local dependence due to task type. Nevertheless, the magnitude of the difference between Models 5 and 6 (the unidimensional versus two-dimensional models with random effects of task; ΔLL(2) = 80, p < .001) was so trivial compared to other model comparisons that it is possible modelling random effects of testlets would move this comparison into nonsignificance.
Finally, it should be noted that the analyses presented in this paper conceptually treat the eight tasks as independent methods, but, in fact, many specific measurement techniques are shared in common by many or all of the tasks. Six of the eight tasks involve multiple ratings of a single stimulus; six out of eight also involve the use of 1-to-5 rating scales. Thus, even these analyses are likely to underestimate the degree to which similarity in item format (rather than a common underlying psychological variable) causes shared variance in item responses.
DISCUSSION
This study has, for the first time, taken a "full-information" approach to the modelling of the MSCEIT. By modelling the test at the item level, it was possible to simultaneously model, and thus differentiate, hypothesised construct and structural sources of common variance in item response patterns. These models suggest that the choice of task format contributes to variance in response patterns to a far greater extent than the dimensionality hypothesised to exist in the underlying EI variable, and that when local dependence among test items due to a common task type and item response format is taken into account, there is little reason to prefer a higher-dimensional model of EI over a unidimensional model. Thus there is insufficient statistical evidence to interpret the specific abilities sampled on the MSCEIT as forming relationships in line with the predictions of the four-dimensional EI model. Notably, this finding diverges from the findings of many earlier studies, which analysed covariance matrices of parcelled task scores.
This divergence suggests that substantive results from an item-level analysis can differ fundamentally, rather than merely incrementally, from results of an analysis conducted on parcels of items. It stands to reason that this may be especially likely to be true when there are structural sources of inter-item dependence that cannot be modelled once the items are parcelled. In the case of the MSCEIT, parcelling task scores not only reduces the number of data points per subject from 141 to 8, but also precludes the specific investigation of the relative contributions of structural features of the test to variance in item responses.
It does not appear that parcel-level analyses are preferred to item-level analyses in the case of the MSCEIT for any theory-based reason. Out of the dozen previous studies of the MEIS and MSCEIT's internal structure cited in this paper, not one of them explicitly addresses the possibility of item-level analyses. Given this, it seems likely that researchers have chosen to use linear factor analysis (which requires covariance matrices of continuously arranged variables) rather than modes of analysis appropriate for categorical variables (which would allow modelling at the item level) based mainly on its greater accessibility and familiarity, rather than strong theoretical rationale.
The findings presented in this paper highlight the importance of carefully modelling sources of local item dependence when conducting investigations of the dimensional structure of a psychological variable. Any structural source of dependence among item-response patterns can be mistaken for dependence due to a variable's dimensional structure if ignored. Results from item-level investigations can paint a less encouraging picture of the internal structure of a test than one might want to find in the (naturally) confirmation-focused initial process of test validation. Nevertheless, it is important to fully understand the role of multidimensional statistical investigations in the process of construct
validation, and not mistake one source of shared variance for another.
These findings converge with those of Føllesdal and Hagtvet (2009), in terms of the extent to which estimates of reliability of measurement are attenuated by modelling multiple sources of variance, but also extend those findings to the investigation of test dimensionality, essentially bridging the methodological gap between their study, which employed generalisability theory to examine measurement reliability of the MSCEIT, and other studies, cited above, which have used factor analysis on averaged task scores to examine the dimensional structure of the MSCEIT.
Given these results and that the MSCEIT is the primary source of information concerning the dimensional structure of the Mayer–Salovey model of EI, there does not presently appear to be compelling empirical support for the idea that EI can be conceptualised as a group of distinct but related abilities. The four-branch model of EI remains an intriguing theory without clear empirical support. Furthermore, it should be noted that even though these analyses appear to support a model with a single primary dimension (e.g., Model 5) over alternative models, this by itself cannot be taken as evidence of the existence of emotional intelligence, as the primary dimension of individual differences (the θ parameter) absorbs every source of shared variance in response patterns across the entire test, which could include social deviance, familiarity with response scales, test-taking motivation, test-wiseness, agreeableness, and many other possibilities.
Conclusions
The dimensional structure of the MSCEIT appears to be driven to a much greater extent than previously realised by the choice of specific measurement techniques (the facets of the test), to the point that the evidence for the validity of the four-dimensional model of emotional intelligence based on the internal structure of the MSCEIT appears insufficient. However, this study, like all studies of the MSCEIT, is limited by the specific tasks and items that have been sampled on that test. Future work on the emotional intelligence construct may either choose to revise the dimensional theory, or to sample more broadly from each of the dimensions of emotional intelligence, which could further help clarify the extent to which the specific choice of measurement techniques has influenced the more general discussion of EI, and the extent to which specific emotion-related tasks can be regarded as related under a common, ability-based theoretical framework. Broader sampling from each of the four proposed branches of emotional intelligence would also provide the opportunity for more nuanced investigation of the extent to which individual item formats (of which there are currently only eight, the variances of which varied considerably in the present study), rather than proposed branches of EI, account for variance in item response patterns. Future analytic work on the dimensional structure of new psychological constructs will benefit from modelling techniques that can account for test structure- or method-related violations of the conditional independence requirement of measurement.

Just as the idea of EI has much to offer to our understanding of cognitive and affective individual differences, the principles and analytic frameworks of modern measurement theory have much to offer to the exploration of EI, and of new psychological constructs more broadly. It will be exciting to see what is learned as both efforts move forward.
Manuscript received 5 September 2010
Revised manuscript received 19 April 2011
Manuscript accepted 5 May 2011
First published online 26 July 2011
REFERENCES
Adams, R. J., & Khoo, S. T. (1996). Quest. Melbourne, Australia: ACER.
Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Bandalos, D. L. (2001). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78–102.
Bandalos, D. L., & Finney, S. J. (2002). Item parceling issues in structural equation modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 269–296). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Bentler, P. M., & Jamshidian, M. (1994). Gramian matrices in covariance structural equation models. Applied Psychological Measurement, 18(1), 79–94.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Ciarrochi, J., Chan, A. Y. C., & Caputi, P. (2000). A critical evaluation of the emotional intelligence construct. Personality and Individual Differences, 28, 539–561.
Day, A. L., & Carroll, S. A. (2004). Using an ability-based measure of emotional intelligence to predict individual performance, group performance, and group citizenship behaviors. Personality and Individual Differences, 26, 1443–1458.
Fan, H., Jackson, T., Yang, X., Tang, W., & Zhang, J. (2010). The factor structure of the Mayer–Salovey–Caruso Emotional Intelligence Test V 2.0 (MSCEIT): A meta-analytic structural equation modeling approach. Personality and Individual Differences, 48, 781–785.
Føllesdal, H., & Hagtvet, K. A. (2009). Emotional intelligence: The MSCEIT from the perspective of generalizability theory. Intelligence, 37, 94–105.
Gignac, G. E. (2005). Evaluating the MSCEIT 2.0 via CFA: Corrections to Mayer et al., 2003. Emotion, 5, 233–235.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68(3), 532–560.
Joreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Keele, S. M., & Bell, R. C. (2008). The factorial validity of emotional intelligence: An unresolved issue. Personality and Individual Differences, 44, 487–500.
Kline, R. B. (2005). Principles and practice of structural equation modeling. New York, NY: Guilford Press.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Legree, P. J., Psotka, J., Tremble, T., & Bourne, D. R. (2005). Using consensus based measurement to assess emotional intelligence. In R. Schulze & R. D. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 155–179). Cambridge, MA: Hogrefe.
Linacre, M. (1989). Many-faceted Rasch measurement. Chicago, IL: MESA Press.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Maul, A. (2011). The factor structure and cross-test convergence of the Mayer–Salovey–Caruso model of emotional intelligence. Personality and Individual Differences, 50, 457–463.
Maul, A. (in press). The validity of the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) as a measure of emotional intelligence. Emotion Review.
Mayer, J. D., Caruso, D. R., & Salovey, P. (1999). Emotional intelligence meets traditional standards for an intelligence. Intelligence, 27, 267–298.
Mayer, J. D., Roberts, R. D., & Barsade, S. G. (2008). Human abilities: Emotional intelligence. Annual Review of Psychology, 59, 507–536.
Mayer, J. D., & Salovey, P. (1997). What is emotional intelligence? In P. Salovey & D. Sluyter (Eds.), Emotional development and emotional intelligence: Educational implications. New York, NY: Basic Books.
Mayer, J. D., Salovey, P., & Caruso, D. R. (2002). Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) user's manual. Toronto, ON: MHS.
Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V 2.0. Emotion, 3, 97–105.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300–307.
Palmer, B., Gignac, G., Manocha, R., & Stough, C. (2005). A psychometric evaluation of the Mayer–Salovey–Caruso Emotional Intelligence Test Version 2.0. Intelligence, 33, 285–305.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural equation modeling. Psychometrika, 69, 167–190.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Copenhagen, Denmark: Danish Institute for Educational Research (Expanded edition, 1980. Chicago, IL: University of Chicago Press).
Roberts, R., Schulze, R., O'Brien, K., Reid, J., MacCann, C., & Maul, A. (2006). Exploring the validity of the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) with established emotions measures. Emotion, 6, 663–669.
Roberts, R., Zeidner, M., & Matthews, G. (2001). Does emotional intelligence meet traditional standards for an intelligence? Some new data and conclusions. Emotion, 1, 196–231.
Rode, J. C., Mooney, C. H., Arthaud-Day, M. L., Near, J. P., Rubin, R. S., Baldwin, T. T., et al. (2008). An examination of the structural, discriminant, nomological, and incremental predictive validity of the MSCEIT© V2.0. Intelligence, 36, 350–366.
Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53, 349–359.
Rossen, E., Kranzler, J. H., & Algina, J. (2008). Confirmatory factor analysis of the Mayer–Salovey–Caruso Emotional Intelligence Test v 2.0 (MSCEIT). Personality and Individual Differences, 44, 1258–1269.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.
Wang, W., & Wilson, M. (2005a). The Rasch testlet model. Applied Psychological Measurement, 29, 296–318.
Wang, W., & Wilson, M. (2005b). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 126–149.
Wilhelm, O. (2005). Measures of emotional intelligence: Practice and standards. In R. Schulze & R. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 131–154). Cambridge, MA: Hogrefe & Huber.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Wothke, W. (1993). Nonpositive definite matrices in structural equation modeling. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 256–293). Newbury Park, CA: Sage.
Wu, M. L., Adams, R. J., Wilson, M., & Haldane, S. A. (2007). ACER ConQuest Version 2.0 [computer program]. Hawthorn, Australia: ACER.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–154). Santa Barbara, CA: Greenwood Publishing.
APPENDIX
Any statistical model of the relationship between observed variables and one or more underlying, "latent" variables hypothesised to be responsible for variation in those observed variables can be termed a latent variable model. Latent variable models include factor models for continuous observed variables and continuous latent variables (e.g., Joreskog, 1971), item response models for categorical observed variables and continuous latent variables (e.g., Birnbaum, 1968; Rasch, 1960), latent class models for categorical observed variables and categorical latent variables (e.g., Lazarsfeld & Henry, 1968), and latent profile models for continuous observed variables and categorical latent variables (e.g., Lazarsfeld & Henry, 1968). All of these models can be formulated as special cases of a generalised latent variable model (Skrondal & Rabe-Hesketh, 2004; see also Mellenbergh, 1994; Rabe-Hesketh, Skrondal, & Pickles, 2004).
Prior empirical investigations of the MSCEIT have commonly employed confirmatory factor analysis (CFA) models. A linear CFA model takes the following form:

y_in = β_i + λ_1i·η_1n + … + λ_qi·η_qn + ε_in    (1)

where y_in is the value of the ith indicator variable for person n, β_i is the intercept for indicator i, λ_1i, …, λ_qi are the factor loadings for the q common factors η_1n, …, η_qn, with variance–covariance matrix Ψ, and ε_in is the unique factor (i.e., measurement error) for indicator i. For purposes of identification, common and unique factors are typically set to have zero means, and the variances of common factors are constrained to 1 (i.e., ψ_ii = 1). It is usually assumed that the common
factors are uncorrelated with the unique factors, and that the correlations among the common factors and among the unique factors can be freely estimated or constrained to zero depending on the model. In this formulation, common and unique factors are generally assumed to be normally distributed, making the model appropriate for continuous indicators.
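As a sketch of how equation (1) generates data, the following uses two common factors with hypothetical intercepts and loadings (not MSCEIT estimates). With factor variances fixed at 1, the model-implied covariance between two indicators of the same factor is the product of their loadings, which the simulation recovers.

```python
import random

random.seed(42)

# Hypothetical parameters for four indicators and two common factors
betas   = [0.0, 0.2, -0.1, 0.3]
lambdas = [(1.0, 0.0), (0.8, 0.0),   # indicators of factor 1
           (0.0, 1.0), (0.0, 0.7)]   # indicators of factor 2

def draw_y():
    # equation (1): y_i = beta_i + lambda_i1*eta_1 + lambda_i2*eta_2 + eps_i
    eta = (random.gauss(0, 1), random.gauss(0, 1))        # common factors
    return [b + l1 * eta[0] + l2 * eta[1] + random.gauss(0, 0.5)
            for b, (l1, l2) in zip(betas, lambdas)]        # + unique factors

sample = [draw_y() for _ in range(20_000)]

def cov(i, j):
    mi = sum(y[i] for y in sample) / len(sample)
    mj = sum(y[j] for y in sample) / len(sample)
    return sum((y[i] - mi) * (y[j] - mj) for y in sample) / len(sample)

print(round(cov(0, 1), 2))  # close to 1.0 * 0.8 = 0.8 (same factor)
print(round(cov(0, 2), 2))  # close to 0 (different factors)
```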
Responses to individual test items are typically scored ordinally rather than continuously. Thus CFA requires some adaptation to be applicable to item analysis. There are a number of ways that such adaptation can be accomplished. One is the inclusion of a link function relating the linear predictor to the mean of a probability distribution function for each item. If a logit (or cumulative normal, i.e., probit) link function is selected, a multidimensional IRT model is obtained, as in equation (2).
ln(P_nij / P_ni(j−1)) = β_ij + λ_1i·η_1n + … + λ_qi·η_qn    (2)
In this formulation, β_ij are parameters relating to the "difficulty" of the items or steps within each item, η_1n is interpreted as the "ability" of person n on the first latent trait (i.e., the first common factor or dimension), and λ_1i is interpreted as the slope (also termed "discrimination" or "loading") of item i on the first dimension (and analogously for the other dimensions).
This notation illustrates the correspondence between CFA and IRT models. However, it is more common for item response models to be presented using different notation. Equation (2) can be rewritten as:
ln(P_nij / P_ni(j−1)) = α_i·θ_n + δ_ij    (3)
where α_i replaces the row vector of factor loadings (λ_1i, …, λ_qi), θ_n replaces the column vector of common factors η_n = (η_1n, …, η_qn)′, and δ_ij replaces β_ij. The formulation given here corresponds to a multidimensional extension of the Masters (1982) partial credit model, in which category difficulties are freely estimated for each item.
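A minimal sketch of equation (3) for a single item: the adjacent-category logits a·θ + δ_j accumulate into unnormalised log-probabilities for each score category, which a softmax-style normalisation converts into probabilities. The step values below are hypothetical.

```python
import math

def pcm_probs(theta, deltas, a=1.0):
    """Category probabilities for one item under equation (3):
    ln(P_j / P_{j-1}) = a*theta + delta_j. Cumulative sums of the
    adjacent-category logits give log-probabilities for categories 0..m."""
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + a * theta + d)
    top = max(logits)                       # subtract max for stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical step parameters for a four-category item
probs = pcm_probs(theta=0.5, deltas=[1.0, -0.5, -1.5])
print([round(p, 3) for p in probs])  # the four probabilities sum to 1
```

Higher θ shifts probability mass toward higher score categories, which is the sense in which θ plays the role of "ability" here.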
In the 2PL formulation of an IRT model, the contents of the a_i vector are estimated from the data. However, the contents of this vector could also be specified a priori by the test developer, if there is theory-based reason to do so (as is argued to be the case, for example, in some applications of multidimensional extensions of the Rasch model; see Adams, Wilson, & Wang, 1997). Additionally, when at least one element of the a_i vector is fixed, variances of dimensions can be estimated (as ψ) instead of fixed to 1. Often the elements of a_i are fixed at simply 1 or 0, but can take on other values when there is theory-based reason to do so, as in the case of the present study. Somewhat different analytic purposes are served by fixing the contents of a_i based on theory versus allowing it to be freely estimated; sometimes this is described as the difference between "fitting the data to the model" versus the more traditional statistical orientation of "fitting the model to the data" (e.g., Yen & Fitzpatrick, 2006, p. 124). In the former case, the focus is not on finding a statistical model that maximally explains variation in observations, but rather on testing specified (and possibly competing) theory-based models against observed data to determine to what extent the data support the theory behind those models. In the present case, it is the theory of consensus-based scoring articulated by Mayer et al. (e.g., 2002) and others (Legree, Psotka, Tremble, & Bourne, 2005) that provides the theoretical foundation for the specification of the contents of the a_i vector. In the models described in this paper, the a_i vector contents are held constant across models, and the purpose of the model comparisons is to determine which of several hypothesised dimensional structures best explains variation in item responses, rather than just to maximally explain such variation.
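The trade-off between fixing elements of a_i and estimating dimension variances reflects a scale indeterminacy: fixing a slope to 1 and estimating the latent standard deviation yields the same response probabilities as fixing the variance to 1 and estimating the slope. A minimal numeric sketch (function names and all values are hypothetical; the sign convention follows equation (3)):

```python
import math

def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Parameterisation A: slope fixed at 1, dimension SD (psi) estimated.
# The person's ability on the raw scale is theta_n = psi * z_n.
def prob_A(z, psi, d):
    return logistic(1.0 * (psi * z) + d)

# Parameterisation B: dimension SD fixed at 1, slope a_i estimated.
# The slope a_i absorbs the scale that psi carried above.
def prob_B(z, a, d):
    return logistic(a * z + d)

# The two parameterisations are observationally equivalent:
p_a = prob_A(z=1.0, psi=1.7, d=0.4)
p_b = prob_B(z=1.0, a=1.7, d=0.4)
```

The two parameterisations cannot be distinguished by the data alone, which is why at least one of the two quantities must be fixed, whether by convention or, as argued here, by theory.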
One purpose of this brief exposition was to illustrate how item response models are not fundamentally different from confirmatory factor models, but rather can simply be viewed as confirmatory factor models plus a logit link function that relates the linear predictor to ordinally scored item responses. Thus, the analyses described in the present paper are fully in the CFA tradition of other analyses of the MSCEIT, but have moved these analyses to the item level with the accompanying adjustments.
Modelling local dependence. A standard assumption of all measurement models is that what is shared in common among test items is the latent variable(s), and nothing else. Formally this is known as the requirement of conditional or local independence. This requirement is violated to the extent to which dependencies exist among specific item responses that are not due to the common latent variable.
A common source of such local dependence is the presence of "testlets" (or, alternatively, "item bundles"; e.g., Rosenbaum, 1988; Wang & Wilson, 2005a), the typical example of which is a group of items that share common stimulus material. An example from the MSCEIT would be a picture of a landscape, which requires respondents to make judgments about the degree of expression of five different emotions (e.g., happiness, fear, etc.). In this case responses to the five
items that refer to a single landscape are likely to not be conditionally independent, as all depend on the respondent's interpretation of a single image.
A second kind of bundling also occurs on the MSCEIT, although it will be more consistent with the wider literature to refer here to "facets" of the measurement situation, in that items are organised into eight distinct tasks. Responses to items within a particular task will share more variance in common than items sampled randomly from the possible target domain of the construct, possibly biasing investigations of the dimensional structure of the test.
In the context of IRT modelling, the effects of facets can be modelled as fixed effects (e.g., Linacre, 1989), in which the effect of the facet in question is held to be one that is constant over persons, or as effects that are random over persons (Wang & Wilson, 2005a, 2005b), in addition to or instead of a main fixed effect. For polytomous items, a testlet (or facet) model can take the following form (Wang & Wilson, 2005b):
log(P_{nij} / P_{ni(j−1)}) = θ_n − d_{ij} − γ_{k(i)} + γ_{nk(i)}   (4)
in which d_{ij} is the item parameter associated with step j in item i, γ_k is the overall difficulty of facet k, and γ_{nk(i)} is the random effect of facet k. The random effects of facets are constrained to be orthogonal to one another and to the random intercept θ_n, for both technical and interpretational reasons. Thus γ_{nk(i)} can be interpreted as a dimension of individual differences that affects the probability of success on items on a specific testlet, independent of the primary dimension and other testlets. Conceptually, γ_{nk(i)} partials out only variance in item response patterns that can be uniquely attributed to the fact that items share a common task; response variance shared in common with other items meant to measure a common dimension is still attributed to θ_n. Wang and Wilson's formulation of these models assumed a single primary dimension, but they are readily extended into the multidimensional case.