
Examining the structure of emotional intelligence at the item level: New perspectives, new conclusions

Andrew Maul

Unit for Quantitative Analysis in Education, University of Oslo, Oslo, Norway

Cognition & Emotion, 2012, 26(3), 503-520. http://dx.doi.org/10.1080/02699931.2011.588690

Correspondence should be addressed to: Andrew Maul, Unit for Quantitative Analysis in Education, University of Oslo, Postboks 1099 Blindern, Oslo, 0317 Norway. E-mail: [email protected]

Despite twenty years of research, many unknowns remain regarding the Mayer-Salovey (e.g., 1997) model of emotional intelligence (EI) and the validity of tests that have been designed to measure it. Evidence relevant to the internal structure of EI has come mainly from factor-analytic studies of the MSCEIT and the MEIS, utilising parcelled task scores rather than individual test items. This approach has several deficiencies: in addition to the loss of item-level information, it results in an insufficient number of observed variables per factor and an inability to separate structural sources of local item dependence (i.e., method variance) from construct-related variance. The present study (N = 707) employed multidimensional item response modelling to investigate the dimensional structure of the MSCEIT, at the item level, for the first time. It is shown that item format and the specific choice of task explain far more of the variance in response patterns than does the hypothesised dimensional structure of EI, to the point that there is no empirical reason to prefer a higher-dimensional model of EI over a unidimensional model. It is argued that the advantage of an item-level perspective can be fundamental, rather than merely incremental.

Keywords: Emotional intelligence; Multidimensional item response theory; Local dependence; Facets; Testlets.

Many unknowns remain about the psychological construct of emotional intelligence (EI) as originally defined by Mayer and Salovey (e.g., 1997). Empirical evidence concerning the nature and structure of EI and its relationship to other constructs and outcomes has primarily come from the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT V.2; Mayer, Salovey, & Caruso, 2002), and its predecessor, the Multifactor Emotional Intelligence Scale (MEIS; Mayer, Caruso, & Salovey, 1999). One issue of particular importance to construct validity is whether the internal structure of tests designed to measure the construct conforms to theory-based expectations; empirical evidence concerning this issue has come from factor-analytic studies of the MEIS (Ciarrochi, Chan, & Caputi, 2000; Mayer et al., 1999; Roberts, Zeidner, & Matthews, 2001) and the MSCEIT (Day & Carroll, 2004; Gignac, 2005; Keele & Bell, 2008; Maul, 2011; Mayer, Salovey, Caruso, & Sitarenios, 2003; Palmer, Gignac, Manocha, & Stough, 2005; Roberts et al., 2006; Rode et al., 2008; Rossen, Kranzler, & Algina, 2008), investigating whether the observed associations between parts of these tests conform to the expectations of the theory of EI. However, conclusions from these studies have been equivocal and at times difficult to interpret, and certain analytic choices may have contributed to this lack of clarity. In particular, these studies have used aggregated ("parcelled") task scores rather than individual items as observed variables; this approach leads to a significant loss of information (on the MSCEIT this reduces the number of observed variables from 122 to 8), an insufficient number of observed variables relative to the number of hypothesised factors, and an inability to determine to what extent variation in the responses to individual items is driven by the particular choice of task rather than the EI construct.

The present study sought to shed new light on the internal structure of the MSCEIT by employing item-level analyses. After a short introduction to the Mayer-Salovey EI model and the MSCEIT, past studies of this test's factor structure are briefly reviewed, with special attention to key analytic limitations of these studies. Next, a short conceptual overview of this study's analytic approach is given. Following this, results from a new study (N = 707) are presented in which sources of method-related variance in response patterns to the MSCEIT are explicitly modelled. It is concluded that when the MSCEIT is modelled at the item level, and task- and item-format-related local item dependence is taken into account, there is no reason to prefer higher-dimensional models to a unidimensional model of EI. Implications are discussed both for the field of EI research and for analytic choices made in the investigation of the structure of psychological variables.

A brief overview of the MSCEIT

Mayer and Salovey (1997; Mayer et al., 2002) articulated EI as a four-dimensional construct containing the abilities to: (1) accurately perceive and express emotions (Perceiving Emotions); (2) use emotions effectively in problem solving (Using Emotions); (3) understand the dynamic nature of emotions (Understanding Emotions); and (4) manage one's own and others' emotions (Managing Emotions). Since it was introduced in 2002, the MSCEIT has been the flagship test of this EI model. The MSCEIT contains 141 items (of which 122 are scored) divided into eight tasks, two of which are designed to measure each of the proposed four branches of EI. Each task contains between 9 and 30 items, and shares a common type of prompt and stimulus material and a common response format. In some cases, the tasks are comprised of several small groups of items with common prompts. For example, the Pictures task, which is designed to measure the Perceiving Emotions branch of EI, requires respondents to examine six photographs of abstract art or landscape scenes and rate the extent to which five different emotions (e.g., happiness, surprise, fear) are present in each picture on 5-point rating scales ranging from "definitely not present" to "definitely present".
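For orientation, the branch-and-task layout just described can be summarised as a simple mapping. The sketch below is illustrative only: the task names match those expanded in Table 6, and the assignment of tasks to branches follows the MSCEIT manual (Mayer et al., 2002) rather than anything estimated in this paper.

```python
# Branch -> task layout of the MSCEIT (two tasks per hypothesised branch).
# Task-to-branch assignments follow the test manual; each task contains
# between 9 and 30 items and shares a common prompt and response format.
MSCEIT_BRANCHES = {
    "Perceiving Emotions":    ["Faces", "Pictures"],
    "Using Emotions":         ["Facilitation", "Sensations"],
    "Understanding Emotions": ["Changes", "Blends"],
    "Managing Emotions":      ["Management", "Relationships"],
}
```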

MSCEIT test items are scored via a technique known as consensus-based scoring. Rather than having response options deliberately written to be more or less correct (i.e., to reflect lower or higher amounts of EI), each response is scored according to the degree to which it corresponds with the consensus response from a calibration group of respondents. In the general consensus approach, answers for the MSCEIT were determined on the basis of response patterns from a large (N = 5,000) standardisation sample from English-speaking countries (Mayer et al., 2003; Mayer, Roberts, & Barsade, 2008). For instance, when respondents are asked to rate the intensity of a particular emotion in a piece of artwork, if the five options relating to the emotion are endorsed by 10%, 20%, 20%, 40%, and 10% of the sample respectively, a respondent endorsing option 3 would be credited with a score of .20 while a respondent endorsing option 4 would be credited with a score of .40. Respondents' scores on each task, branch, and area are the average of their weighted scores for the items that make up that section of the test.


The expert consensus scoring method is similar, although the calibration group consisted of 21 volunteer members of the International Society for Research on Emotion (ISRE) at their conference in 2000. Scores derived from expert and consensus weights appear to correlate so highly (e.g., r = .96; Mayer et al., 2003) that they can be regarded as essentially a single scoring method.

This scoring method has been one of the largest impediments to item-level analysis of the MSCEIT. One consequence of this method is that individual items are worth different amounts in the calculation of a total score; specifically, easy items are worth much more than difficult items. For example, in the case of a very easy item, the five possible response options might be endorsed by 90%, 5%, 3%, 1%, and 1% of the sample, and selecting an incorrect response would result in a loss of .85 or more in the final tally. On a more difficult item, in which the response options were endorsed by 25%, 21%, 20%, 19%, and 15% of the sample, selection of an incorrect option can result in a loss of only up to .10 in one's final score. Thus item difficulty is directly and negatively associated with item discrimination, with the most difficult items having the lowest discrimination and the lowest impact on total test score.[1] The present study did not seek to explore the philosophical foundations of this approach to scoring, but merely to be consistent with the test authors' intentions while modelling the test at the item level.
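To make the weighting scheme concrete, the following sketch shows one way general consensus scoring can be computed. It is an illustration written for this purpose (the function names and example endorsement counts are invented here), not the proprietary MHS implementation.

```python
# Illustrative consensus-based scoring: each response option is weighted by
# the proportion of a calibration sample endorsing it; a respondent's item
# score is the weight of the chosen option, and a task score is the average
# of the weighted item scores.

def consensus_weights(endorsement_counts):
    """Turn calibration-sample endorsement counts into proportion weights,
    e.g. counts in the ratio 10:20:20:40:10 -> [.10, .20, .20, .40, .10]."""
    total = sum(endorsement_counts)
    return [count / total for count in endorsement_counts]

def score_task(chosen_options, weights_per_item):
    """Average consensus-weighted score over the items of one task;
    `chosen_options` holds the selected option index (0-4) for each item."""
    item_scores = [weights[choice]
                   for choice, weights in zip(chosen_options, weights_per_item)]
    return sum(item_scores) / len(item_scores)

# Hypothetical two-item task: an "easy" item with strong consensus and a
# "difficult" item with weak consensus, as in the two examples in the text.
weights = [consensus_weights([90, 5, 3, 1, 1]),
           consensus_weights([25, 21, 20, 19, 15])]
print(score_task([0, 1], weights))  # modal answer on item 1, near-modal on item 2
```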

Empirical investigations of the MSCEIT

Several studies have employed factor-analytic[2] techniques to examine the internal structure of the MSCEIT and the extent to which it conforms to the expectations of the Mayer and Salovey model of EI. These studies share in common that they have proceeded by (1) taking the average of respondents' weighted scores on the items within each of the eight tasks to produce parcelled task scores, as described above, and then (2) submitting a correlation or covariance matrix of the task scores for factor analysis.

Results from these studies have been varied. Studies by Mayer et al. (2003) and Day and Carroll (2004) reported that the factor structure of the MSCEIT was well described by a one-factor, "EI(g)", solution, a two-factor, "Experiential" and "Strategic", solution, and a four-factor solution, with two tasks each loading onto Perceiving, Using, Understanding, and Managing Emotions factors. However, Gignac (2005) and Palmer et al. (2005), through reanalysis of the Mayer et al. (2003) covariance matrices and through new data collection, found one- and two-factor models to be ill fitting, and a four-factor model to be implausible due to a high (r = .90) correlation between the Using and Managing Emotions branches. They hypothesised that the Using Emotions factor did not "measure any unique construct-related variance, independent of a general factor and the other branches" (p. 299), which was supported by results from nested factor models. Palmer and colleagues also noted problems with using only two indicators per factor, identifying this as a "fundamental limitation" (p. 286).

Studies by Roberts et al. (2006), Rode et al. (2008), Rossen et al. (2008), Keele and Bell (2008), and Maul (2011) each used somewhat different sets of factor models but came to similar final conclusions, finding only partial support for the proposed factor structure of the MSCEIT and largely finding the Using Emotions branch non-identifiable. A meta-analysis by Fan, Jackson, Yang, Tang, and Zhang (2010) looked across factor-analytic studies of the MSCEIT and found that a three-factor model, with Perceiving and Using Emotions combining to form a single factor, was the model most consistently supported in the literature, but again noted only partial support for the existence of a general EI factor.

[1] Arguably, the use of the words "easy" and "difficult", while consistent with modern measurement theory, may not be entirely appropriate here. Scoring items by the use of consensus weights implies that there may be items on which multiple answers are close to being equally as good. What is termed a more "difficult" item in this paper may alternatively be seen as an item for which there is simply a lack of clear consensus, and therefore an item that does not discriminate as well between subjects.

[2] The terms "factor" and "dimension" are taken here to be conceptually synonymous, albeit arising from distinct research traditions.

Thus, only equivocal support exists for the structural integrity of the four-branch model of EI proposed by Mayer and Salovey, and the number of indicators used per factor has been a major analytic limitation. The actual construction of these indicator variables is another possible (and less discussed) limitation. Each of these points is developed in turn.

The number of indicators per factor. Models with two indicators for each of multiple correlated factors are technically identified, but are more prone to problems such as failure to converge, unstable or inadmissible solutions, and interpretive difficulties (Bentler & Jamshidian, 1994; Gorsuch, 1997; Wothke, 1993). Palmer and colleagues (2005) and Wilhelm (2005) have noted this issue in the context of EI research and have suggested as a solution that the MSCEIT have more tasks. However, it is worth noting that an inadequate number of tasks for factor analysis is an analytic concern, while the number of different tasks used to measure a construct is a concern about appropriate domain coverage. Moreover, item-based approaches, such as the one employed in this study, can entirely remove the (analytic) lower limit on tasks, so long as there is a sufficient number of test items for each branch.

The use of averaged task scores as indicator variables. As noted earlier, each of these factor-analytic studies has used the averaged scores of items within tasks as indicator variables. This is potentially problematic for a number of reasons.

First, at a conceptual level, aggregating scores on groups of individual items can be seen as assuming part of the very thing that multidimensional analyses are meant to investigate. The use of parcels of items as indicator variables requires that the items in each parcel are unidimensional (that is, known to measure a single construct). Bandalos (2001) and Bandalos and Finney (2002) noted that summing or averaging scores on groups of items in this manner can mask or distort a multidimensional structure in such a way that, as Kline (2005) summarises, "a seriously misspecified model may nonetheless fit the data reasonably well" (p. 198), and further argued that such parcelling should not be part of any analysis aimed at testing the dimensional structure of a test.

Second, organising items into parcels loses a large amount of information. The number of distinct pieces of information about a respondent's emotional intelligence is reduced from 122 down to 8. This obviously contributes to the problem with an inadequate number of observed variables per factor noted previously. Further, such an approach will not be helpful in identifying specific items that may be unusual or problematic, which is often a primary purpose of test analysis.

Third, there appears to be considerable variation in the extent to which items within each task form relatively homogeneous groups, or, stated alternatively, there appears to be variation in the magnitude of the within-task dependence among item responses. The studies cited in the last section reported estimated internal consistency coefficients for MSCEIT tasks ranging from .80 down to the .30s.[3] Averaging scores on each task together ignores this variability, treating all tasks as essentially internally homogeneous, and prohibits investigation of the degree to which variation in item responses can be explained by specific choice of task (rather than by the hypothesised dimensions of emotional intelligence).

[3] A generalisability theory study by Føllesdal and Hagtvet (2009) provided evidence that many of these reported coefficients are likely to be overestimates of the actual internal consistencies of the tasks, since many tasks contain items that are further grouped into clusters around common prompts or stimuli, and the additional dependence imposed by these subclusters is not taken into account in commonly used estimates of internal consistency (e.g., split-half or Cronbach's alpha coefficients).

Modelling the MSCEIT at the item level

The main difference between the analytic approach employed in the present study and those employed by prior studies is that the present study sought to model the relationship between individual MSCEIT item responses (rather than aggregated task scores) and the hypothesised dimensions of emotional intelligence. Technical details of the multidimensional item response theory (MIRT) models employed in this study and accompanying references are given in the appendix. Briefly, one can think of MIRT models as confirmatory factor-analytic (CFA) models with an added logit link function, which makes the models appropriate for ordinally (rather than continuously) scaled observed variables (and thus permits the use of individual test items as observed variables).

Modelling the MSCEIT at the item level opens up new analytic opportunities and challenges. The organisation of items into tasks violates the measurement modelling requirement of local independence; specifically, items within a given task are likely to share much more in common with one another than items sampled at random from the target domain of the EI construct, and, therefore, there is likely to be covariation in subjects' responses to these items in excess of what is expected given only that they are related to a common underlying construct.[4]

It is well established that ignoring local dependence can lead to overestimation of measurement precision (reflected, for example, by inflated reliability estimates; e.g., Sireci, Thissen, & Wainer, 1991). Local dependence may affect multidimensional investigations as well, especially when sources of local dependence exist primarily within hypothesised dimensions. Within the Perceiving Emotions branch, for example, there are four pictures of faces and six pictures of landscapes with five items each, for a total of 50 items; ignoring the local dependence caused by common task type and common prompt treats responses to these 50 items as independent observations conditional only on the Perceiving Emotions dimension, and thus risks misunderstanding these sources of local dependence as dependence due to a common ability, leading to a biased view of the test's dimensional structure.

[4] In contrast to how the totality of EI is held to be comprised of and defined by its four component dimensions, the tasks are not regarded as representing the whole of the dimensions they target; rather, they can be regarded as sampled from a domain of tasks appropriate to that dimension, similar to how individual items are often regarded as sampled from a relevant domain of possible items.

Thus, as is described in the appendix, a multidimensional extension of Wang and Wilson's (2005b) random-effects facet model was employed to simultaneously model the dimensional structure of the test and the local dependence among items of a common task in Models 5 and onward, as described in the following section. Figures 1 and 2, and Figure 3, which correspond to Models 5, 6, and 7, visually represent how this was accomplished.
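As a conceptual sketch of what such a model looks like for a single dichotomous item, the response probability can be written as

\[ \operatorname{logit} P(X_{pi} = 1 \mid \theta, \gamma) \;=\; a_i\,\theta_{p,d(i)} \;+\; \gamma_{p,t(i)} \;-\; \delta_i , \]

where \(\theta_{p,d(i)}\) is person p's location on the EI dimension targeted by item i, \(\gamma_{p,t(i)} \sim N(0, \sigma^2_{t(i)})\) is a person-specific random effect of the task (facet) containing item i, orthogonal to the \(\theta\) dimensions, \(a_i\) is the fixed item discrimination described in the Method section, and \(\delta_i\) is the item difficulty. This is only an illustration: the models actually fitted are polytomous (partial-credit form) and are fully specified in the appendix, and the notation above is not reproduced from it. Constraining all the task variances \(\sigma^2_t\) to zero recovers the corresponding standard models without random facet effects (Models 1-3).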

Introduction to the present study

The present study was designed to re-analyse the internal structure of the MSCEIT at the item level, modelling structural sources of inter-item dependence. In this way, the present study sought to overcome the two major methodological limitations of previous investigations of the dimensionality of the MSCEIT noted above. Additionally, this study provided an opportunity to examine the extent to which within-task variance could mask or distort assessments of the dimensionality of the MSCEIT.

METHOD

Participants

Subjects for this study were drawn from two secondary sources. Data collected by the author from another study (N = 241; Maul, in press) were combined with data collected by Roberts and colleagues in Australia (N = 517; personal communication, 2006), for a total sample size of N = 758. In the former case, subjects were drawn from the student population at the University of California at Berkeley and from the general community in Berkeley through flyers and website recruiting. Subjects (149 female, 73 male, 19 unreported) ranged in age from 18 to 71 years (M = 29.6, SD = 11.9). Of the 223 subjects who reported their ethnicity, 44% were Caucasian American, 34% were Asian American, 11% were African American, 7% were Latin American, and 5% were other or mixed. A primary language other than English was reported by 20%. In terms of education, 5% reported having a high school diploma or less, 41% had completed high school but not (yet) received a 4-year degree, 38% had received a 4-year degree, and 17% had a graduate degree. The Roberts data (N = 517) were collected from undergraduate students at the University of Sydney, of whom 355 were female and 114 were male (with 48 unreported), and whose ages ranged from 18 to 59 years (M = 20.48, SD = 5.02).

Mayer et al. (2002) considered participants' responses to be invalid if responses to 10% or more of a given subscale's items were missing; 18 cases were excluded on these grounds. An additional 29 subjects were excluded due to a significant amount of missing data within a particular task. All further results are for the remaining 707 subjects. Missing data in the remaining cases were minimal (around 1%) and were left as missing in the maximum likelihood-based analyses described here.

Materials

All subjects completed the MSCEIT Research Version 2.0 in accordance with the guidelines for remote administration given by Mayer et al. (2002).

Scoring of the MSCEIT

Multi-Health Systems (MHS) owns and keeps proprietary the consensus and expert scoring keys for the MSCEIT, and does not provide item-level scoring services. Thus the suggestion of the test authors to develop local consensus scoring norms was followed.

Nineteen test items are regularly excluded by MHS in analyses of the MSCEIT (Mayer et al., 2002, p. 63). The first author of the MSCEIT provided a list of these items for the purpose of this study (John Mayer, personal communication, 8 April 2010). These items were excluded from analysis.

Subject response patterns were examined for each of the remaining 122 items and translated into a scoring key. First, the response distribution of the full sample was examined for the five possible responses to each item. The response receiving the smallest number of endorsements from the full sample was coded as "0". Next, the category receiving the next-highest number of endorsements was identified, and a difference-in-proportion test was conducted between the two adjacent categories, to determine if the proportion of subjects selecting the second least-popular category was significantly higher than the proportion of subjects selecting the lowest category. If the proportions of subjects in the two groups were significantly different at the α = .005 level,[5] the second least-popular category was assigned a score of "1"; otherwise the category was also assigned a score of "0". This procedure commonly led to scoring categories together when the difference in proportion selecting the two adjacent categories was less than 7% of the total sample. However, a very small number of subjects in a given category could present problems with estimation. Thus, if fewer than ten subjects out of the full sample selected the least-popular category, both it and the second least-popular category were assigned a score of "0".

[5] Given the very large number of comparisons (122 items with five categories each), a fairly stringent alpha level is appropriate; on the other hand, the strict Bonferroni level of .05/408 = .00012 would result in too many items that cannot be scored. It should also be noted that all analyses described in this paper were re-run with categories assigned using both more (.0001) and less (.01 and .05) stringent alpha levels. Although these models obviously had different numbers of model parameters and different values of these parameters, the ordering of fit of the models was identical to what is described in this paper.

This procedure was then repeated, comparing the second least- and third least-popular categories, and so on, assigning a consecutive integer to each category. Thus items could have different numbers of categories: an item whose options were selected by 80%, 6%, 6%, 5%, and 3% of the sample, for example, would be scored dichotomously as [1, 0, 0, 0, 0], whereas an item whose options were endorsed by 40%, 30%, 15%, 5%, and 5% of the sample would be scored as [3, 2, 1, 0, 0]. Two test items, for which the five response options were selected nearly equally often, could not be scored and were removed from further analysis. Thus this scoring procedure resulted in 120 items, of which 34 had two categories, 73 had three categories, and 13 had four categories.

As discussed previously, the scoring method of this test specifies different discriminations for each item based on the degree of consensus. Thus the contents of the a_i vector from Equation 3 in the appendix were manipulated to reflect this contention. The response pattern for each item was examined again. The relevant element of a_i was then fixed to reflect the average difference in popularity between the adjacent categories of the item. Thus the two examples in the last paragraph were given discrimination values of .74 and .12, respectively, which reflects the contention that the first item is more discriminating (and has a higher impact on total score) than the second item. This allowed MSCEIT items to be modelled in a manner congruent with the test design.
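The following sketch illustrates the gist of this two-step translation (integer category scores derived from the endorsement distribution, and a fixed discrimination equal to the average endorsement gap across score boundaries). It is a simplification written for illustration, not the exact published procedure: the difference-in-proportion test is approximated by a two-proportion z-test, and the fewer-than-ten-subjects rule is applied at every step rather than only to the least-popular category.

```python
import math

def adjacent_categories_differ(n1, n2, n_total, z_crit=2.81):
    """Two-proportion z-test on adjacent category endorsement counts
    (z_crit ~ 2.81 corresponds to a two-tailed alpha of roughly .005)."""
    p1, p2 = n1 / n_total, n2 / n_total
    pooled = (n1 + n2) / (2 * n_total)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_total)
    return se > 0 and abs(p1 - p2) / se > z_crit

def scoring_key(counts, n_total, min_count=10):
    """Assign integer scores (0, 1, 2, ...) to the response options, walking
    from the least- to the most-popular category and incrementing the score
    only when adjacent categories differ clearly enough."""
    order = sorted(range(len(counts)), key=lambda k: counts[k])  # least popular first
    key, score = {order[0]: 0}, 0
    for prev, nxt in zip(order, order[1:]):
        if counts[prev] >= min_count and adjacent_categories_differ(counts[prev], counts[nxt], n_total):
            score += 1
        key[nxt] = score
    return [key[k] for k in range(len(counts))]

def fixed_discrimination(counts, n_total, key):
    """Average difference in endorsement proportion across adjacent categories
    that straddle a score boundary (the quantity used to fix a_i)."""
    order = sorted(range(len(counts)), key=lambda k: counts[k])
    steps = [(counts[nxt] - counts[prev]) / n_total
             for prev, nxt in zip(order, order[1:]) if key[nxt] > key[prev]]
    return sum(steps) / len(steps) if steps else 0.0

counts = [566, 42, 42, 35, 21]   # roughly 80%, 6%, 6%, 5%, 3% of N = 707
key = scoring_key(counts, 707)   # -> [1, 0, 0, 0, 0]
print(key, round(fixed_discrimination(counts, 707, key), 2))  # discrimination ~ 0.74
```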

As a cross-check on the fidelity of this translation of the scoring method to the official MSCEIT scoring method (Mayer et al., 2002), EAP person estimates generated from a simple unidimensional IRT model (Model 1, as described below) were correlated with scores derived in the traditional manner. The correlation between these sets of scores was r = .96, which is of the same order as the correlation between the expert and consensus scoring methods (r = .96; Mayer et al., 2003) and helps assuage concerns that the adaptation of the scoring method employed here could be substantively different from the official method.

Statistical analyses

Analyses were conducted using ACER ConQuest 2.0 (Wu, Adams, Wilson, & Haldane, 2007). Differences in the final value of -2 times the log-likelihood[6] for nested models are compared using a likelihood ratio test; values of the Bayes information criterion (BIC; Raftery, 1993) are also presented. Two runs using a Monte Carlo approach to estimation were used for all models. On the first run, 400 nodes were used for integration and estimation was set to terminate when the maximum change in parameter estimates became less than .01. The parameter estimates from this run were then used as starting values in a second run, in which 4,000 nodes were used and estimation was set to terminate when the maximum change in parameters became less than .0001. The means of all random effects were constrained to be zero, although in some cases these random effects are centred around a (non-zero) estimated mean fixed effect of a facet.

[6] The symbol ΔLL(n) can be read as "the difference in -2(log-likelihood) values of the two models being compared, with n more parameters in the more-complex model".
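The nested-model comparisons and BIC values reported in the tables below can be reproduced from each model's deviance (-2 log-likelihood) and number of estimated parameters. The short sketch below shows the computation, using the Model 1 versus Model 2 comparison from Table 1 as an example; scipy is assumed only for the chi-square tail probability.

```python
import math
from scipy.stats import chi2

def lr_test(deviance_simple, deviance_complex, extra_params):
    """Likelihood ratio test for nested models: the drop in deviance is
    referred to a chi-square distribution with df = extra parameters."""
    delta_ll = deviance_simple - deviance_complex
    return delta_ll, chi2.sf(delta_ll, df=extra_params)

def bic(deviance, n_params, n_persons=707):
    """Bayes information criterion: deviance plus a log(N) penalty per parameter."""
    return deviance + n_params * math.log(n_persons)

print(lr_test(123794, 122607, 2))  # Model 1 vs. Model 2: change ~ 1,187, p < .001
print(round(bic(122607, 222)))     # about 124,064; cf. the BIC column of Table 1
```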

RESULTS

The first set of models fit to the data ignores local dependence among items, using standard one-, two-, and four-dimensional models, as well as a three-dimensional model described below. The second set duplicates these models but adds in fixed and random effects for task format (i.e., facet). The third set additionally includes random effects for testlets. Figures 1, 2, and 3 correspond to Models 5, 6, and 7, respectively, described below, but also correspond to Models 1, 2, and 3 if one removes the random effects of tasks (the γ parameters).

Figure 1. Model 5.

Figure 2. Model 6.

Figure 3. Model 7.

Test-, area-, and branch-level models

The first three models closely replicate the spirit of the confirmatory factor models fit by Mayer et al. (2003), upon which the original empirical arguments concerning the dimensionality of the EI construct measured by the MSCEIT were built.

Model 1 treats the MSCEIT as a test of a single latent variable, here termed EI(g). Model 2 treats the MSCEIT as a test of two correlated dimensions, Experiential EI and Strategic EI. Model 3 treats the MSCEIT as a test of four correlated dimensions: Perceiving, Using, Understanding, and Managing Emotions.

An examination of item parameter-level fit statistics for these three models, as well as all other models reported in this paper, revealed no item parameters that appeared to severely misfit the model.[7] Infit (weighted) mean-square statistics for the 219 item parameters in Model 1, for example, ranged from 0.94 to 1.14, with the exception of a single parameter with an infit mean-square of 1.28. In no models were there values below 0.75 or above 1.33. It must be noted, however, that item-parameter-level fit statistics are not necessarily sensitive to systematic misfit due to unmodelled dimensionality or sources of local dependence.

As can be seen in Table 1, there was successive improvement in overall model fit from Model 1 to Model 3 (Model 2 compared to Model 1: ΔLL(2) = 1,177, p < .001; Model 3 compared to Model 2: ΔLL(7) = 611, p < .001). Thus the null hypotheses that a unidimensional model described the data as well as a two-dimensional model, and that a two-dimensional model described the data as well as the four-dimensional model, were both rejected. This seems to provide initial support for the construct structure hypothesised by Mayer and colleagues.

In the two-dimensional model, the Experiential and Strategic areas correlated with one another at r = .48. The estimated correlations between the dimensions in the four-dimensional model are given in Table 2.

Table 1. Likelihood and reliability for one-, two-, four-, and three-dimensional models

Model | Final likelihood; estimated parameters | Person-separation (EAP) reliability | Chi-square test vs. previous model | BIC
Model 1: Unidimensional (general EI) | 123,794; 220 | .863 | -- | 125,237
Model 2: Two-dimensional (experiential and strategic EI) | 122,607; 222 | .849; .779; r = .477 | Change: 1,187; df: 2; p < .01 | 124,063
Model 3: Four-dimensional (perceiving, using, understanding, and managing) | 121,996; 229 | .853; .717; .777; .759 | Change: 611; df: 7; p < .01 | 123,498
Model 4: Three-dimensional (experiential, understanding, and managing) | 122,473; 225 | .841; .755; .813 | -- | 123,949

[7] There are no absolute standards for what constitutes a misfitting item. Adams and Khoo (1996) suggested a rule of thumb of flagging items with infit mean-squares less than 0.75 or greater than 1.33. Bond and Fox (2007) gave several rules of thumb for acceptable values in different situations, the most conservative of which (for high-stakes educational tests) is 0.8 to 1.2. Fit statistics are sensitive to sample size, but the present sample (n = 707) is not unreasonably large or small.

Table 2. Correlations between dimensions for Model 3

 | Perceiving | Using | Understanding | Managing
Perceiving | 1 | | |
Using | .561 | 1 | |
Understanding | .325 | .459 | 1 |
Managing | .294 | .583 | .700 | 1


Model 4 represents the three-factor model preferred by Fan et al. (2010) in their meta-analysis of factor-analytic studies of the MSCEIT, with Perceiving and Using Emotions combining to form a single factor. This model also fits better than Model 2 (ΔLL(3) = 134, p < .001); however, Model 3 (four dimensions) fits better than the three-factor model (ΔLL(4) = 526, p < .001), indicating that the three-factor solution is not preferred in this study.

Thus, when local dependence due to structural features of the test is not taken into account, there appears to be support for the four-dimensional model of EI proposed by the MSCEIT's authors.

Test-, area-, and branch-level models with random effects of task

Including a fixed effect of each MSCEIT task reflects the idea that there might be, on average, more agreement among test takers about the correct answers to items on some tasks than others (that is, it will be on average easier to select a more-correct answer to items on some tasks than others). Including a random effect of each MSCEIT task reflects the idea that the extent to which items on particular tasks are more or less difficult interacts with the person.

As is presented in Table 3, a model with both random and fixed effects of task did not appear to fit as well as a model with only random effects. Furthermore, although the main effects of the dimensions were statistically significantly different from zero, their magnitude (ranging from -0.04 to 0.11 logits) was trivial compared to the range of difficulty of items (from approximately -3.8 logits to -0.8 logits). Thus, it did not appear that the inclusion of fixed effects of facets provided an advantage when random facet effects were included, and thus all further models include only random effects of task.

Models 5, 6, and 7 parallel Models 1, 2, and 3, respectively, with the addition of eight random effects of facets. Each item now loads onto both its primary construct factor and a specific methods factor. These models fit eight more parameters than their counterpart models above.

Model 5 fits significantly better than Model 1 (ΔLL(8) = 2,591, p < .001), Model 6 fits significantly better than Model 2 (ΔLL(8) = 1,511, p < .001), and Model 7 fits significantly better than Model 3 (ΔLL(8) = 911, p < .001), indicating that in all three cases we can reject the null hypothesis that the model without random effects of tasks fits the data as well as the model with these random effects. Thus, it appears that a significant amount of model variance is within task, even controlling for construct-related variance.

The differences in fit between Models 5, 6, and 7 (shown in Table 4) are telling. Model 6 fits the data better than Model 5 (ΔLL(2) = 107, p < .001), although this difference is trivial (a 0.09% reduction in deviance) compared to the difference between Models 1 and 2 (a 0.96% reduction in deviance; thus, this difference is over ten times larger when facets are not modelled). Model 7 does not fit the data better than Model 6 (ΔLL(7) = 11, p > .05), quite unlike the comparison of Models 3 and 2. Thus it appears that the successive better fit of higher-dimensional models observed in the comparisons of Models 1, 2, and 3 all but vanishes when facet-related variance is controlled.
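To spell out the "reduction in deviance" comparison using the deviances reported in Tables 1 and 4 (each change in deviance is expressed here relative to the simpler model's deviance):

\[ \frac{107}{121{,}203} \approx 0.09\% \quad (\text{Model 5 vs. Model 6}), \qquad \frac{1{,}187}{123{,}794} \approx 0.96\% \quad (\text{Model 1 vs. Model 2}). \]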

Table 3. Comparison of a model with fixed and random facet effects and a model with only random effects

Model | Final likelihood; estimated parameters | Person-separation (EAP) reliability | Chi-square test vs. previous model | BIC
Unidimensional (general EI) model with both fixed and random facet effects | 121,206; 236 | .789 | -- | 122,754
Unidimensional (general EI) with only random facet effects (Model 5) | 121,203; 228 | .789 | -- | 122,698

Table 4. Likelihood and reliability for one-, two-, and four-dimensional models with random facet effects

Model | Final likelihood; estimated parameters | Person-separation (EAP) reliability | Chi-square test vs. previous model | BIC
Model 5: Unidimensional (general EI) plus eight orthogonal task dimensions | 121,203; 228 | .810 | -- | 122,698
Model 6: Two-dimensional (experiential and strategic EI) plus eight orthogonal task dimensions | 121,096; 230 | .764; .799; r = .642 | Change: 107; df: 2; p < .01 | 122,604
Model 7: Four-dimensional (perceiving, using, understanding, and managing EI) plus eight orthogonal task dimensions | 121,085; 237 | .702; .740; .735; .761 | Change: 11; df: 7; p = .14 | 122,642


The correlation between the two primary dimensions in Model 6 was estimated as r = .64 (compared to r = .48 for Model 2). Estimated correlations between the four dimensions in Model 7 are given in Table 5, and are also uniformly higher than those from Model 3. This provides further evidence that the distinction between the primary dimensions observed in Models 2 and 3 was in large part due to differences in testing method; when these differences are held constant, the primary dimensions appear far less distinct.

Table 5. Correlations between dimensions for Model 7

 | Perceiving | Using | Understanding | Managing
Perceiving | 1 | | |
Using | .778 | 1 | |
Understanding | .376 | .687 | 1 |
Managing | .324 | .684 | .810 | 1

The variances of the primary dimensions and the random task effects are shown in Table 6 for the unidimensional and four-dimensional models. The variances of the random facet effects are quite large compared to the variance of the primary dimensions, and in some cases considerably larger, especially in the case of the Faces task, and to a lesser extent the Pictures task. This again highlights the extent to which variance in response patterns is dominated by the specific type and format of the item, rather than the general EI construct or constructs. Why this seems to be especially true for the tasks involving pictures is not clear; an investigation of examinees' cognitive processes in responding to test items may be helpful in clarifying this issue.

Table 6. Variance estimates of the theta and random facet variables under the random-effects facet models and standard models

 | Unidimensional | Unidimensional plus facets | Four-dimensional | Four-dimensional plus facets
Theta 1 | .203 | .191 | .452 | .410
Theta 2 | | | .179 | .130
Theta 3 | | | .281 | .235
Theta 4 | | | .374 | .340
Facet 1: Faces | | .559 | | .385
Facet 2: Facilitation | | .168 | | .135
Facet 3: Changes | | .110 | | .028
Facet 4: Management | | .210 | | .051
Facet 5: Pictures | | .355 | | .250
Facet 6: Sensations | | .121 | | .083
Facet 7: Blends | | .172 | | .100
Facet 8: Relationships | | .298 | | .077

Models for testlet- and facet-based item interdependence for the perceiving emotions branch

As discussed in an earlier section, there are (at least) two potential sources of local item dependence on the MSCEIT: the eight item formats ("facets") and twenty-five groups of three to five items related to a common prompt ("testlets"). So far only the former has been modelled, and thus these analyses still fail to account for local item dependence due to the bundling of items under specific prompts.

Unfortunately, the program ConQuest can handle only fifteen random effects in a given model, and thus modelling random effects for all testlets is beyond the limits of this program (and, to the author's knowledge, all other item-response modelling programs). However, a more constrained analysis is possible, which may shed light on the extent to which item testlets further distort estimation of model features.

The Perceiving Emotions branch of the MSCEIT was chosen for this demonstration because it has the largest number of testlets (four pictures of faces and six pictures of landscapes or art), as well as the biggest testlets (five items associated with each picture). As is shown in Table 7, a unidimensional model fit just to the Perceiving Emotions items provides an estimated person-separation reliability[8] of .85. A model with ten additional random effects for the item testlets fits significantly better than the unidimensional model (ΔLL(10) = 757, p < .001) and has an estimated reliability of .84. Adding in two additional random effects for the facet of task (faces versus artwork/landscapes) results in still better fit (ΔLL(2) = 417, p < .001) and a significant drop in the estimated reliability, to .61.

Table 7. Likelihood and reliability for perceiving emotions models

Model | Final likelihood; estimated parameters | Person-separation (EAP) reliability | Chi-square test vs. simpler model | BIC
Model P1: Unidimensional | 41,432; 78 | .850 | -- | 41,945
Model P2: Unidimensional plus ten random testlet effects | 40,675; 88 | .844 | vs. Model P1: Change: 757; df: 10; p < .001 | 41,254
Model P3: Unidimensional plus two random facet effects and ten random testlet effects | 40,258; 90 | .606 | vs. Model P2: Change: 417; df: 2; p < .001 | 40,850

[8] Person-separation reliability in IRT is directly analogous to familiar internal consistency measures in true score theory (i.e., Cronbach's alpha/KR-20/KR-21) and can be interpreted on the same scale. Details of its computation are given in many sources (see, for example, Wilson, 2005, pp. 146-147).

Results were similar for the other branches of the MSCEIT. A unidimensional model fit to the items on the Using Emotions branch provided an estimated reliability of .68. This was reduced to .64 when random effects of testlets were included, and further to .49 when random effects of the two facets were additionally included. For the Managing Emotions items, a unidimensional model yielded an estimated reliability of .78, which was reduced only to .77 when testlets were modelled and to .72 when the facets were additionally modelled. There were no testlets on the Understanding Emotions branch.

These findings suggest that, in the case of the MSCEIT, ignoring local dependence due to task results in a much greater overestimation of measurement precision than ignoring local dependence due to testlets. It is logical that this finding may extend to the investigation of dimensionality as well: in other words, properly modelling local dependence due to testlets would probably not affect the model-fit comparisons or estimated dimension correlations of multidimensional models nearly as much as did accounting for local dependence due to task type. Nevertheless, the magnitude of the difference between Models 5 and 6 (the unidimensional versus two-dimensional models with random effects of task; ΔLL(2) = 80, p < .001) was so trivial compared to other model comparisons that it is possible that modelling random effects of testlets would move this comparison into nonsignificance.

Finally, it should be noted that the analyses presented in this paper conceptually treat the eight tasks as independent methods, but, in fact, many specific measurement techniques are shared in common by many or all of the tasks. Six of the eight tasks involve multiple ratings of a single stimulus; six out of eight also involve the use of 1-to-5 rating scales. Thus, even these analyses are likely to underestimate the degree to which similarity in item format (rather than a common underlying psychological variable) causes shared variance in item responses.

DISCUSSION

This study has, for the first time, taken a "full-information" approach to the modelling of the MSCEIT. By modelling the test at the item level, it was possible to simultaneously model, and thus differentiate, hypothesised construct and structural sources of common variance in item response patterns. These models suggest that the choice of task format contributes to variance in response patterns to a far greater extent than the dimensionality hypothesised to exist in the underlying EI variable, and that when local dependence among test items due to a common task type and item response format is taken into account, there is little reason to prefer a higher-dimensional model of EI over a unidimensional model. Thus there is insufficient statistical evidence to interpret the specific abilities sampled on the MSCEIT as forming relationships in line with the predictions of the four-dimensional EI model. Notably, this finding diverges from the findings of many earlier studies, which analysed covariance matrices of parcelled task scores.

This divergence suggests that substantive results from an item-level analysis can differ fundamentally, rather than merely incrementally, from results of an analysis conducted on parcels of items. It stands to reason that this may be especially likely to be true when there are structural sources of inter-item dependence that cannot be modelled once the items are parcelled. In the case of the MSCEIT, parcelling task scores not only reduces the number of data points per subject from 141 to 8, but also precludes the specific investigation of the relative contributions of structural features of the test to variance in item responses.

It does not appear that parcel-level analyses are preferred to item-level analyses in the case of the MSCEIT for any theory-based reason. Of the dozen previous studies of the MEIS and MSCEIT's internal structure cited in this paper, not one explicitly addresses the possibility of item-level analyses. Given this, it seems likely that researchers have chosen to use linear factor analysis (which requires covariance matrices of continuously scaled variables) rather than modes of analysis appropriate for categorical variables (which would allow modelling at the item level) based mainly on its greater accessibility and familiarity, rather than on strong theoretical rationale.

The findings presented in this paper highlight the importance of carefully modelling sources of local item dependence when conducting investigations of the dimensional structure of a psychological variable. Any structural source of dependence among item-response patterns can be mistaken for dependence due to a variable's dimensional structure if ignored. Results from item-level investigations can paint a less encouraging picture of the internal structure of a test than one might want to find in the (naturally) confirmation-focused initial process of test validation. Nevertheless, it is important to fully understand the role of multidimensional statistical investigations in the process of construct validation, and not mistake one source of shared variance for another.

These findings converge with those of Føllesdal and Hagtvet (2009), in terms of the extent to which estimates of reliability of measurement are attenuated by modelling multiple sources of variance, but also extend those findings to the investigation of test dimensionality, essentially bridging the methodological gap between their study, which employed generalisability theory to examine the measurement reliability of the MSCEIT, and other studies, cited above, which have used factor analysis on averaged task scores to examine the dimensional structure of the MSCEIT.

Given these results and that the MSCEIT is the primary source of information concerning the dimensional structure of the Mayer-Salovey model of EI, there does not presently appear to be compelling empirical support for the idea that EI can be conceptualised as a group of distinct but related abilities. The four-branch model of EI remains an intriguing theory without clear empirical support. Furthermore, it should be noted that even though these analyses appear to support a model with a single primary dimension (e.g., Model 5) over alternative models, this by itself cannot be taken as evidence of the existence of emotional intelligence, as the primary dimension of individual differences (the θ parameter) absorbs every source of shared variance in response patterns across the entire test, which could include social deviance, familiarity with response scales, test-taking motivation, test-wiseness, agreeableness, and many other possibilities.

Conclusions

The dimensional structure of the MSCEIT appears to be driven to a much greater extent than previously realised by the choice of specific measurement techniques (the facets of the test), to the point that the evidence for the validity of the four-dimensional model of emotional intelligence based on the internal structure of the MSCEIT appears insufficient. However, this study, like all studies of the MSCEIT, is limited by the specific tasks and items that have been sampled on that test. Future work on the emotional intelligence construct may either choose to revise the dimensional theory, or to sample more broadly from each of the dimensions of emotional intelligence, which could further help clarify the extent to which the specific choice of measurement techniques has influenced the more general discussion of EI, and the extent to which specific emotion-related tasks can be regarded as related under a common, ability-based theoretical framework. Broader sampling from each of the four proposed branches of emotional intelligence would also provide the opportunity for more nuanced investigation of the extent to which individual item formats (of which there are currently only eight, the variances of which varied fairly significantly in the present study), rather than proposed branches of EI, account for variance in item response patterns. Future analytic work on the dimensional structure of new psychological constructs will benefit from modelling techniques that can account for test structure- or method-related violations of the conditional independence requirement of measurement.

Just as the idea of EI has much to offer to our understanding of cognitive and affective individual differences, the principles and analytic frameworks of modern measurement theory have much to offer to the exploration of EI, and of new psychological constructs more broadly. It will be exciting to see what is learned as both efforts move forward.

Manuscript received 5 September 2010

Revised manuscript received 19 April 2011

Manuscript accepted 5 May 2011

First published online 26 July 2011

REFERENCES

Adams, R. J., & Khoo, S. T. (1996). Quest. Melbourne, Australia: ACER.
Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.


Bandalos, D. L. (2001). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78-102.
Bandalos, D. L., & Finney, S. J. (2002). Item parceling issues in structural equation modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 269-296). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Bentler, P. M., & Jamshidian, M. (1994). Gramian matrices in covariance structural equation models. Applied Psychological Measurement, 18(1), 79-94.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Ciarrochi, J., Chan, A. Y. C., & Caputi, P. (2000). A critical evaluation of the emotional intelligence construct. Personality and Individual Differences, 28, 539-561.
Day, A. L., & Carroll, S. A. (2004). Using an ability-based measure of emotional intelligence to predict individual performance, group performance, and group citizenship behaviors. Personality and Individual Differences, 26, 1443-1458.
Fan, H., Jackson, T., Yang, X., Tang, W., & Zhang, J. (2010). The factor structure of the Mayer-Salovey-Caruso Emotional Intelligence Test V 2.0 (MSCEIT): A meta-analytic structural equation modeling approach. Personality and Individual Differences, 48, 781-785.
Føllesdal, H., & Hagtvet, K. A. (2009). Emotional intelligence: The MSCEIT from the perspective of generalizability theory. Intelligence, 37, 94-105.
Gignac, G. E. (2005). Evaluating the MSCEIT 2.0 via CFA: Corrections to Mayer et al., 2003. Emotion, 5, 233-235.
Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68(3), 532-560.
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-133.
Keele, S. M., & Bell, R. C. (2008). The factorial validity of emotional intelligence: An unresolved issue. Personality and Individual Differences, 44, 487-500.
Kline, R. B. (2005). Principles and practice of structural equation modeling. New York, NY: Guilford Press.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Legree, P. J., Psotka, J., Tremble, T., & Bourne, D. R. (2005). Using consensus based measurement to assess emotional intelligence. In R. Schulze & R. D. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 155-179). Cambridge, MA: Hogrefe.
Linacre, M. (1989). Many-faceted Rasch measurement. Chicago, IL: MESA Press.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Maul, A. (2011). The factor structure and cross-test convergence of the Mayer-Salovey-Caruso model of emotional intelligence. Personality and Individual Differences, 50, 457-463.
Maul, A. (in press). The validity of the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) as a measure of emotional intelligence. Emotion Review.
Mayer, J. D., Caruso, D. R., & Salovey, P. (1999). Emotional intelligence meets traditional standards for an intelligence. Intelligence, 27, 267-298.
Mayer, J. D., Roberts, R. D., & Barsade, S. G. (2008). Human abilities: Emotional intelligence. Annual Review of Psychology, 59, 507-536.
Mayer, J. D., & Salovey, P. (1997). What is emotional intelligence? In P. Salovey & D. Sluyter (Eds.), Emotional development and emotional intelligence: Educational implications. New York, NY: Basic Books.
Mayer, J. D., Salovey, P., & Caruso, D. R. (2002). Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) user's manual. Toronto, ON: MHS.
Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V 2.0. Emotion, 3, 97-105.
Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300-307.
Palmer, B., Gignac, G., Manocha, R., & Stough, C. (2005). A psychometric evaluation of the Mayer-Salovey-Caruso Emotional Intelligence Test Version 2.0. Intelligence, 33, 285-305.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural equation modeling. Psychometrika, 69, 167-190.


Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Copenhagen, Denmark: Danish Institute for Educational Research (Expanded edition, 1980. Chicago, IL: University of Chicago Press).
Roberts, R., Schulze, R., O'Brien, K., Reid, J., MacCann, C., & Maul, A. (2006). Exploring the validity of the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) with established emotions measures. Emotion, 6, 663–669.
Roberts, R., Zeidner, M., & Matthews, G. (2001). Does emotional intelligence meet traditional standards for an intelligence? Some new data and conclusions. Emotion, 1, 196–231.
Rode, J. C., Mooney, C. H., Arthaud-Day, M. L., Near, J. P., Rubin, R. S., Baldwin, T. T., et al. (2008). An examination of the structural, discriminant, nomological, and incremental predictive validity of the MSCEIT V2.0. Intelligence, 36, 350–366.
Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53, 349–359.
Rossen, E., Kranzler, J. H., & Algina, J. (2008). Confirmatory factor analysis of the Mayer–Salovey–Caruso Emotional Intelligence Test v 2.0 (MSCEIT). Personality and Individual Differences, 44, 1258–1269.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.
Wang, W., & Wilson, M. (2005a). The Rasch testlet model. Applied Psychological Measurement, 29, 296–318.
Wang, W., & Wilson, M. (2005b). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 126–149.
Wilhelm, O. (2005). Measures of emotional intelligence: Practice and standards. In R. Schulze & R. Roberts (Eds.), Emotional intelligence: An international handbook (pp. 131–154). Cambridge, MA: Hogrefe & Huber.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Wothke, W. (1993). Nonpositive definite matrices in structural equation modeling. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 256–293). Newbury Park, CA: Sage.
Wu, M. L., Adams, R. J., Wilson, M., & Haldane, S. A. (2007). ACER ConQuest Version 2.0 [Computer program]. Hawthorn, Australia: ACER.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–154). Santa Barbara, CA: Greenwood Publishing.

APPENDIX

Any statistical model of the relationship between observed variables and one or more underlying, "latent" variables hypothesised to be responsible for variation in those observed variables can be termed a latent variable model. Latent variable models include factor models for continuous observed variables and continuous latent variables (e.g., Joreskog, 1971), item response models for categorical observed variables and continuous latent variables (e.g., Birnbaum, 1968; Rasch, 1960), latent class models for categorical observed variables and categorical latent variables (e.g., Lazarsfeld & Henry, 1968), and latent profile models for continuous observed variables and categorical latent variables (e.g., Lazarsfeld & Henry, 1968). All of these models can be formulated as special cases of a generalised latent variable model (Skrondal & Rabe-Hesketh, 2004; see also Mellenbergh, 1994; Rabe-Hesketh, Skrondal, & Pickles, 2004).

Prior empirical investigations of the MSCEIT have commonly employed confirmatory factor analysis (CFA) models. A linear CFA model takes the following form:

$$y_{in} = \beta_i + \lambda_{1i}\eta_{1n} + \cdots + \lambda_{qi}\eta_{qn} + \varepsilon_{in} \quad (1)$$

where $y_{in}$ is the value of the $i$th indicator variable for person $n$, $\beta_i$ is the intercept for indicator $i$, $\lambda_{1i}, \ldots, \lambda_{qi}$ are the factor loadings on the $q$ common factors $\eta_{1n}, \ldots, \eta_{qn}$, with variance–covariance matrix $\Psi$, and $\varepsilon_{in}$ is the unique factor (i.e., measurement error) for indicator $i$. For purposes of identification, common and unique factors are typically set to have zero means, and the variances of common factors are constrained to 1 (i.e., $\psi_{ii} = 1$). It is usually assumed that the common


factors are uncorrelated with the unique factors, and that the correlations among the common factors and among the unique factors can be freely estimated or constrained to zero depending on the model. In this formulation, common and unique factors are generally assumed to be normally distributed, making the model appropriate for continuous indicators.
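To make the notation concrete, the following Python sketch simulates continuous indicator data from the linear CFA model of equation (1), with the common-factor variances fixed to 1 for identification. The loading, covariance, and error values are invented for illustration and are not drawn from the MSCEIT analyses reported here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_items, q = 500, 6, 2                  # sample size, indicators, common factors

beta = rng.normal(0.0, 0.5, size=n_items)          # intercepts beta_i
lam = np.array([[1.0, 0.0],                        # loadings lambda_qi (invented values)
                [0.8, 0.0],
                [0.6, 0.0],
                [0.0, 1.0],
                [0.0, 0.9],
                [0.0, 0.7]])
psi = np.array([[1.0, 0.4],                        # factor variance-covariance matrix Psi;
                [0.4, 1.0]])                       # diagonal fixed to 1 for identification

eta = rng.multivariate_normal(np.zeros(q), psi, size=n_persons)   # common factors eta_n
eps = rng.normal(0.0, 0.5, size=(n_persons, n_items))             # unique factors eps_in

# Equation (1): y_in = beta_i + sum_q lambda_qi * eta_qn + eps_in
y = beta + eta @ lam.T + eps

print(y.shape)                                      # (500, 6) simulated continuous indicators
print(np.corrcoef(y, rowvar=False).round(2))        # implied correlational structure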

Responses to individual test items are typically scored ordinally rather than continuously. Thus, CFA requires some adaptation to be applicable to item analysis. There are a number of ways that such adaptation can be accomplished. One is the inclusion of a link function relating the linear predictor to the mean of a probability distribution function for each item. If a logit (or cumulative normal, i.e., probit) link function is selected, a multidimensional IRT model is obtained, as in equation (2).

$$\ln(P_{nij}/P_{ni(j-1)}) = \beta_{ij} + \lambda_{1i}\eta_{1n} + \cdots + \lambda_{qi}\eta_{qn} \quad (2)$$

In this formulation, $P_{nij}$ denotes the probability that person $n$ responds in category $j$ of item $i$, $\beta_{ij}$ are parameters relating to the "difficulty" of the items or steps within each item, $\eta_{1n}$ is interpreted as the "ability" of person $n$ on the first latent trait (i.e., the first common factor or dimension), and $\lambda_{1i}$ is interpreted as the slope (also termed "discrimination" or "loading") of item $i$ on the first dimension (and analogously for the other dimensions).

This notation illustrates the correspondence between CFA and IRT models. However, it is more common for item response models to be presented using different notation. Equation (2) can be rewritten as:

$$\ln(P_{nij}/P_{ni(j-1)}) = \boldsymbol{\alpha}_i \boldsymbol{\theta}_n + \delta_{ij} \quad (3)$$

where $\boldsymbol{\alpha}_i$ replaces the row vector of factor loadings $(\lambda_{1i}, \ldots, \lambda_{qi})$, $\boldsymbol{\theta}_n$ replaces the column vector of common factors $\boldsymbol{\eta}_n = (\eta_{1n}, \ldots, \eta_{qn})'$, and $\delta_{ij}$ replaces $\beta_{ij}$. The formulation given here corresponds to a multidimensional extension of the Masters (1982) partial credit model, in which category difficulties are freely estimated for each item.
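As a worked illustration of equation (3), the following Python sketch computes the category probabilities for a single polytomous item under the adjacent-category logit formulation. The person locations, scoring vector, and step parameters are hypothetical values chosen only to show the calculation.

```python
import numpy as np

def pcm_category_probs(theta, alpha, delta):
    """Category probabilities for one polytomous item under equation (3).

    theta : (q,) person locations on the q latent dimensions
    alpha : (q,) scoring/discrimination vector for the item (fixed or estimated)
    delta : (J,) step parameters delta_ij for steps j = 1..J of the item
    Returns the probabilities of responding in categories 0..J.
    """
    # Adjacent-category logits: ln(P_j / P_{j-1}) = alpha . theta + delta_j
    step_logits = alpha @ theta + delta                      # one logit per step j = 1..J
    cum = np.concatenate(([0.0], np.cumsum(step_logits)))    # category 0 has cumulative logit 0
    expcum = np.exp(cum - cum.max())                         # subtract max for numerical stability
    return expcum / expcum.sum()

# Hypothetical two-dimensional example: the item loads only on the first dimension
theta = np.array([0.5, -1.0])        # person abilities on the two latent dimensions
alpha = np.array([1.0, 0.0])         # a-priori fixed scoring vector
delta = np.array([0.8, -0.2, -1.1])  # step parameters for a four-category item

print(pcm_category_probs(theta, alpha, delta).round(3))      # probabilities sum to 1
```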

In the 2PL formulation of an IRT model, the contents of the $\boldsymbol{\alpha}_i$ vector are estimated from the data. However, the contents of this vector could also be specified a priori by the test developer, if there is theory-based reason to do so (as is argued to be the case, for example, in some applications of multidimensional extensions of the Rasch model; see Adams, Wilson, & Wang, 1997). Additionally, when at least one element of the $\boldsymbol{\alpha}_i$ vector is fixed, the variances of the dimensions can be estimated as $\psi$ rather than fixed to 1. Often the elements of $\boldsymbol{\alpha}_i$ are fixed at simply 1 or 0, but they can take on other values when there is theory-based reason to do so, as in the case of the present study. Somewhat different analytic purposes are served by fixing the contents of $\boldsymbol{\alpha}_i$ based on theory versus allowing it to be freely estimated; sometimes this is described as the difference between "fitting the data to the model" versus the more traditional statistical orientation of "fitting the model to the data" (e.g., Yen & Fitzpatrick, 2006, p. 124). In the former case, the focus is not on finding a statistical model that maximally explains variation in observations, but rather on testing specified (and possibly competing) theory-based models against observed data to determine to what extent the data support the theory behind those models. In the present case, it is the theory of consensus-based scoring articulated by Mayer et al. (e.g., 2002) and others (Legree, Psotka, Tremble, & Bourne, 2005) that provides the theoretical foundation for the specification of the contents of the $\boldsymbol{\alpha}_i$ vector. In the models described in this paper, the contents of the $\boldsymbol{\alpha}_i$ vector are held constant across models, and the purpose of the model comparisons is to determine which of several hypothesised dimensional structures best explains variation in item responses, rather than simply to maximally explain such variation.
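To make the contrast concrete, the sketch below lays out fixed scoring vectors $\boldsymbol{\alpha}_i$ for two competing dimensional structures, a single dimension versus four branch dimensions, so that model comparison asks which structure better explains the responses rather than estimating the loadings freely. The eight-item test length, item-to-branch assignments, and weights are invented for illustration and are not the actual MSCEIT specification.

```python
import numpy as np

n_items = 8
# Hypothetical assignment of items to the four proposed branches of EI
branch_of_item = [0, 0, 1, 1, 2, 2, 3, 3]
# Hypothetical a-priori scoring weights (in an estimated 2PL these would be free parameters)
weight_of_item = [1.0, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0, 0.7]

# Model A: a single dimension; every item's alpha vector has one (weighted) entry
alpha_unidim = np.array([[w] for w in weight_of_item])            # shape (8, 1)

# Model B: four correlated branch dimensions; each item loads only on its own branch
alpha_branches = np.zeros((n_items, 4))
for i, (b, w) in enumerate(zip(branch_of_item, weight_of_item)):
    alpha_branches[i, b] = w                                      # shape (8, 4)

print(alpha_unidim.T)
print(alpha_branches)
```

Both matrices hold the same weights; only the hypothesised dimensional structure differs, which is the sense in which the data are tested against competing theory-based models.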

One purpose of this brief exposition was to illustrate how item response models are not fundamentally different from confirmatory factor models, but rather can simply be viewed as confirmatory factor models plus a logit link function that relates the linear predictor to ordinally scored item responses. Thus, the analyses described in the present paper are fully in the CFA tradition of other analyses of the MSCEIT, but have moved these analyses to the item level with the accompanying adjustments.

Modelling local dependence. A standard assumption of all measurement models is that what is shared in common among test items is the latent variable(s), and nothing else. Formally this is known as the requirement of conditional or local independence. This requirement is violated to the extent that dependencies exist among specific item responses that are not due to the common latent variable.

A common source of such local dependence is the presence of "testlets" (or, alternatively, "item bundles"; e.g., Rosenbaum, 1988; Wang & Wilson, 2005a), the typical example of which is a group of items that share common stimulus material. An example from the MSCEIT would be a picture of a landscape, which requires respondents to make judgments about the degree of expression of five different emotions (e.g., happiness, fear, etc.). In this case responses to the five


items that refer to a single landscape are unlikely to be conditionally independent, as all depend on the respondent's interpretation of a single image.

A second kind of bundling also occurs on the MSCEIT, although it will be more consistent with the wider literature to refer here to "facets" of the measurement situation, in that items are organised into eight distinct tasks. Responses to items within a particular task will share more variance in common than items sampled randomly from the possible target domain of the construct, possibly biasing investigations of the dimensional structure of the test.

In the context of IRT modelling, the effects of facets can be modelled as fixed effects (e.g., Linacre, 1989), in which the effect of the facet in question is held to be constant over persons, or as effects that are random over persons (Wang & Wilson, 2005a, 2005b), in addition to or instead of a main fixed effect. For polytomous items, a testlet (or facet) model can take the following form (Wang & Wilson, 2005b):

$$\log(P_{nij}/P_{ni(j-1)}) = \theta_n - \delta_{ij} - \gamma_{k(i)} + \gamma_{nk(i)} \quad (4)$$

in which $\delta_{ij}$ is the item parameter associated with step $j$ of item $i$, $\gamma_{k}$ is the overall difficulty of facet $k$, and $\gamma_{nk(i)}$ is the random effect of facet $k$ for person $n$. The random effects of facets are constrained to be orthogonal to one another and to the random intercept $\theta_n$, for both technical and interpretational reasons. Thus $\gamma_{nk(i)}$ can be interpreted as a dimension of individual differences that affects the probability of success on items within a specific testlet, independent of the primary dimension and other testlets. Conceptually, $\gamma_{nk(i)}$ partials out only the variance in item response patterns that can be uniquely attributed to the fact that items share a common task; response variance shared in common with other items meant to measure a common dimension is still attributed to $\theta_n$. Wang and Wilson's formulation of these models assumed a single primary dimension, but they are readily extended to the multidimensional case.
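The following sketch is a dichotomous simplification of equation (4), with invented facet and variance values, simulating responses under the random-effects facet model. It shows how the person-specific facet effect induces stronger correlations among items within a task than between tasks, i.e., local dependence beyond the primary dimension.

```python
import numpy as np

rng = np.random.default_rng(1)

n_persons, n_facets, items_per_facet = 2000, 2, 5
n_items = n_facets * items_per_facet
facet_of_item = np.repeat(np.arange(n_facets), items_per_facet)

theta = rng.normal(0.0, 1.0, n_persons)                     # primary dimension theta_n
delta = rng.normal(0.0, 0.8, n_items)                       # item difficulties delta_i
gamma_fixed = np.array([0.3, -0.3])                         # overall facet difficulties gamma_k
gamma_rand = rng.normal(0.0, 0.7, (n_persons, n_facets))    # person-specific facet effects gamma_nk,
                                                            # drawn independently of theta (orthogonal in expectation)

# Equation (4), dichotomous case: logit P(X_ni = 1) = theta_n - delta_i - gamma_k(i) + gamma_nk(i)
logits = (theta[:, None] - delta[None, :]
          - gamma_fixed[facet_of_item][None, :]
          + gamma_rand[:, facet_of_item])
responses = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

# Within-facet item correlations exceed between-facet correlations: local dependence due to the facet
r = np.corrcoef(responses, rowvar=False)
within = r[:items_per_facet, :items_per_facet][np.triu_indices(items_per_facet, 1)].mean()
between = r[:items_per_facet, items_per_facet:].mean()
print(round(within, 3), round(between, 3))
```

Setting the variance of the facet effects to zero removes the excess within-task correlation, which is the dependence that the facet model partials out of the primary dimension.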
