Bond University ePublications@bond

Humanities & Social Sciences papers, Faculty of Humanities and Social Sciences

6-1-1996

Psychometric limitations of the Personality Assessment Inventory: A reply to Morey's (1995) rejoinder

Gregory J. Boyle, Bond University

Follow this and additional works at: http://epublications.bond.edu.au/hss_pubs

Part of the Psychology Commons

This Journal Article is brought to you by the Faculty of Humanities and Social Sciences at ePublications@bond. It has been accepted for inclusion in Humanities & Social Sciences papers by an authorized administrator of ePublications@bond. For more information, please contact Bond University's Repository Coordinator.

Recommended Citation: Gregory J. Boyle. (1996) "Psychometric limitations of the Personality Assessment Inventory: A reply to Morey's (1995) rejoinder" Journal of Psychopathology and Behavioral Assessment, 18 (2), 197-203: ISSN 1573-3505.

http://epublications.bond.edu.au/hss_pubs/791


Psychometric Limitations of the Personality Assessment

Inventory: A Reply to Morey's (1995) Rejoinder

Gregory J. Boyle

Department of Psychology

Bond University

Note. Morey referred to the PAI instrument as a "test." However, it is not really

appropriate to refer to self-report personality scales as "tests," particularly when

there are no right or wrong answers, as such. Only objective performance

measures (T-data) can rightly claim the title "test" (cf. Cattell & Kline, 1977).

Some psychometric problems with the Personality Assessment Inventory

(PAI) were observed by Boyle and Lennon (1994). However, Morey (1995)

asserted that there were "several methodological and conceptual limits" and alternative

explanations of the Boyle and Lennon data. Although Morey asserted that age and

clinical status were confounded, Boyle and Lennon statistically partialled out

variance due to age, using ANCOVA procedures. Morey's description of Boyle

and Lennon's sample as "unusual" was strange: schizophrenic and alcoholic

patients in a psychiatric hospital comprised the clinical groups, and the PAI was

designed specifically to assess psychopathology in such patients. Although Morey

claimed that alpha coefficients were misinterpreted, Boyle and Lennon based their

conclusions solely on the obtained coefficients. Morey's attempt to downplay the

finding of suboptimal stability for several PAI scales also runs counter to the

empirical results actually observed. Finally, Morey attempted to minimize the role

of factor analysis in investigating construct validity, apparently to deflect attention

from deficiencies in the factor analysis of a clinical sample reported in the PAI

manual.

Morey (1995) criticized Boyle and Lennon's (1994) finding that the

median test-retest reliability coefficient for the Personality Assessment Inventory

(PAI) scales, measured over a 28-day interval, was only .73. Morey claimed that

there was a restriction in the range of PAI scores, so that the less-than-optimal

reliability was largely a statistical artifact. Yet perusal of actual scores obtained by

a large subsample of 70 normal individuals from the general adult population

(in whom personality characteristics would be expected to be relatively

stable across an interval of only 4 weeks) revealed a wide range of PAI scores,

despite Morey's suggestion to the contrary (mean scale scores alone ranged from

2.80 to 22.16, and standard deviations ranged from 2.75 to 9.67). In any event,

restriction of variance was not a particularly problematic issue in Boyle and

Lennon's study, because the primary concern was whether or not PAI scale scores

were ranked similarly or dissimilarly by the normal group on each measurement

occasion. Had the PAI scales been more stable, greater similarity of rankings

would have been observed than was the case. Although Morey regarded his

argument as "self-explanatory," the alleged "restriction of range" of PAI scores

within Boyle and Lennon's normal sample was not supported by the actual

empirical evidence.
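The distinction drawn here is between the spread of scores and whether respondents keep their relative positions across occasions. A short sketch can illustrate it. The sketch assumes NumPy and uses invented scores for six hypothetical respondents, not PAI data. It computes a rank-order (Spearman) coefficient alongside the Pearson coefficient, and its simple ranking helper does not handle tied scores.

```python
import numpy as np

def ranks(x):
    """Return 1-based ranks of x (assumes no tied values)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    return r

def pearson(x, y):
    """Pearson product-moment correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Rank-order (Spearman) correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical scores for six respondents on one scale, tested twice.
time1 = np.array([12.0, 15.0, 9.0, 20.0, 17.0, 11.0])
time2 = np.array([14.0, 13.0, 10.0, 21.0, 18.0, 9.0])

print(f"Pearson r  = {pearson(time1, time2):.3f}")
print(f"Spearman r = {spearman(time1, time2):.3f}")
```

Pearson's r is unchanged when every score shifts by the same amount, and Spearman's coefficient depends only on the ordering of scores. Both therefore summarize consistency of relative standing rather than consistency of raw score levels.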

Morey argued about minimum reliability "cutoff" points, despite the

absence of any such claim by Boyle and Lennon. Ideally, if perfectly reliable, PAI

scales would have exhibited stability coefficients of 1.00. A median coefficient of

.73 indicates that half the observed stability coefficients were actually lower than

this value, a clearly undesirable finding for a personality trait inventory. Indeed,

at the median, only 53% of the variance (.73 squared) was common across both

measurement occasions (i.e., at least 47% of the measurement variance was

associated with dissimilar rankings of PAI scores).
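The percentages in this paragraph come from squaring the stability coefficient, which gives the proportion of variance the two occasions share. Below is a minimal plain-Python sketch of that arithmetic. The only input taken from the text is the median coefficient of .73.

```python
def shared_variance(r):
    """Proportion of variance common to two occasions, given their correlation r."""
    return r ** 2

median_r = 0.73  # median PAI stability coefficient reported by Boyle and Lennon (1994)
common = shared_variance(median_r)
print(f"common: {common:.1%}, not shared: {1 - common:.1%}")
# prints: common: 53.3%, not shared: 46.7%
```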

Regarding the calculation of test-retest coefficients, Morey stated, "admittedly, in the

psychopathology field this is rather difficult, since ethical considerations preclude

withholding treatment for purposes of generating reliability estimates." It was

precisely because of this dilemma that Boyle and Lennon chose to assess stability

in a normal sample, which was expected to produce upper-bound estimates of

reliability. There is no a priori reason not to investigate the psychometric

properties of a clinical personality instrument in more readily accessible, non-

psychiatric samples. For Morey to claim that "this artifact must be recognized as a

complication" seems strange. This assertion was not based on empirical evidence,

but was merely presumed. Morey set up "a straw man" and then proceeded to

"knock it down." Scientific discourse should be based on concrete evidence, rather

than on unsupported allegations.

Morey stated that "the PAI includes measures of anxiety, depressed affect,

and suicidal ideation, features that might be expected to fluctuate widely over the

course of one month." That might be true for certain clinical samples but would

not be expected among psychologically stable normal individuals. Indeed, the

proportion of variance common across both measurement occasions for the

anxiety scale was only 38%, so that no less than 62% of variance was due to

dissimilar rankings of scores, even among normal, psychologically stable

individuals. Likewise, for the Paranoia (PAR) and Antisocial Features (ANT)

scales, 58% and 60% of the variance, respectively, was discrepant; that is, more of the variance was

unreliable than reliable!
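Working backwards from the reported percentages gives the approximate stability coefficients they imply. The sketch below assumes these percentages, like the median example earlier in the text, are squared positive Pearson retest coefficients. Because the percentages are rounded, the implied coefficients are only approximate.

```python
import math

def implied_r(shared):
    """Positive retest correlation implied by a proportion of shared variance."""
    return math.sqrt(shared)

# Shared variance = 1 minus the discrepant proportion quoted in the text.
shared_by_scale = {"Anxiety": 0.38, "PAR": 1 - 0.58, "ANT": 1 - 0.60}
for scale, shared in shared_by_scale.items():
    print(scale, round(implied_r(shared), 2))
# prints (one per line): Anxiety 0.62, PAR 0.65, ANT 0.63
```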

Using normal samples himself (despite his criticism of Boyle & Lennon),

Morey (1991) reported a stability coefficient of .90 for the Antisocial Features

(ANT) scale, thereby contradicting his assertion that normal samples are

inappropriate for evaluating the psychometric properties of the PAI instrument

and, by implication, contradicting his (1995) claim that there is undue restriction

of range in scores for normal samples. Evidently, Morey's argument about

"restriction of range" was refuted by his own empirical findings! In stating that the

WAIS-R scales do not all exhibit retest coefficients of .80, Morey referred to

intelligence as "a more stable construct" but failed to acknowledge that intellectual

functioning fluctuates markedly in response to biological, neuropsychological, and

situational factors (see Stankov, Boyle, & Cattell, 1995). In any event, suboptimal

stability of WAIS-R scales would suggest that the instrument could stand

considerable refinement, especially in light of the work showing that such models

of intelligence are overly narrow (e.g., Boyle, 1995a; Cattell, 1987; Gardner,

1993; Sternberg, 1994).

The finding of less-than-optimal test-retest coefficients led Boyle and Lennon to

point to the possibility that the PAI scales exhibit only relative stability, somewhat

akin to dynamic traits (cf. Cattell & Child, 1975), rather than more enduring

personality traits. Although Morey claimed that Boyle and Lennon had confused

dynamic traits with states, Morey himself failed to make the necessary conceptual

discriminations. As Boyle had pointed out in many previous papers (e.g., Boyle,

1979, 1983, 1985, 1988; Boyle & Cattell, 1984), stable trait measures should

exhibit high immediate test-retest (dependability) coefficients, as well as high

longer-term retest (stability) coefficients, whereas transitory state measures should

exhibit high dependability but considerably lower stability, if the measures are

truly sensitive to situational fluctuations on different measurement occasions (cf.

Fernandez, 1990; Fernandez, Nygren, & Thorn, 1991). Dynamic traits [including

motivational dynamic traits such as those measured in the objective IPAT

Motivation Analysis Test (see Cattell, 1992)] theoretically should exhibit test-retest

coefficients intermediate in magnitude between those observed for stable traits, on

the one hand, and transitory states, on the other [see Cattell (1973, p. 354) for a

detailed account of consistency coefficients and their implications for state,

dynamic trait, and enduring trait scales]. Without considering the complex issues

regarding test-retest reliability as a function of the state/dynamic-trait/trait

continuum, Morey cannot logically sustain his denial of the observed instability

of several of the PAI scales.
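The contrast between dependability (immediate retest) and stability (longer-interval retest) coefficients can be expressed as a simple comparison. The sketch below is illustrative only. The three profiles and their coefficients are invented. The "retention" index (stability divided by dependability) is a hypothetical summary introduced here, not a statistic proposed in the papers cited. It only orders scales along the continuum from enduring traits through dynamic traits to transitory states.

```python
from dataclasses import dataclass

@dataclass
class RetestProfile:
    name: str
    dependability: float  # immediate test-retest coefficient
    stability: float      # longer-interval test-retest coefficient

    def retention(self) -> float:
        """Hypothetical index: stability as a proportion of dependability."""
        return self.stability / self.dependability

# Invented coefficients, chosen only to illustrate the expected ordering.
profiles = [
    RetestProfile("hypothetical state scale", 0.88, 0.30),
    RetestProfile("hypothetical trait scale", 0.92, 0.88),
    RetestProfile("hypothetical dynamic-trait scale", 0.90, 0.65),
]

# Most enduring (retention near 1) first, most transitory last.
for p in sorted(profiles, key=lambda p: p.retention(), reverse=True):
    print(f"{p.name}: dependability {p.dependability:.2f}, "
          f"stability {p.stability:.2f}, retention {p.retention():.2f}")
```

When both coefficients are high, a scale behaves like an enduring trait. High dependability with much lower stability is the pattern expected of a state measure that is sensitive to the occasion. Intermediate retention corresponds to the dynamic-trait case.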

With regard to item homogeneity estimates, Morey claimed that Boyle and

Lennon interpreted high alpha coefficients as "bad." Besides being a rather

dogmatic assertion, this claim is contradicted by Morey's own admission that "internal

consistency can be too high." In fact, as Boyle (1991) has discussed, both "internal

consistency" and "item redundancy" are value judgments. Cronbach alpha

coefficients merely indicate the level of item homogeneity of a scale. Morey

further claimed that Boyle and Lennon's findings only raised the possibility of

excessive item redundancy in the PAI scales. Yet the obtained median coefficient

was no less than .83, indicating that for half of the PAI scales, alpha coefficients

were exceptionally high, exceeding this figure (cf. Boyle, 1991). Morey asserted

that, "There are simply too many influences upon the alpha coefficient to warrant

the conclusion that high alphas invariably indicate problems." However, Boyle

and Lennon only stated (p. 182) that the observed high alpha coefficients

"suggested the possibility of rather narrow scales, with excessive item

redundancy" (cf. Boyle, 1991).
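For reference, the coefficient under discussion is usually written as follows. Here k is the number of items, σ_i² is the variance of item i, and σ_X² is the variance of the total score. The second expression is the standardized form in terms of the mean inter-item correlation r̄. It makes explicit that alpha rises both with more items and with more highly intercorrelated (for example, closely paraphrased) items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

With k = 8 and r̄ = .50, for example, the standardized form gives 4/4.5 ≈ .89.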

Morey subsequently asserted that "high internal consistency is obviously

not a problem if a scale can be validated against external criteria." However,

Morey has not considered the implications of the breadth of measurement of a

construct. As Boyle (1991) pointed out, high alpha coefficients can be obtained if

all items in a scale are merely paraphrases of each other, but then the construct is

being measured in a very limited, narrow fashion, and many of the items

are redundant and could be dispensed with. Just because an instrument is popular

(Morey referred to the Beck Depression Inventory (BDI) and the Hamilton Rating Scale (HRS)

as good measures against which to assess the external or concurrent validity of the

PAI), it does not follow that such instruments necessarily make good external

criterion measures, especially since both the BDI and the HRS also have some

psychometric deficiencies (see Boyle, 1985).
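The point about paraphrased items can be illustrated by simulation. The sketch below assumes NumPy and computes Cronbach's alpha for two hypothetical eight-item scales. The first is a broader scale whose items share a construct but also carry item-specific content. The second consists of near-paraphrases of a single item. All data are simulated, and the noise levels are arbitrary assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
n = 500
construct = rng.normal(size=n)

# Hypothetical broader scale: eight items, each reflecting the construct plus
# substantial item-specific content (simulated as independent noise).
broad = np.column_stack([construct + rng.normal(size=n) for _ in range(8)])

# Hypothetical redundant scale: eight near-paraphrases of a single item, so the
# items differ from one another only by very small noise.
single_item = construct + rng.normal(size=n)
paraphrased = np.column_stack(
    [single_item + rng.normal(scale=0.1, size=n) for _ in range(8)]
)

print(f"broader items:     alpha = {cronbach_alpha(broad):.3f}")
print(f"paraphrased items: alpha = {cronbach_alpha(paraphrased):.3f}")
```

Under these simulated conditions, the paraphrased scale should obtain the higher alpha (close to 1), even though it samples the content of only one item. This is why a very high alpha alone cannot distinguish item homogeneity from item redundancy.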

As for prevalence rates of alcohol problems, it is unlikely that there are

major differences between the United States and Australia. Thus, 18% would be

the comparable figure for Queensland. Nevertheless, Morey was correct in

highlighting the particular composition of the non-psychiatric sample, since the

sample composition may have contributed to the apparent number of false

positives on the Alcohol Problems (ALC) scale. In these circumstances, it seems

likely that the ALC scale did not seriously overestimate the number of alcoholic

cases.
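Expected-value arithmetic makes concrete how sample composition can affect the apparent false-positive count. In the sketch below, the 18% prevalence figure is the one discussed in the text. The sensitivity and specificity values and the comparison prevalence of 30% are hypothetical, chosen only for illustration. They are not published properties of the ALC scale.

```python
def expected_screen_outcomes(n, base_rate, sensitivity, specificity):
    """Expected counts when a screening scale is applied to n people."""
    cases = n * base_rate
    non_cases = n - cases
    true_pos = cases * sensitivity
    false_pos = non_cases * (1 - specificity)
    flagged = true_pos + false_pos
    ppv = true_pos / flagged if flagged else float("nan")  # share of flags that are real cases
    return {"flagged": flagged, "true_pos": true_pos, "false_pos": false_pos, "ppv": ppv}

# Hypothetical scale properties (not published ALC values).
SENS, SPEC = 0.85, 0.90
for rate in (0.18, 0.30):
    out = expected_screen_outcomes(n=100, base_rate=rate, sensitivity=SENS, specificity=SPEC)
    print(rate, {k: round(v, 2) for k, v in out.items()})
```

Under these assumed values, raising the true prevalence in the sample from 18% to 30% increases both the number of respondents flagged and the share of flags that are genuine cases. Flags judged against an 18% expectation could therefore look like false positives when the sample actually contains more cases.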

Morey questioned Boyle and Lennon's finding that several PAl scales did

not discriminate between schizophrenic and alcoholic clinical samples, and

asserted that there is no need for all PAl scales to exhibit discriminative validity.

However, it would seem reasonable to expect that schizophrenic and alcoholic

patients should differ significantly on the PAI scales labelled Schizophrenia

(SCZ), Dominance (DOM), and Warmth (WRM). That these scales failed to

differentiate between the two clinical groups raises questions concerning their

validity. Alluding to the inadequate discriminative validity of the MMPI clinical

scales (see the comprehensive critique by Helmes & Reddon, 1993) does not

detract from the poor discriminative validity of several of the PAI scales (and of

abnormal personality instruments, in general). Furthermore, Morey's suggestion

about removing protocols with high Negative Impression (NIM) scores may not

necessarily improve PAI scale discriminative validities.

Morey's criticism of the possible confounding effect of age differences in

the clinical and non-psychiatric groups ignored the very careful consideration

given to the statistical handling of this variable. According to Boyle and Lennon (p.

178), "MANCOVAs were carried out on the data with gender and age as

covariates, in order to correct statistically for distorting effects due to those

variables. The main effect across groups was still significant ... after the effects of

age and gender were partialled out." Partialling out the variance of demographic

variables by treating them statistically as covariates clearly refutes Morey's claim

that demography and clinical status were confounded.
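The idea of partialling out covariates can be sketched with ordinary least squares. A model containing only the covariates is compared with one that also includes group membership. This is a univariate simplification: the study used MANCOVA, which treats several PAI scales jointly. The data, group labels and effect sizes below are simulated assumptions, and NumPy is assumed. Only the F statistic is computed, without a p-value.

```python
import numpy as np

def ancova_group_f(y, covariates, group):
    """F statistic for a group effect after partialling out covariates.

    Compares a reduced OLS model (intercept + covariates) with a full model
    that also includes dummy codes for group membership.
    """
    n = len(y)
    levels = np.unique(group)
    dummies = np.column_stack([(group == g).astype(float) for g in levels[1:]])
    X_reduced = np.column_stack([np.ones(n), covariates])
    X_full = np.column_stack([X_reduced, dummies])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(X_reduced), rss(X_full)
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = n - X_full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den), df_num, df_den

rng = np.random.default_rng(1)
n = 120
group = np.repeat(np.array(["control", "alcohol", "schizophrenia"]), 40)
age = rng.normal(40, 10, n)
gender = rng.integers(0, 2, n).astype(float)
effect = np.where(group == "control", 0.0, 5.0)  # simulated group difference
score = 50 + 0.2 * age + 2.0 * gender + effect + rng.normal(0, 5, n)

F, df1, df2 = ancova_group_f(score, np.column_stack([age, gender]), group)
print(f"F({df1}, {df2}) = {F:.2f}")
```

The group effect is evaluated only on variance that age and gender cannot account for. That is what partialling out the covariates means in this setting.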

Morey also asserted that differing numbers of subjects in the cells of the

design were problematic for the discriminative validity of PAI scales. However,

Boyle and Lennon (p. 178) specifically pointed out that "in order to counteract the

effects of heterogeneity of variance due to unequal group sizes, a more stringent

criterion of statistical significance was used .... In addition . . . Bonferroni

corrections were applied to minimize finding significant between-group

differences due to chance alone." In any event, heterogeneity of variance due to

unequal group sizes should have increased the likelihood of finding significant

differences on the PAI scales. Since several of the scales still failed to discriminate

between the clinical groups, this further suggests inadequate discriminative

validities.
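The Bonferroni logic referred to in the quoted passage is simple to state: with m comparisons, each is tested at the family-wise level divided by m. Below is a minimal sketch with invented p-values, not results from Boyle and Lennon. The scale labels are placeholders.

```python
def bonferroni(p_values, family_alpha=0.05):
    """Bonferroni correction: each of m tests is judged against family_alpha / m."""
    m = len(p_values)
    threshold = family_alpha / m
    return threshold, {name: p <= threshold for name, p in p_values.items()}

# Hypothetical p-values for five between-group comparisons (invented values).
p_values = {"scale_1": 0.004, "scale_2": 0.030, "scale_3": 0.200,
            "scale_4": 0.009, "scale_5": 0.600}
threshold, significant = bonferroni(p_values)
print(f"per-test threshold: {threshold:.3f}")
print(significant)
```

In this example, scale_2 (p = .03) would be significant at .05 without correction but not after it. This is how the procedure guards against chance findings across many scales.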

Morey attempted to downplay the finding that his factor analytic solution

for the clinical subjects was poorly replicated when identical "Little Jiffy"

procedures were employed (cf. Comrey & Lee, 1992). He suggested that a minor

typographical error in a single correlation coefficient was responsible for the

discrepant results, writing, "I would encourage them [Boyle &

Lennon] to rerun their analyses correcting this value and I suspect they will

replicate the solution provided in the manual." Accordingly, the suggested minor

correction was made and the factor analysis rerun. This minor change made no

significant difference to the outcome (factor loadings, in general, differed only at

the third or fourth decimal place from the previous analysis by Boyle and Lennon).

Again, SPSS gave a warning message that the correlation matrix was "ill-

conditioned," that it was "not positive definite," and again, the Bartlett test of

sphericity could not be calculated, indicating that the matrix for the clinical sample

reported in Table 10.1 of the PAI manual fails to satisfy multivariate normality

assumptions required for a valid factor analysis.
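A few standard checks on a correlation matrix help in reading warnings of this kind. The sketch below assumes NumPy and computes three quantities. The first is the smallest eigenvalue, which must be positive for a positive definite matrix. The second is the ratio of the largest to the smallest eigenvalue, used as a condition number. The third is Bartlett's sphericity statistic, χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom. The two 3 × 3 matrices are hypothetical and are not the Table 10.1 matrix. SPSS's exact internal checks and thresholds are not reproduced.

```python
import numpy as np

def matrix_diagnostics(R, n):
    """Checks on a p x p correlation matrix R computed from n cases:
    positive definiteness, conditioning, and Bartlett's test of sphericity."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals = np.linalg.eigvalsh(R)  # eigenvalues in ascending order
    positive_definite = bool(eigvals[0] > 0)
    result = {
        "min_eigenvalue": float(eigvals[0]),
        "positive_definite": positive_definite,
        # Largest / smallest eigenvalue; reported as infinite if R is not positive definite.
        "condition_number": float(eigvals[-1] / eigvals[0]) if positive_definite else float("inf"),
        "bartlett_chi2": None,
        "bartlett_df": p * (p - 1) // 2,
    }
    if positive_definite:
        # ln|R| equals the sum of the log eigenvalues. It is only computed when every
        # eigenvalue is positive, since otherwise ln|R| is undefined or meaningless.
        log_det = float(np.log(eigvals).sum())
        result["bartlett_chi2"] = -(n - 1 - (2 * p + 5) / 6) * log_det
    return result

valid = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
# Mutually inconsistent entries: no real data set could yield all three together.
inconsistent = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]

print(matrix_diagnostics(valid, n=200))
print(matrix_diagnostics(inconsistent, n=200))
```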

Morey criticized Boyle and Lennon's factor analytic methodology, but his

own procedures and results were to some extent deficient. It is irrelevant how

many other studies Morey cites to support his higher-order factor solutions (e.g.,

Deisinger, 1995). Philosophical debate about theory and factor analysis and

seeking support from Costa and McCrae (1985) do not overcome the problem.

Indeed, the factor analytic work of Costa and McCrae itself leaves much to be

desired from a methodological and psychometric standpoint (see detailed critiques

by Block, 1995; and Boyle, Stankov, & Cattell, 1995, pp. 431-433).

Morey stated that "using a principal components-varimax factor

technique… does not imply that I believe psychopathological constructs are

orthogonal…" Nevertheless, construct validation of an instrument such as the PAI

is a detailed, extensive process, part of which includes consideration of its factor

analytic validity [see Grossarth-Maticek, Eysenck, & Boyle (1995) for a technical

discussion of construct validity in relation to personality instruments]. Morey's

higher-order factoring procedure unnecessarily precluded the possibility of

checking the factor analytic validity of the PAI.
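"Little Jiffy" is Kaiser's name for a three-step combination: principal components, retention of components with eigenvalues greater than 1, and varimax rotation. Below is a minimal NumPy sketch of that pipeline, applied to a hypothetical 6 × 6 correlation matrix with two item clusters. It uses the common SVD-based varimax iteration. It omits the Kaiser row normalization that statistical packages usually apply, so its loadings may differ slightly from packaged output.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation using the usual SVD-based iteration."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Direction that increases the variance of squared loadings in each column.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return L @ rotation

def little_jiffy(R):
    """Principal components of R, keep eigenvalues > 1 (Kaiser), rotate by varimax."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, varimax(loadings)

# Hypothetical correlation matrix: items 1-3 and items 4-6 form two clusters.
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.7
R[3:, 3:] = 0.5
np.fill_diagonal(R, 1.0)

eigvals, rotated = little_jiffy(R)
print("eigenvalues:", np.round(eigvals, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

Varimax is an orthogonal rotation, so the components it returns are uncorrelated by construction. Estimating correlations among factors requires an oblique rotation instead.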

Factor analytic procedures have been well discussed in several authoritative

publications (e.g., Cattell, 1978; Comrey & Lee, 1992; Gorsuch, 1983; McDonald,

1985). Consideration of what constitutes appropriate factor analytic methodology

is critically important if valid factor solutions are to be obtained (cf. Boyle, 1988,

1993; Boyle & Stanley, 1986). For example, Morey recommended factor analyses

at the item-subscale level, yet it is well documented that item correlations are

notoriously unreliable. That is why Comrey (e.g., Hahn & Comrey, 1994) has

advocated the use of factored homogeneous item dimensions (FHIDs), Cattell

(1978) has recommended the use of item parcels, and Marsh (see Boyle, 1994) has

employed item-dyads as his smallest correlational units. As for Morey's call for

non-linear factor analytic procedures, the comments of Gorsuch (1983, pp. 118-120) and

McDonald (1981) seem germane.
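Parcels and dyads use aggregates of items, rather than single items, as the units that are correlated and factored. The idea can be sketched as below. NumPy is assumed, and the eight items, their loadings and the parcel assignments are hypothetical. Comrey's FHIDs, which are formed from items shown to cluster empirically, are not reproduced here.

```python
import numpy as np

def make_parcels(items, assignment):
    """Average the item columns listed in each group of `assignment` into one parcel."""
    items = np.asarray(items, dtype=float)
    return np.column_stack([items[:, idx].mean(axis=1) for idx in assignment])

def mean_intercorrelation(x):
    """Mean off-diagonal correlation among the columns of x."""
    c = np.corrcoef(x, rowvar=False)
    k = c.shape[0]
    return (c.sum() - k) / (k * (k - 1))

rng = np.random.default_rng(2)
n = 2000
factor = rng.normal(size=n)
# Eight hypothetical items: a modest common-factor component plus large item-specific noise.
items = np.column_stack([0.5 * factor + rng.normal(size=n) for _ in range(8)])

dyads = make_parcels(items, [[0, 1], [2, 3], [4, 5], [6, 7]])  # item pairs
quartets = make_parcels(items, [[0, 1, 2, 3], [4, 5, 6, 7]])   # larger parcels

for label, units in [("single items", items), ("dyads", dyads), ("quartets", quartets)]:
    print(f"{label:12s} mean r = {mean_intercorrelation(units):.2f}")
```

In this simulation, each averaged unit carries proportionally less item-specific noise. The mean correlation between units therefore rises from single items to dyads to quartets.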

Finally, Morey concluded that "an unreflective application of criteria such

as ‘coefficient alpha is bad’ or ‘simple structure is good’ can be quite misleading

depending on the issue in question." Yet Boyle and Lennon never made such

dogmatic claims. It appears that Morey has misread the evidence in attempting to

deflect attention away from the empirically observed shortcomings highlighted in

Boyle and Lennon's study (cf. Boyle, 1995b; Boyle, Ward, & Steindl, 1994).

Discussion grounded more firmly in the empirical evidence, rather than on

supposition or semantics about the definition of terms such as moderator variables,

would do greater justice to the scientific issues.

References

Block, J. (1995). A contrarian view of the five-factor approach to personality

description. Psychological Bulletin, 117, 187-215.

Boyle, G. J. (1979). Delimitation of state-trait curiosity in relation to state anxiety

and learning task performance. Australian Journal of Education, 23, 70-82.

Boyle, G. J. (1983). Effects on academic learning of manipulating emotional states

and motivational dynamics. British Journal of Educational Psychology, 53,

347-357.

Boyle, G. J. (1985). Self-report measures of depression: Some psychometric

considerations. British Journal of Clinical Psychology, 24, 45-59.

Boyle, G. J. (1988). Exploratory factor analytic principles in motivation research.

In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate

experimental psychology (pp. 742-745). New York: Plenum.

Boyle, G. J. (1991). Does item homogeneity indicate internal consistency or item

redundancy in psychometric scales? Personality and Individual

Differences, 12, 291-294.

Boyle, G. J. (1993). Special review: Evaluation of the exploratory factor analysis

programs provided in SPSSX and SPSS/PC+. Multivariate Experimental

Clinical Research, 10, 129-135.

Boyle, G. J. (1994). Self-Description Questionnaire II. In D. J. Keyser & R. C.

Sweetland (Eds.), Test Critiques (Vol. 10, pp. 632-643). Kansas City, MO:

Test Corporation of America.

Boyle, G. J. (1995a). Measurement of intelligence and personality within the

Cattellian psychometric model. Multivariate Experimental Clinical

Research, 11, 47-59.

Boyle, G. J. (1995b). Review of the Personality Assessment Inventory. In J. C.

Conoley & J. Impara (Eds.), Twelfth mental measurements yearbook.

Lincoln, NE: Buros Institute of Mental Measurements.

Boyle, G. J., & Cattell, R. B. (1984). Proof of situational sensitivity of mood states

and dynamic traits, ergs and sentiments, to disturbing stimuli. Personality

and Individual Differences, 5, 541-548.

Boyle, G. J., & Lennon, T. J. (1994). Examination of the reliability and validity of

the Personality Assessment Inventory. Journal of Psychopathology and

Behavioral Assessment, 16, 173-187.

Boyle, G. J., & Stanley, G. V. (1986). Application of factor analysis in

psychological research: Improvement of simple structure by computer

assisted graphic oblique transformation: A brief note. Multivariate

Experimental Clinical Research, 8, 175-182.

Boyle, G. J., Ward, J., & Lennon, T. J. (1994). Personality Assessment Inventory:

A confirmatory factor analysis. Perceptual and Motor Skills, 79, 1441-

1442.

Boyle, G. J., Stankov, L., & Cattell, R. B. (1995). Measurement and statistical

models in the study of personality and intelligence. In D. H. Saklofske &

M. Zeidner (Eds.), International Handbook of Personality and Intelligence.

New York: Plenum.

Cattell, R. B. (1973). Personality and mood by questionnaire. San Francisco:

Jossey-Bass.

Cattell, R. B. (1978). The scientific use of factor analysis in behavioral and life

sciences. New York: Plenum.

Cattell, R. B. (1987). Intelligence: Its structure, growth and action. Amsterdam:

North Holland.

Cattell, R. B. (1992). Human motivation objectively, experimentally analyzed.

British Journal of Medical Psychology, 65, 237-243.

Cattell, R. B., & Child, D. (1975). Motivation and dynamic structure. London:

Academic.

Cattell, R. B., & Kline, P. (1977). The scientific analysis of personality and

motivation. New York: Academic.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.).

Hillsdale, NJ: Erlbaum.

Costa, P. T., & McCrae, R. R. (1985). The NEO Personality Inventory manual.

Odessa, FL: Psychological Assessment Resources.

Deisinger, J. A. (1995). Exploring the factor structure of the Personality

Assessment Inventory. Assessment, 2, 173-179.

Fernandez, E. (1990). Artifact in pain ratings, its implications for test-retest

reliability, and correction by a new scaling procedure. Journal of

Psychopathology and Behavioral Assessment, 12, 1-15.

Fernandez, E., Nygren, T. E., & Thorn, B. E. (1991). An open-transformed scale

for correcting ceiling effects and enhancing retest reliability: The example

of pain. Perception and Psychophysics, 49, 572-578.

Gardner, H. (1993). Intelligence and intelligences: Universal principles and

individual differences. Archives de Psychologie, 61, 169-172.

Gorsuch, R. L. (1983). Factor analysis (rev. 2nd ed.). Hillsdale, NJ: Erlbaum.

Grossarth-Maticek, R., Eysenck, H. J., & Boyle, G. J. (1995). Method of test

administration as a factor in test validity: The use of a personality

questionnaire in the prediction of cancer and coronary heart disease.

Behaviour Research and Therapy, 33, 705-710.

Hahn, R., & Comrey, A. L. (1994). Factor analysis of the NEO-PI and the Comrey

Personality Scales. Psychological Reports, 75, 355-365.

Helmes, E., & Reddon, J. R. (1993). A perspective on developments in assessing

psychopathology: A critical review of the MMPI and MMPI-2.

Psychological Bulletin, 113, 453-471.

McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of

Mathematical and Statistical Psychology, 34, 110-117.

McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ:

Erlbaum.

Morey, L. C. (1991). The Personality Assessment Inventory Professional Manual.

Odessa, FL: Psychological Assessment Resources.

Morey, L. C. (1995). Critical issues in construct validation: Comment on Boyle

and Lennon (1994). Journal of Psychopathology and Behavioral

Assessment, 17, 393-401.

Stankov, L., Boyle, G. J., & Cattell, R. B. (1995). Models and paradigms in

personality and intelligence research. In D. H. Saklofske & M. Zeidner

(Eds.), International handbook of personality and intelligence. New York:

Plenum.

Sternberg, R. J. (1994). Experimental approaches to human intelligence. European

Journal of Psychological Assessment, 10, 153-161.