This article was downloaded by: [Universitaetsbibliothek Heidelberg]
On: 15 November 2013, At: 05:50
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Personality Assessment
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hjpa20

Assessing DSM–5 Level of Personality Functioning From Videotaped Clinical Interviews: A Pilot Study With Untrained and Clinically Inexperienced Students
Johannes Zimmermann a, Cord Benecke a, Donna S. Bender b, Andrew E. Skodol b, Henning Schauenburg c, Manfred Cierpka d & Daniel Leising e
a Department of Psychology, University of Kassel, Germany
b Department of Psychiatry, University of Arizona College of Medicine
c Clinic for General Internal Medicine and Psychosomatics, University of Heidelberg, Germany
d Institute for Psychosomatic Cooperation Research and Family Therapy, University of Heidelberg, Germany
e Department of Psychology, Technical University of Dresden, Germany
Published online: 13 Nov 2013.

To cite this article: Johannes Zimmermann, Cord Benecke, Donna S. Bender, Andrew E. Skodol, Henning Schauenburg, Manfred Cierpka & Daniel Leising, Journal of Personality Assessment (2013): Assessing DSM–5 Level of Personality Functioning From Videotaped Clinical Interviews: A Pilot Study With Untrained and Clinically Inexperienced Students, Journal of Personality Assessment, DOI: 10.1080/00223891.2013.852563

To link to this article: http://dx.doi.org/10.1080/00223891.2013.852563

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Journal of Personality Assessment, 1–13, 2013
Copyright © Taylor & Francis Group, LLC
ISSN: 0022-3891 print / 1532-7752 online
DOI: 10.1080/00223891.2013.852563

Assessing DSM–5 Level of Personality Functioning From Videotaped Clinical Interviews: A Pilot Study With Untrained and Clinically Inexperienced Students

JOHANNES ZIMMERMANN,1 CORD BENECKE,1 DONNA S. BENDER,2 ANDREW E. SKODOL,2 HENNING SCHAUENBURG,3 MANFRED CIERPKA,4 AND DANIEL LEISING5

1 Department of Psychology, University of Kassel, Germany
2 Department of Psychiatry, University of Arizona College of Medicine
3 Clinic for General Internal Medicine and Psychosomatics, University of Heidelberg, Germany
4 Institute for Psychosomatic Cooperation Research and Family Therapy, University of Heidelberg, Germany
5 Department of Psychology, Technical University of Dresden, Germany

Several authors have raised the concern that the DSM–5 Level of Personality Functioning Scale (LPFS) is relatively complex and theory laden, and thus might put high requirements on raters. We addressed this concern by having 22 untrained and clinically inexperienced students assess the personality functioning of 10 female psychotherapy inpatients from videotaped clinical interviews, using a multi-item version of the LPFS. Individual raters’ LPFS total scores showed acceptable interrater reliability, and were significantly associated with 2 distinct expert-rated measures of the severity of personality pathology. These findings suggest that, contrary to the previously mentioned concerns, successfully applying the LPFS to clinical cases might require neither extensive clinical experience nor training.

The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM–5; American Psychiatric Association, 2013) features an alternative model for the diagnosis of personality disorders (PDs). This model (which is published in Section III of DSM–5) includes a dimensional assessment of the severity of personality pathology: the Level of Personality Functioning Scale (LPFS; Bender, 2013; Bender, Morey, & Skodol, 2011; Morey, Bender, & Skodol, 2013; Morey et al., 2011). The LPFS was designed to capture impairments in core capacities that cut across all of the individual PD categories in DSM–IV (American Psychiatric Association, 1994): identity, self-direction, empathy, and intimacy. The proposal of the LPFS received favorable responses from several PD researchers (e.g., Clarkin & Huprich, 2011; Kernberg, 2012; Livesley, 2012; Shedler et al., 2010), and is generally in line with a broad consensus that assessing the overall severity of PD should be a central component in any future diagnostic manual (Bornstein, 1998; Bornstein & Huprich, 2011; Crawford, Koldobsky, Mulder, & Tyrer, 2011; Leising & Zimmermann, 2011; Tyrer et al., 2011; Widiger & Trull, 2007). However, several authors raised the concern that the constructs featured in the LPFS are relatively complex and might put high requirements on raters (Clarkin & Huprich, 2011; Leising & Zimmermann, 2011; Pilkonis, Hallquist, Morse, & Stepp, 2011; Pincus, 2011; Tyrer, 2012; Zimmermann et al., 2012). The main goal of this study is to address this issue by testing whether untrained and clinically inexperienced psychology students are able to assess patients’ level of personality functioning with sufficient reliability and validity, based on clinical interviews.

Received June 19, 2013; Revised August 31, 2013.
Address correspondence to Johannes Zimmermann, Department of Psychology, University of Kassel, Holländische Str. 36–38, 34127 Kassel, Germany; Email: [email protected]

WHY ASSESS THE LEVEL OF PERSONALITY FUNCTIONING IN DSM–5?

It has often been noted that the PD section in DSM–IV (which is published without changes in Section II of DSM–5) has various shortcomings (e.g., Bornstein, 1998; Clark, 2007; Livesley, 1998; Widiger & Trull, 2007; Wright & Zimmermann, in press). In the following, we highlight three of these limitations, and explain how the LPFS is thought to help overcome them. Specifically, we address (a) the relatively vague and ineffective general diagnostic criteria, (b) the failure to acknowledge and explicate the normative assumptions underlying PD diagnoses, and (c) the failure to represent the dimensional nature of personality pathology. Note that we do not suggest that the LPFS is the only possible, or even the best possible, solution to these problems. However, we hope to clarify why members of the DSM–5 Work Group on Personality and Personality Disorders developed this scale, and what might be gained from including it in the new PD model in DSM–5 Section III.

First, several authors have argued that the general diagnostic criteria for PD introduced in DSM–IV are relatively vague and ineffective (Livesley, 1998; Morey et al., 2011; Parker et al., 2002). That is, they lack a strong theoretical and empirical rationale, are largely uninformative about the pathological personality processes common to, or underlying, the 10 specific PD categories, and are hardly referred to in existing PD measures (but also see Johnson, First, Cohen, & Kasen, 2008, for an exception). Thus, a major goal of the DSM–5 Work Group was to elaborate on the core features of personality pathology that differentiate between persons with and without a PD. Although studies explicitly addressing common PD features are scarce, results indicate that many such features could be subsumed under “problems with the self” (e.g., identity disturbance, low self-directedness) and “problems with interpersonal relationships” (e.g., isolation, uncooperativeness, fear of rejection; e.g., Gutierrez et al., 2008; Hopwood et al., 2011; Svrakic, Whitehead, Przybeck, & Cloninger, 1993; Turkheimer, Ford, & Oltmanns, 2008). Notably, similar higher order factors emerged when factor analyzing various general criteria of disordered personality functioning (Parker et al., 2004). The relevance of these domains is also emphasized in many major theories of PD (Benjamin, 1996; Hopwood, Wright, Ansell, & Pincus, 2013; Kernberg & Caligor, 2005; Luyten & Blatt, 2013; Meyer & Pilkonis, 2005), as, for example, in Livesley’s (1998) definition of PD as a failure to develop self and interpersonal capacities needed to fulfill adult life tasks. Taken together, these and related considerations led the DSM–5 Work Group to propose a revised criterion A requiring significant impairments in self and interpersonal functioning to be present before diagnosing a PD (Skodol, 2012). Note that, from a conceptual point of view, establishing this criterion might be interpreted as a way of introducing Wakefield’s (1992) notion of “dysfunction” into the definition of PD (i.e., the presence of personality mechanisms that do not perform the functions they were selected for in the course of evolution; see Krueger, Skodol, Livesley, Shrout, & Yueqin, 2007, p. 70). The LPFS is supposed to constitute an operationalization of this new general criterion A.

Second, the PD section in DSM–IV fails to acknowledge and explicate the normative assumptions underlying PD diagnoses (Leising, Rogers, & Ostner, 2009). Obviously, assigning a PD diagnosis to a person necessarily involves a comparison of the person’s personality with an image of how people should “normally” feel or behave. However, this image of a well-functioning person remains implicit in DSM–IV. Leising et al. (2009) addressed this issue by semantically “inverting” the 79 individual PD criteria in DSM–IV, resulting in a set of positive expectations regarding desirable behavior. A cluster analysis of sorting data revealed 10 higher order value clusters that cut across the 10 PD categories. Many of these values can be easily mapped into the domains of self-functioning (e.g., be self-reliant and independent; be self-confident, but in a realistic manner; have self-control) and interpersonal functioning (e.g., get along with others; connect with others emotionally and treat them fairly; enjoy social relationships and activities). Thus, one might argue that the implicit normative assumptions that seem to have guided the development of the PD criteria in DSM–IV are surprisingly congruent with the image of an optimally functioning person explicitly portrayed within the LPFS (i.e., in the LPFS paragraphs outlining “little or no” impairment, see later). Making these normative assumptions explicit is an important step forward, although we note that the discussion over why these norms should be endorsed (e.g., due to models of “natural” personality functioning that are rooted in evolutionary or developmental theory, due to universal cultural values, etc.) is still in its beginnings (Leising et al., 2009; Leising & Zimmermann, 2011; Zachar & Kendler, 2010).

Third, the PD section in DSM–IV fails to represent the dimensional nature of personality pathology (Clark, 2007; Livesley, 1998; Widiger & Trull, 2007). This is unfortunate for several reasons: For example, it reduces the many shades of personality pathology to two relatively arbitrary classes of persons with and without a PD, thereby ignoring a substantial proportion of patients who show “mild,” or subclinical personality problems (Westen & Arkowitz-Westen, 1998). Moreover, it does not systematically capture the “general factor of PD” (i.e., individual differences in the number of PD symptoms across all categories), which has recently been shown to be a powerful predictor of current and future problems in the domains of work, leisure, and social relationships (Hopwood et al., 2011). Furthermore, there are also psychometric reasons for shifting to a dimensional model, including the reduced reliability and validity of categorical measures (Markon, Chmielewski, & Miller, 2011).

Several proposals have been made for including a dimensional severity measure of PD in a diagnostic manual, ranging from simply using the Global Assessment of Functioning (GAF) scale (Widiger & Trull, 2007), to counting and combining categorical PD diagnoses (Tyrer & Johnson, 1996), to assessing the negative consequences of personality (Leising & Zimmermann, 2011; see Crawford et al., 2011, for a review). However, the DSM–5 Work Group decided to construct a severity measure focusing on the level of impairment in capacities that are deemed crucial for adaptive personality functioning (Livesley, 1998; Skodol, 2012). One reason for doing so was that such a measure could be inherently linked to the new criterion A (see earlier), and thus guarantee a relatively high degree of specificity to pathological personality processes (especially when compared to the alternative of using the rather unspecific GAF scale; Bender et al., 2011; Morey et al., 2011). Note that there is a long tradition of psychodynamic theorizing about PD supporting this decision, as a central assumption in many psychodynamic models is that maladaptive mental representations of self and others are at the core of personality pathology, and that the degree of disturbance can be assessed along several distinct levels of functioning (Bender et al., 2011; Luyten & Blatt, 2013; Kernberg, 2012; Kernberg & Caligor, 2005; Westen, Gabbard, & Blagov, 2006). For example, the Operationalized Psychodynamic Diagnosis system (OPD; OPD Task Force, 2008) features a Level of Structural Integration Axis (LSIA), which is an expert-rating measure of impairments in basic mental capacities much akin to the LPFS (Zimmermann et al., 2012). The OPD–LSIA is widely used in German-speaking countries and has been shown to predict the presence and number of PDs according to DSM–IV (Benecke et al., 2009; Schauenburg & Grande, 2011; Zimmermann et al., 2012).

THE LEVEL OF PERSONALITY FUNCTIONING SCALE

The LPFS defines the severity of PD by the degree of disturbance in self and interpersonal functioning (American Psychiatric Association, 2013; Bender et al., 2011). This severity continuum is further specified by psychological features that are assumed to be typical for different levels of impairment in the domains of identity and self-direction (i.e., self-functioning), as well as empathy and intimacy (i.e., interpersonal functioning). The specific arrangement of domains and levels in the LPFS was developed based on (a) a review of several clinician-rated measures that are supposed to enable a dimensional assessment of mental representations of the self and others (Bender, 2013; Bender et al., 2011), and (b) secondary data analyses of two self-report measures focusing on basic capacities crucial for adaptive functioning (Morey et al., 2011). Identity refers to the experience of oneself as a unique person, the stability of self-esteem, the accuracy of self-appraisal, and the ability to regulate a range of emotional experience. Self-direction refers to the pursuit of coherent and meaningful goals, the utilization of constructive and prosocial standards of behavior, and the ability to self-reflect in a productive manner. Empathy captures the comprehension and appreciation of others’ experiences and motivations, the tolerance of differing perspectives, and the understanding of the effects of one’s own behavior on others. Intimacy pertains to the depth and duration of positive connections with others, the desire and capacity for closeness, and the mutuality of regard reflected in interpersonal behavior. The LPFS assesses the degree of impairment in these domains along five distinct levels, ranging from 0 (little or no) to 1 (some), 2 (moderate), 3 (severe), and 4 (extreme). The domains and levels are described in terms of three short paragraphs for each domain-level combination, resulting in a (4 × 5 × 3) table including 60 items. Raters are instructed to read these descriptions and to indicate the level that most closely characterizes a patient’s current level of functioning across domains. In other words, the LPFS is to be rated on a single 5-point scale.
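The layout of the descriptive table can be made concrete with a small sketch. The domain names and level labels below are taken from the text; the variable and function names are illustrative assumptions, and the three paragraphs per cell are represented only by an index, not by their actual wording.

```python
from itertools import product

# Domain names and level labels as given in the text.
DOMAINS = ["identity", "self-direction", "empathy", "intimacy"]
LEVELS = {0: "little or no", 1: "some", 2: "moderate", 3: "severe", 4: "extreme"}
PARAGRAPHS_PER_CELL = 3  # three short paragraphs per domain-level combination

# One entry per (domain, level, paragraph index) cell of the table.
# Iterating over LEVELS yields its keys (0 to 4).
cells = list(product(DOMAINS, LEVELS, range(PARAGRAPHS_PER_CELL)))

def describe(level: int) -> str:
    """Return the verbal anchor for a global rating on the single 5-point scale."""
    return f"{level} ({LEVELS[level]} impairment)"
```

The length of `cells` corresponds to the 4 × 5 × 3 arrangement described above, while `describe` reflects that the original scale yields one global rating rather than one rating per cell.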

Recently, Morey and colleagues (2013) presented the first data on the LPFS, based on a study in which 337 clinicians assessed one of their patients using the DSM–IV and the alternative DSM–5 model for PD. Morey et al. found that the patients’ LPFS scores were substantially associated with a range of clinically relevant variables, including the presence of a PD according to DSM–IV and the total number of PD criteria met. Of note, their results suggest that a rating of “moderate” impairment on the LPFS is a reasonable threshold (i.e., providing the best combination of sensitivity and specificity) for defining the presence of a PD (this has been directly incorporated in the DSM–5 manual; see American Psychiatric Association, 2013).

Is Applying the LPFS Too Difficult for Many Raters?

As the LPFS is part of the new PD model in DSM–5 Section III, there is a strong need for further empirical studies applying this scale with various kinds of raters, targets, settings, and languages. One of the most pressing questions is whether applying the LPFS might actually be too difficult for many raters. Several authors argued that the constructs featured in the LPFS are relatively complex, theory laden, and abstract, and thus might put overly high requirements on raters (Clarkin & Huprich, 2011; Leising & Zimmermann, 2011; Pilkonis et al., 2011; Pincus, 2011; Tyrer, 2012; Zimmermann et al., 2012). Specifically, it was noted that raters might require (a) extensive training courses, (b) several years of clinical experience (e.g., in working with PD patients, or in working with psychodynamic concepts similar to those featured in the LPFS), and (c) a long period of observation of, or contact with, the target patient. If these concerns were true, untrained and clinically inexperienced raters should have no chance of using the LPFS with sufficient reliability and validity, based on single clinical interviews. On the other hand, if one could show that untrained and clinically inexperienced raters can successfully apply the LPFS, this would indicate that the concerns mentioned earlier were premature, and that the LPFS could even be used in nonexpert settings.

THIS STUDY

The main goal of this study is to explore whether untrained and clinically inexperienced raters are able to reliably and accurately assess patients’ levels of personality functioning from clinical interview material. To this end, we employed a half-block design (Kenny, 1994): Twenty-two psychology undergraduates assessed the personality functioning of each of 10 female psychotherapy inpatients, using a multi-item version of the LPFS. The students’ ratings were based on videotaped clinical interviews that had been conducted by experienced clinicians following the guidelines of the OPD system (OPD Task Force, 2008). The main outcome of interest in this study is the intraclass correlation coefficient (ICC; McGraw & Wong, 1996; Shrout & Fleiss, 1979), representing the reliability of ratings provided by single raters. Additionally, we conducted social relations model analyses (SRM; Back & Kenny, 2010; Kenny, 1994), allowing for more detailed analyses of consensus and bias in the students’ ratings.
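To illustrate what a single-rater ICC expresses in a design where every rater judges every target, the following is a minimal sketch. It assumes the two-way, absolute-agreement, single-rater form, often written ICC(A,1) in McGraw and Wong's notation; the text does not state which ICC variant was computed, so this choice, the function name `icc_a1`, and the toy data are assumptions for illustration only.

```python
def icc_a1(ratings):
    """Two-way, absolute-agreement, single-rater ICC.

    ratings: list of rows, one row per target (patient),
             one column per rater; no missing values.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical toy data: 3 targets rated by 2 raters, where the second
# rater is consistently one scale point higher than the first.
toy = [[0, 1], [1, 2], [2, 3]]
icc_toy = icc_a1(toy)
```

In the toy data the two raters order the targets identically but differ by a constant offset, so an absolute-agreement coefficient falls below 1 even though a consistency-type coefficient would not.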

It has often been noted that interrater reliability (consensus) does not guarantee validity (accuracy). Thus, we also tested whether the students’ LPFS ratings converged with expert ratings of the severity of PD, specifically the presence and number of DSM–IV PDs (First, Gibbon, Spitzer, Williams, & Benjamin, 1997), and the level of structural integration according to OPD–LSIA (Zimmermann et al., 2012). As a final test of validity we had each student assess the personality functioning of each of five additional patients, based on short written case descriptions. For these cases, we had access to “gold-standard” ratings by members of the DSM–5 Work Group. This enabled us to test the amount of “directional bias”; that is, whether student raters tend to over- or underestimate impairment in personality functioning as compared to experts (West & Kenny, 2011).
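The idea of directional bias can be sketched as a mean signed difference between a rater's scores and the expert scores. This is a deliberately simplified illustration, not the truth and bias model of West and Kenny (2011). The expert levels follow the five prototypical cases described in the Procedure section (Levels 1, 2, 2, 3, and 4); the student values and the function name are hypothetical.

```python
from statistics import mean

def directional_bias(student_ratings, expert_ratings):
    """Mean signed difference (student minus expert) across cases.

    Positive values indicate overestimation of impairment,
    negative values indicate underestimation.
    """
    diffs = [s - e for s, e in zip(student_ratings, expert_ratings)]
    return mean(diffs)

# Expert levels of the five written cases (as described in the text).
expert = [1, 2, 2, 3, 4]
# Hypothetical ratings by one student; not data from the study.
student = [2, 2, 3, 3, 4]

bias = directional_bias(student, expert)
```

A positive `bias` here would mean that this hypothetical student rated the cases as more impaired, on average, than the experts did.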

METHODS

Procedure

The study was conducted in two rounds of data collection at the University of Kassel, Germany. The first round was part of a seminar on clinical personality assessment: Eight undergraduate psychology students took part as raters and received course credit for their participation. For the second round, we publicly invited more psychology undergraduates to take part in a research study on clinical personality assessment. Fourteen students completed the study, receiving 150 Euros each for their participation. One student dropped out after the second session.

In both rounds of data collection, the research procedures were very similar: In the first session, students were provided with a brief introduction to the DSM–5 Section III PD model, and were told that their task would be to assess the personality pathology of 10 female patients according to this model based on videotaped clinical interviews. In each of the following sessions, they met as a group and watched one or two videotaped interviews, using a video projector and audio speakers. In the second round of data collection, videotapes were presented in a different, random order to minimize order effects. Directly after watching an interview, each student completed a rating sheet containing the multi-item version of the LPFS as well as other measures not pertinent to this report. Students were to refrain from talking to each other when watching an interview and when providing their ratings. They did not receive any further information about the patients, nor did they receive any feedback about their ratings until their task of rating 10 interviews was fully completed.

After finishing their ratings, the students were presented with five short written case descriptions (ranging from 342 to 531 words in length) that had been compiled by members of the DSM–5 Work Group (Bender, 2012a, 2012b; Skodol et al., 2011) and translated into German by a professional translator and the first author. These cases were deemed prototypical for some (Level 1), moderate (Level 2), severe (Level 3), and extreme (Level 4) impairment in personality functioning, with moderate impairment being represented by two cases. Raters assessed these cases using the multi-item version of the LPFS, this time complemented by the original one-item LPFS assessing the global level of personality functioning. Moreover, raters provided information about their prior knowledge of the various diagnostic systems involved in this study (i.e., the PD models of DSM–IV and DSM–5 Section III, and the OPD system), and about their prior clinical experience and preferred therapeutic orientation (e.g., psychodynamic, cognitive-behavioral).

Raters

The 22 raters were psychology undergraduates. Seventeen students were female. Their mean age was 25.2 years (SD = 7.7). Using 9-point scales ranging from 1 (none) to 9 (very good), the raters indicated that, before rating the videotapes, they had had only limited knowledge of the PD model in DSM–IV (M = 3.1, SD = 2.0), and even less knowledge of the OPD system (M = 2.5, SD = 1.7), and the PD model in DSM–5 Section III (M = 2.1, SD = 1.6). Correspondingly, the raters’ median number of hours of experience with these diagnostic systems (including reading about them) was 4 (DSM–IV), 1 (OPD), and 0 (DSM–5). The majority of raters had no prior training in any field of clinical assessment (Mdn = 0 hours), and also no experience with conducting clinical interviews (Mdn = 0 hours). Using 9-point scales ranging from 1 (none) to 9 (very strong), the raters’ self-reported allegiance to a psychodynamic model of psychotherapy (M = 5.4, SD = 2.8) was somewhat higher, on average, than their allegiance to a cognitive-behavioral model (M = 4.3, SD = 2.8), although this difference was not statistically significant, t(21) = 1.59, p = .13. In sum, the raters were obviously untrained and clinically inexperienced.

Patients

The patient sample consisted of 10 female psychotherapy inpatients who had participated in one of two prior research projects investigating the OPD system (Benecke et al., 2009; Zimmermann, Stasch, Grande, Schauenburg, & Cierpka, in press). In this study, we only included patients (a) whose OPD interviews had been videotaped with sufficient audio quality, (b) whose OPD interviews had been rated by two independent experts using the OPD–LSIA, (c) who had consented that the videotape could be used in future research, and (d) who had participated in additional Structured Clinical Interviews for DSM–IV Axis I and II Disorders (SCID–I [First, Spitzer, Gibbon, & Williams, 2002] and SCID–II [First et al., 1997]) conducted by research assistants. Finally, to avoid confounding effects of patients’ gender, we decided to include female patients only. Based on this preselection of potential targets, we selected five patients with, and five patients without a PD diagnosis according to SCID–II. The mean age of these 10 patients was 30.8 years (SD = 9.6). The number of current DSM–IV Axis I diagnoses per patient (including partially remitted disorders) ranged from 1 to 4, and was 2.10 (SD = 1.10) on average. The most prevalent Axis I disorders were mood disorder (nine cases), anxiety disorder (five cases), eating disorder (three cases), and somatoform disorder (two cases). The number of Axis II diagnoses ranged from 0 to 4, and was 1.40 (SD = 1.72) on average. Within the subgroup of patients with a PD, the most prevalent diagnoses were avoidant PD (four cases), dependent PD (two cases), obsessive-compulsive PD (two cases), depressive PD (two cases), and borderline PD (two cases).

Interviews

The clinical interviews on which the student raters based their LPFS ratings had been conducted by trained experts, and in accordance with the manual of the OPD system (OPD Task Force, 2008). OPD interviews took between 57 and 97 min (M = 74, SD = 14), and covered a wide range of topics such as current and former symptoms; prototypical descriptions of the self and others; significant relationships including parents, siblings, partners, and friends; issues of intimacy and psychosexual development; and childhood experiences. The OPD interview technique alternates between relatively unstructured phases of free exploration and more structured questions regarding biographical and clinical details. Note that this approach differs significantly from the semistructured interviews that are often recommended as a “gold standard” for assessing PDs (e.g., Widiger & Samuel, 2005): In an OPD interview, the interviewer does not only ask direct questions, but also leaves considerable room for the interviewee to freely unfold his or her specific ways of describing relationships with significant others, interacting with the interviewer, and coping with the interview situation (Schauenburg & Grande, 2011; see also Westen & Muderrisoglu, 2003, for a similar approach). Moreover, during the more unstructured phases of the interview, the interviewer can apply common psychodynamic intervention strategies such as clarification, confrontation, and interpretation (OPD Task Force, 2008). As OPD interviews are designed to elicit all the information that is relevant for assessing an interviewee’s maladaptive relationship patterns, enduring motivational conflicts, and structural capabilities, it seemed reasonable to assume that they would also be a useful data source for applying the LPFS (cf. Zimmermann et al., 2012).

Measures

Level of Personality Functioning Scale. The LPFS is a newly developed scale for assessing the global level of impairment in personality functioning with respect to the domains of identity, self-direction, empathy, and intimacy (American Psychiatric Association, 2013; Bender et al., 2011; see earlier). A German version of the LPFS was developed by two authors of this study (Johannes Zimmermann and Daniel Leising), using a translation-backtranslation procedure with subsequent feedback from authors of the original English version (Donna S. Bender, Andrew E. Skodol). Note that we adapted the LPFS somewhat for use in this study. Specifically, we capitalized on the fact that the LPFS comprises four domains, and that each domain is further described in terms of three facets. Thus, instead of providing a global LPFS rating on a single 5-point scale, the students rated each of the 12 facets separately (for a similar suggestion, see Pilkonis et al., 2011, p. 79). Facets were rated on 5-point scales, and each of the five response options was anchored with the respective short paragraph from the LPFS. For example, one of the three facets of self-direction is “pursuit of coherent and meaningful short-term and life goals.” The response options for the respective facet scale were (with level of severity in parentheses): “Sets and aspires to reasonable goals based on a realistic assessment of personal capacities” (0), “Excessively goal-directed, somewhat goal-inhibited, or conflicted about goals” (1), “Goals are more often a means of gaining external approval than self-generated, and thus may lack coherence and/or stability” (2), “Difficulty establishing and/or achieving personal goals” (3), and “Poor differentiation of thoughts from actions, so goal-setting ability is severely compromised, with unrealistic or incoherent goals” (4). The other two facets of self-direction were assessed by two separate scales, and so were the nine facets of the three remaining domains, yielding twelve 5-point scales altogether. Prior to analyses, these 12 items were aggregated into four domain scores, which were then aggregated into a single LPFS total score (median Cronbach’s alpha across raters was greater than .75, both for domain and total scores). If not otherwise specified, results reported here pertain to the single total score. The primary rationale for using a multi-item version was to reduce the amount of informal, or impressionistic, data aggregation that would be necessary in providing the ratings (cf. Westen & Weinberger, 2004). Additionally, a multi-item version allows for computing the internal consistency of the ratings, and for separating relationship and error variance in SRM analyses (see later).
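The aggregation described above (12 facet ratings, then four domain scores, then one total score) and the internal consistency check can be sketched as follows. This is only an illustrative sketch: the function names, the column ordering of the facets, the use of simple means, and the example ratings are assumptions, and the article does not state whether alpha for the total score was computed over the four domain scores or over the 12 items.

```python
import numpy as np


def aggregate_lpfs(facets):
    """Average 12 facet ratings into 4 domain scores and 1 total score.

    facets: array of shape (n_targets, 12). Columns are ASSUMED to be ordered
    in blocks of three per domain (identity, self-direction, empathy, intimacy).
    """
    facets = np.asarray(facets, dtype=float)
    domains = facets.reshape(facets.shape[0], 4, 3).mean(axis=2)
    total = domains.mean(axis=1)
    return domains, total


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_targets, n_items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


# Hypothetical facet ratings by one rater for three targets (0-4 scale).
ratings = np.array([
    [0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    [2, 2, 1, 2, 3, 2, 1, 2, 2, 3, 2, 2],
    [3, 4, 3, 3, 3, 4, 2, 3, 3, 4, 4, 3],
])
domains, total = aggregate_lpfs(ratings)
# One possible reading: alpha of the total score over the four domain scores.
alpha_total = cronbach_alpha(domains)
print(domains.round(2), total.round(2), round(alpha_total, 2))
```

Computing alpha separately for each rater and then taking the median across raters would correspond to the summary statistic reported in the text.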

Structured Clinical Interviews for DSM–IV Axis I and II Disorders. The SCIDs are semistructured interviews that many researchers consider the gold standard for assessing mental disorders according to DSM–IV. The SCID–I assesses Axis I disorders, including mood, psychotic, substance use, anxiety, somatoform, eating, and adjustment disorders. The SCID–II assesses Axis II disorders, including the 10 major PDs, PD not otherwise specified (PDNOS), and the appendix categories of depressive and negativistic PD. The SCID–II includes two steps: In the first step, a screening questionnaire is administered. In the second step, the interviewee is asked in more depth about those PD criteria that were screened positive in the questionnaire. Interrater reliability of SCID diagnoses in clinical samples can be expected to be moderate to excellent (Lobbestael, Leurgans, & Arntz, 2011). In this study, the SCIDs were conducted by trained research assistants, with no tests of interrater reliability taking place. Unfortunately, we did not have access to ratings of the individual PD criteria, as they were not recorded for data analysis in the primary research projects from which the material that we used in this study came. Thus, we could not compute dimensional PD scores or retrieve PDNOS diagnoses.

Level of Structural Integration Axis. The OPD–LSIA is an expert-rated measure of the severity of personality dysfunction rooted in psychodynamic theory (OPD Task Force, 2008; Zimmermann et al., 2012). It assesses a person’s level of “structural integration” across eight basic mental capacities: perception of the self and others, regulation of the self and relationships, communication with the internal and external world, and attachment to internal and external objects. The standard rating procedure of the OPD–LSIA requires assessing a person’s level of structural integration regarding each of these eight primary dimensions, ranging from good integration (1), to moderate integration (2), low integration (3), and disintegration (4). That is, higher scores imply higher levels of impairment, as they do in the LPFS. Also similar to the LPFS, the OPD manual contains a checklist in which each possible combination of dimensions and levels is described by a short paragraph (OPD Task Force, 2008). The OPD–LSIA dimensions are rated on 7-point scales, using main and intermediate levels (e.g., “good to moderate integration” is represented by 1.5, etc.). In this study, each clinical interview was rated by two out of four trained OPD experts. We used the individual experts’ mean scores across the eight dimensions as a measure of overall structural impairment. These mean scores were then averaged across the two raters, yielding a consensual measure of structural impairment. The interrater reliability of this composite measure was very high, ICC [1, 2] = .95. Scores ranged from 1.69 to 3.88 (M = 2.31, SD = 0.69), and were used as a criterion measure for validating the students’ LPFS ratings.

Statistical Analyses

Statistical analyses were conducted using the statistical platform R 2.15.2 (R Core Team, 2012). First, we assessed the interrater reliability of the students’ LPFS ratings by computing two types of ICC: the two-way random single measures ICC [2, 1], representing the reliability of a single rater, and the two-way random average measures ICC [2, 22], representing the reliability of the mean across the 22 raters (McGraw & Wong, 1996; Shrout & Fleiss, 1979). We conducted these analyses for each of the four LPFS domain scores, and for the LPFS total score, using the “irr” package. In line with recommendations by the DSM–5 Task Force, we regarded ICC values higher than .40 as acceptable (Kraemer, Kupfer, Clarke, Narrow, & Regier, 2012). Other researchers have provided similar recommendations for what constitutes a minimal benchmark of interrater reliability when using ICCs (e.g., Cicchetti, 1994).
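For readers unfamiliar with the two ICC types, the following minimal sketch computes both coefficients from the mean squares of a two-way layout, following the formulas in Shrout and Fleiss (1979). It is not the “irr” implementation used in this study; the function name is an assumption, and the example data are the 6 × 4 illustration from Shrout and Fleiss (1979), not data from this study.

```python
import numpy as np


def icc_two_way_random(x):
    """Return (ICC[2,1], ICC[2,k]) for an (n_targets, k_raters) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_targets = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    bms = ss_targets / (n - 1)                      # between-targets mean square
    jms = ss_raters / (k - 1)                       # between-raters mean square
    ems = (ss_total - ss_targets - ss_raters) / ((n - 1) * (k - 1))  # residual
    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


# Illustration data from Shrout and Fleiss (1979): 6 targets, 4 judges.
shrout_fleiss = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc_two_way_random(shrout_fleiss)
print(round(single, 2), round(average, 2))  # 0.29 0.62
```

Because the rater mean square enters the denominator, systematic differences in leniency between raters lower both coefficients, which is why these ICCs index absolute agreement rather than mere consistency.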

A simulation study showed that the precision and power of our design were adequate (cf. Landau & Stahl, 2013). Specifically, the 95% confidence interval (CI) of the ICC had a maximum width of .48, which is in the range of precision that was accepted in the DSM–5 field trials (Clarke et al., 2013). Moreover, our design had a power greater than .99 to detect an ICC of .40, and a power greater than .97 to detect an ICC of .20 (assuming a Type I error rate of .05). Thus, obtaining a nonsignificant finding would imply that the interrater reliability of the LPFS is not acceptable in a population of similar raters and patients.

Second, we conducted SRM analyses, allowing for a systematic decomposition of ratings into perceiver, target, relationship, and error variance (Back & Kenny, 2010; Kenny, 1994). The proportion of target variance indicates the amount of consensus between raters in judging patients’ levels of personality functioning. The proportion of perceiver variance indicates the extent to which raters exhibit different response styles, with some raters being more lenient than others in assigning scores to patients (i.e., generalized rater bias, or “assimilation”). The proportion of relationship variance indicates the extent to which individual raters judge the personality functioning of individual patients in idiosyncratic ways (i.e., idiosyncratic rater bias). SRM analyses were conducted using linear mixed-effect models (Kenny & Kashy, 2010), again focusing on the four LPFS domain scores, and on the LPFS total score. Each time we estimated the proportions of perceiver, target, relationship, and error variance using the “lme4” package, and tested the random coefficients for significance using exact restricted likelihood ratio tests (available from the “RLRsim” package).
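The logic of this decomposition can be illustrated with a simple method-of-moments estimator for a fully crossed, balanced design. Note that this is a stand-in, not the restricted maximum likelihood estimation via “lme4” that was used in this study. The function name and the data layout (raters × targets × indicators, with several indicator ratings per rater-target pair so that relationship and error variance can be separated) are assumptions made for illustration.

```python
import numpy as np


def srm_variance_proportions(y):
    """Proportions of perceiver, target, relationship, and error variance.

    y: balanced array of shape (n_raters, n_targets, n_indicators).
    Uses expected mean squares of a two-way random-effects ANOVA with
    replication (an ANOVA estimator, not REML).
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    rater_means = y.mean(axis=(1, 2))
    target_means = y.mean(axis=(0, 2))
    cell_means = y.mean(axis=2)

    ms_rater = b * n * ((rater_means - grand) ** 2).sum() / (a - 1)
    ms_target = a * n * ((target_means - grand) ** 2).sum() / (b - 1)
    inter = cell_means - rater_means[:, None] - target_means[None, :] + grand
    ms_inter = n * (inter ** 2).sum() / ((a - 1) * (b - 1))
    ms_error = ((y - cell_means[:, :, None]) ** 2).sum() / (a * b * (n - 1))

    components = {
        "perceiver": (ms_rater - ms_inter) / (b * n),
        "target": (ms_target - ms_inter) / (a * n),
        "relationship": (ms_inter - ms_error) / n,
        "error": ms_error,
    }
    # Negative estimates can occur by chance; truncate them at zero.
    components = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Hypothetical, purely additive data: 2 raters x 3 targets x 2 indicators.
rater_effect = np.array([0.0, 1.0])
target_effect = np.array([0.0, 1.0, 2.0])
demo = (rater_effect[:, None, None] + target_effect[None, :, None]
        + np.zeros((2, 3, 2)))
print(srm_variance_proportions(demo))
```

In this artificial example all variance is split between perceiver (one third) and target (two thirds), because the data contain neither rater-by-target interactions nor indicator-level noise.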

We are not aware of any previous research analyzing variance components in ratings of personality functioning as defined in the LPFS, or using videotaped clinical interviews as stimulus material. However, in research on perceptions of normal personality, a common finding is that about 15% of the variance is due to consensus, 20% is due to generalized bias, 20% is due to idiosyncratic bias, and the remaining variance is due to measurement error (Kenny, 1994). The few existing studies that investigated perceptions of maladaptive personality traits such as pathological narcissism (Lukowitsky & Pincus, 2013) or psychopathy (Mahaffey & Marcus, 2006) suggest that the influence of generalized bias might be even stronger, and consensus might be even lower than this.

Third, we assessed the validity of students’ LPFS ratings using linear mixed-effect models: The LPFS total scores were separately predicted from one of the following three external validation measures: the (dummy-coded) presence of at least one DSM–IV PD, the number of DSM–IV PDs, and structural impairments according to OPD–LSIA. These analyses provide fixed effects for the intercept (i.e., the expected LPFS score when the external measure is zero) and the slope (i.e., the expected change in LPFS score when the external measure increases by one). We were mainly interested in the slope parameter, as it directly quantifies the validity, or accuracy, of individual raters. When z-transforming all variables prior to analyses, the (thereby standardized) slope parameter β can be interpreted as the average correlation between LPFS ratings by individual raters and the external validation measure. Thereby, results become directly comparable to the results of single-rater studies. Note that, in these analyses, we accept the three criterion variables as proxies, or alternative operationalizations of the targets’ “true” personality functioning (cf. Campbell & Fiske, 1959). We also included random effects per rater for both the intercept and the slope parameters, representing individual differences in the leniency and accuracy of raters, respectively. To rule out the possibility that validity estimates are inflated by individual differences in patients’ general psychological distress (i.e., distress unrelated to personality functioning in particular), we reran all analyses using the number of comorbid DSM–IV Axis I disorders as a covariate. All linear mixed-effect model analyses were conducted with the “lme4” package.
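The interpretation of the standardized slope as a correlation rests on a simple identity that can be checked for the single-rater case: after z-transforming both variables, the least-squares slope equals Pearson’s r. The sketch below uses ordinary least squares for one hypothetical rater (not the mixed-effects model fitted in this study), and all data values are invented for illustration.

```python
import numpy as np


def zscore(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return (xc * (y - y.mean())).sum() / (xc ** 2).sum()


# Hypothetical data: number of PD diagnoses for 10 patients and one
# rater's LPFS total scores for the same patients.
n_pds = np.array([0, 1, 1, 2, 3, 0, 2, 4, 1, 3])
lpfs = np.array([1.0, 1.5, 1.2, 2.0, 2.4, 0.8, 1.9, 2.9, 1.6, 2.2])

beta = ols_slope(zscore(n_pds), zscore(lpfs))
r = np.corrcoef(n_pds, lpfs)[0, 1]
print(round(beta, 3), round(r, 3))  # the two values coincide
```

In the multi-rater model, where variables are standardized across the full set of ratings and raters receive their own intercepts, the fixed slope approximates the average of such single-rater correlations.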

Finally, we conducted additional validity analyses drawing on data from five prototypical case descriptions (see earlier). In these analyses, the student raters’ total LPFS ratings were predicted from the theoretically “correct” values (i.e., the normative ratings by members of the DSM–5 Work Group), again using a linear mixed-effect model. Note that, after centering all ratings at the “correct” mean of M = 2.4, the intercept parameter directly quantifies whether individual raters tend to over- or underestimate the average amount of impairment in personality functioning (West & Kenny, 2011). As raters also provided additional assessments of the global level of personality functioning using the original 5-point scale (see earlier), we were able to repeat the analyses for these “global” LPFS ratings.
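Why centering makes the intercept interpretable as directional bias can be seen in a minimal single-rater sketch. The “correct” values below are hypothetical numbers chosen only so that their mean is 2.4; they are not the actual normative ratings, and the student ratings are likewise invented. Ordinary least squares is used here instead of the mixed-effects model.

```python
import numpy as np

# Hypothetical normative ("correct") ratings for five cases; mean = 2.4.
correct = np.array([0.0, 2.0, 3.0, 3.0, 4.0])
# Hypothetical ratings by one student for the same five cases.
student = np.array([0.2, 1.4, 2.2, 2.4, 3.0])

center = correct.mean()  # 2.4
slope, intercept = np.polyfit(correct - center, student - center, 1)

# With the predictor centered at its own mean, the intercept equals the
# rater's mean rating minus the correct mean: negative values indicate
# underestimation of the average level of impairment.
print(round(slope, 2), round(intercept, 2))  # 0.71 -0.56
```

Here the slope reflects how well the rater tracks differences between cases, whereas the intercept reflects how far the rater's average level is off.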

RESULTS

Table 1 presents descriptive statistics and the results of the interrater reliability and SRM analyses. Reliability of individual raters’ LPFS ratings was ICC [2, 1] = .51, 95% CI [.31, .78], which is acceptable. This suggests that an untrained and clinically inexperienced student is indeed able to apply a multi-item version of the LPFS to clinical interview material with sufficient reliability. Note that the reliability of the LPFS mean score across raters was extremely high, ICC [2, 22] = .96, 95% CI [.91, .99], once again illustrating that aggregating scores across raters is a powerful method of improving reliability (cf. Furr & Funder, 2007). These conclusions were further corroborated by the results of our SRM analyses: 29% of the LPFS total score variance was due to consensus between raters regarding the patients’ different levels of personality functioning. This is about twice as much as is usually found with ratings of normal personality traits. In contrast, only 12% was due to the raters’ response styles, which is considerably lower than what is typically found in studies of normal personality perception (Kenny, 1994). The relative proportions of idiosyncratic rater bias and measurement error were similar to those found in previous studies. The reliability of individual raters’ judgments of the targets’ impairments in specific LPFS domains ranged from .25 for empathy to .63 for intimacy. SRM analyses indicated that the proportion of perceiver variance in ratings of empathy was relatively high (21%), suggesting that the relatively poor reliability found for this domain might be due to raters differing in their individual calibrations of the rating scale.
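The gain from aggregating across raters follows the Spearman-Brown relation, which links the single-measures ICC to the average-measures ICC for k raters. As a small check (the function name is an assumption), applying it to the single-rater ICCs in Table 1 with k = 22 reproduces the reported average-measures values.

```python
def spearman_brown(single_icc, k):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * single_icc / (1 + (k - 1) * single_icc)


# Single-rater ICCs as reported in Table 1.
single_iccs = {
    "Identity": 0.41,
    "Self-direction": 0.46,
    "Empathy": 0.25,
    "Intimacy": 0.63,
    "LPFS total score": 0.51,
}
for scale, icc in single_iccs.items():
    print(scale, round(spearman_brown(icc, 22), 2))
# Identity 0.94, Self-direction 0.95, Empathy 0.88,
# Intimacy 0.97, LPFS total score 0.96
```

The same function can be used to gauge how many raters would be needed for a given target reliability, for example `spearman_brown(0.51, 3)` for a panel of three raters.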

The linear mixed-effects model analyses confirmed that students’ LPFS ratings were not only reliable, but also significantly associated with external criterion measures. Specifically, we found that LPFS total scores were higher in patients with a DSM–IV PD diagnosis than in patients without such a diagnosis, B = 0.48, SE = 0.10, β = .29, p < .001, and also positively associated with the number of such PD diagnoses, B = 0.22, SE = 0.03, β = .43, p < .001 (see the upper panels of Figure 1). The size of the standardized slope parameter (β) for the latter effect is only slightly lower than the correlation of .51 between the LPFS and the total number of DSM–IV PD symptoms reported in a recent study based on clinician ratings (Morey et al., 2013).¹ Notably, the association between students’ LPFS ratings and expert-rated structural impairments (OPD–LSIA) was even higher, B = 0.77, SE = 0.06, β = .61, p < .001 (see the lower left panel of Figure 1). This suggests greater content overlap between LPFS and OPD–LSIA as compared to the overlap between LPFS and DSM–IV PDs. The number of Axis I disorders was also significantly associated with LPFS ratings, B = 0.26, SE = 0.05, β = .33, p < .001, but including this variable as a covariate did not change the results of the analyses presented earlier. Specifically, the incremental standardized slopes for the presence of a PD, number of PDs, and OPD–LSIA were still .22, .35, and .57, respectively (all ps < .001).²

¹ Morey et al. (2013) also conducted receiver operating characteristic (ROC) curve analyses to explore the associations between global LPFS ratings and DSM–IV PD diagnoses. They reported an area under the curve (AUC) of .83 for predicting the presence of a PD, and an AUC of .70 for predicting cases with multiple PDs. To explore how our results compare to their findings, we computed AUCs for each of the 22 raters using the R package “ROCR.” The median AUC across raters for predicting the presence of a PD was .69, and the median AUC for predicting cases with multiple PDs was .70. Thus, the students’ LPFS ratings performed similarly to clinicians’ ratings when identifying severely disturbed patients (i.e., patients with more than one PD), but were less efficient when distinguishing patients with and without PDs.

² Note that, across the 10 patients, correlations between the number of DSM–IV Axis I disorders and the three focal predictors ranged from .29 to .40. However, due to the small sample size, these associations were not significant, ps > .25. Moreover, OPD–LSIA ratings were significantly associated with the number of PDs, r = .77, p < .01, but not with the presence of a PD, r = .44, p = .21.
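For the per-rater AUCs mentioned in Footnote 1, the following sketch shows what the statistic measures: the probability that a randomly chosen patient with the diagnosis receives a higher LPFS score than a randomly chosen patient without it, with ties counted as one half. This is a plain rank-based computation, not the “ROCR” implementation used in this study; the function name and the example values are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons.

    scores: ratings by one rater; labels: 1 = diagnosis present, 0 = absent.
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical LPFS total scores from one rater for four patients,
# the last two of whom have a PD diagnosis.
print(auc([1.0, 3.0, 2.0, 4.0], [0, 0, 1, 1]))  # 0.75
```

Computing this value once per rater and taking the median across the 22 raters would yield a summary comparable to the one reported in the footnote.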

TABLE 1. Descriptive statistics, reliability estimates, and variance components for level of personality functioning ratings.

                    Descriptive     ICCs [95% CI]                       SRM proportions of variance
Scale               M      SD       Single            Average           Perceiver   Target   Relationship   Error
Identity            1.91   0.87     .41 [.23, .71]    .94 [.87, .98]    .11         .29      .17            .43
Self-direction      1.60   0.91     .46 [.27, .75]    .95 [.89, .98]    .06         .31      .19            .45
Empathy             1.23   1.01     .25 [.12, .55]    .88 [.74, .96]    .21         .19      .26            .35
Intimacy            1.76   1.07     .63 [.43, .85]    .97 [.94, .99]    .07         .47      .10            .36
LPFS total score    1.62   0.82     .51 [.31, .78]    .96 [.91, .99]    .12         .29      .22            .37

Note. Data are based on 22 raters and 10 targets. Descriptive statistics refer to the full set of 220 ratings. Proportions of variance were estimated using linear mixed-effect models. Exact restricted likelihood ratio tests indicated that all variance components were significant at p < .01. ICC = two-way random intraclass correlation coefficient; SRM = social relations model; LPFS = Level of Personality Functioning Scale.

FIGURE 1. Predicting students’ total Level of Personality Functioning Scale (LPFS) ratings (y-axis) from the following patient variables (x-axis): (a) presence of a personality disorder (PD) diagnosis, (b) number of PD diagnoses, (c) expert-rated Level of Structural Integration Axis ratings according to Operationalized Psychodynamic Diagnosis (OPD–LSIA), and (d) gold-standard LPFS ratings by members of the DSM–5 Work Group. Bold lines represent fixed effects, and dotted lines represent random effects per rater.

Students’ LPFS ratings of the five prototypical case descriptions were highly congruent with the respective DSM–5 gold-standard ratings, B = 0.73, SE = 0.05, β = .77, p < .001. However, the intercept parameter indicated that student raters underestimated the average amount of impairment in the patient sample, B = −0.58, SE = 0.09, p < .001. This is evident from the regression line in the lower right panel of Figure 1 lying somewhat below the diagonal (which would represent a perfect match). Very similar results emerged for the global LPFS ratings based on a single 5-point scale: Associations with gold-standard ratings were very high, B = 0.85, SE = 0.08, β = .73, p < .001, but the average level of impairment that the students attributed to the patients was again significantly lower, as compared to the expert ratings, B = −0.59, SE = 0.11, p < .001. These latter findings were unsurprising as the global LPFS scores were highly associated with the total scores of the multi-item version, B = 0.74, SE = 0.03, β = .91, p < .001.

Finally, it should be noted that in all validity analyses the random coefficients for the intercepts were highly significant, whereas the random coefficients for the slopes were virtually zero. This means that raters differed mainly in terms of leniency, but were quite similar in terms of accuracy. In Figure 1, this is evident from the fact that the dotted lines run in parallel, but at different levels. Estimated differences in leniency between raters could be as large as a full point on the LPFS.

DISCUSSION

This study explored whether untrained and clinically inexperienced students are able to reliably assess patients’ level of personality functioning from clinical interview material, and whether their ratings converge with expert-rated proxy measures of the severity of personality pathology. In the following, we discuss our main findings regarding the interrater reliability of LPFS ratings, their associations with DSM–IV PD diagnoses and with expert-rated OPD level of structural integration. We also highlight possible consequences for the development of interview techniques and training courses, address the limitations of this study, and outline the next steps for future research.

Consensus Among Laypersons Assessing DSM–5 Level of Personality Functioning

Several authors argued that the constructs featured in the LPFS might be too complex to be reliably rated when raters lack extensive training, clinical expertise, and prolonged contact with the patient (e.g., Leising & Zimmermann, 2011; Pincus, 2011; Tyrer, 2012; Zimmermann et al., 2012). Our findings suggest that these concerns were premature. Even untrained and clinically inexperienced students are able to consensually assess patients’ level of personality functioning based on single clinical interviews lasting only between 60 and 90 min. This is even more remarkable as these interviews were not designed to systematically gather information pertaining to the LPFS, but followed the guidelines of a different diagnostic system (OPD Task Force, 2008). Moreover, the influence of generalized rater bias was somewhat smaller as compared to findings from research on perceptions of normal personality traits (Kenny, 1994). This pattern of relatively strong consensus and small rater bias is surprising because current research suggests the opposite pattern when the rated features are undesirable (Lukowitsky & Pincus, 2013; Mahaffey & Marcus, 2006) or difficult to observe (Kenny & West, 2010), both of which are likely to apply to impairments in personality functioning.

Several explanations could account for this finding. First, although based on a different diagnostic system, the interviews might have been very efficient in eliciting relevant cues (not least because they were conducted by experienced clinicians). For example, interviewers probed for the patients’ capacities to describe themselves and significant others, and explicitly inquired about issues of intimacy and psychosexual development. These interview techniques are likely to provide cues that are not readily available in everyday interactions. Thus, the increased “visibility” of the relevant personality features during the interview situation might have improved consensus and reduced bias (Kenny & West, 2010). Second, one should keep in mind that interrater reliability depends on the true score variance in the particular sample at hand (cf. Lahey, Downey, & Saal, 1983). As our patients and case descriptions were systematically selected to differ in personality pathology, it seems probable that interrater reliability, or consensus, will be lower in more homogeneous samples (e.g., in a sample of healthy controls). However, the proportion of patients with a PD in our patient sample (i.e., 50%) was close to the prevalence of PDs in typical clinical samples (Zimmerman, Rothschild, & Chelminski, 2005), so the true score variance in this study might at least not be uncommon in clinical practice.

Finally, we used a multi-item version of the LPFS. It is well known that aggregating repeated measurements of the same underlying construct is likely to reduce measurement error. Moreover, breaking the global 5-point scale down into 12 more specific scales probably reduced the necessary amount of informal data aggregation on the side of the raters, and might have helped raters anchor their assessment more in observable behavioral criteria (cf. Westen & Weinberger, 2004). This in turn might have improved the chances of finding acceptable interrater agreement. Therefore, we cannot rule out the possibility that interrater reliability might be somewhat lower when raters use the single 5-point scale. However, the difference might be rather small, as our analyses pertaining to assessments of prototypical case descriptions suggest that the mean of the multi-item version and the single 5-point scale are highly correlated.

Associations With DSM–IV Personality Disorder Diagnoses

Students’ LPFS ratings not only met common minimal standards for interrater reliability, but were also significantly associated with the presence and number of patients’ DSM–IV PD diagnoses. This replicates recent findings from a study using clinician ratings of the LPFS and DSM–IV PDs (Morey et al., 2013). Moreover, going beyond Morey et al. (2013), the associations reported in this study cannot be explained by shared method variance (because LPFS ratings and DSM–IV PD diagnoses in our study were based on different data sources and raters), or by individual differences in patients’ distress unspecific to personality (because associations largely held when controlling for the number of DSM–IV Axis I disorders). The latter finding seems especially important, because the LPFS was developed as a severity indicator of PD that should be specific to impairments in personality functioning (Bender et al., 2011; Morey et al., 2011).

However, although our results seem encouraging in this regard, it should be noted that the absolute size of the associations was rather modest (e.g., the incremental standardized slope parameter for the difference between patients with and without a PD was only .22). A viable explanation for this might be that the LPFS highlights facets of personality pathology that are common to many of the DSM–IV PD categories, and thus is more likely to detect cases that show a “mixed” pattern of problems that does not meet the thresholds for any of the specific PD categories (Verheul & Widiger, 2004). For example, consider the following patient in our study, who had no PD diagnoses according to the SCID–II, but received the second highest LPFS total score across raters (M = 2.1):

This patient was a 26-year-old female student suffering from recurrent major depressive disorder and comorbid panic disorder. Although she reached several cutoff values in the SCID–II screening questionnaire (for avoidant, obsessive-compulsive, negativistic, schizoid, and borderline PD), the SCID interviewer came to the conclusion that none of the 12 specific DSM–IV PDs was fully present (i.e., none of the cutoff values were reached after inquiring about the items screened positive). However, both the SCID–II and the OPD interview led to the conclusion that she had long-standing interpersonal problems: She was bullied at school, consistently felt like an outsider, and tended to react aggressively or defiantly to perceived threats. She completely avoided intimate relationships, probably due to expectations of abandonment, and she was often cold or dismissive toward her (few) friends, to “test” whether she could truly count on them. One of her own explanations for her recurring interpersonal conflicts at school or university was that others, especially men, were bothered by, or even envious of, some special abilities she supposedly had, for example, an extraordinarily good memory and general knowledge.

From this short sidelight on the rich clinical information available from the OPD interview, it should already become clear that the students’ ratings of “severe” impairments in intimacy (M = 3.0) and “moderate” impairments in empathy (M = 2.1) seem justifiable (for a more detailed discussion of this case see Zimmermann et al., 2013). Thus, the domains of the LPFS might be better suited than any of the specific DSM–IV PD categories to assess the problems of this particular patient, who seems to be an instructive example of a person with a PDNOS or “mixed” PD (which was, unfortunately, not coded in this study).

Associations With OPD Level of Structural Integration

Our findings indicate that DSM–5 level of personality functioning and OPD level of structural integration are highly interrelated constructs. Whereas a previous OPD expert consensus study suggested that both measures conceptually relate to a common severity continuum (Zimmermann et al., 2012), this study demonstrates that this conceptual overlap is empirically reflected in high correlations between independent OPD–LSIA and LPFS ratings that are based on the same video material. Obviously, when psychodynamically oriented researchers in Germany and members of the DSM–5 Work Group created measures of what they thought to be the particularly important phenomena in personality functioning, they independently arrived at very similar solutions. Notably, however, the OPD system allows for tracking specific aspects of therapeutic change, which the global LPFS does not (yet): The OPD system requires the diagnostician to identify a small set of “focus” or “marker” personality problems for each patient (OPD Task Force, 2008). These are the problems that the treatment should focus on and that will be used to monitor whether the patient’s condition improves over time (Grande, Rudolf, & Oberbracht, 2000; Stiles et al., 1990). To us, it is well conceivable that diagnosticians might do something similar with the LPFS; that is, they might select specific capacities that appear to be most impaired in a patient prior to treatment (e.g., one of the four LPFS domains), and subsequently assess whether and how the patient’s awareness of, and coping with, these impairments improves in the course of treatment. In our view, this approach might prove highly valuable for the future development of PD assessment overall (i.e., not only pertaining to the LPFS but to the trait domains and trait facets within the alternative DSM–5 PD model as well; Zimmermann et al., 2013).

Implications for Interviews and Training Courses

Our findings might inform the development of interview techniques and training courses that are supposed to help researchers and clinicians improve their ability to assess people’s level of personality functioning. For example, it seems that impairments in intimacy can be more reliably assessed than impairments in empathy, at least when ratings are based on videotaped OPD interviews. This suggests that it might be helpful to include questions that explicitly probe for a person’s capacity to appreciate others’ experiences and motivations, to tolerate different perspectives, and to understand the effects of one’s own behavior on others. As direct questions seem not very helpful here (e.g., “Do you appreciate others’ experiences and motivations?”), a promising alternative might be to include questions probing for reflective functioning, such as those that feature in the Adult Attachment Interview (e.g., “Why did your parents behave as they did?”; Taubner et al., 2013).

Another finding that might inform future training courses is that laypersons might differ considerably in their individual calibrations of the LPFS. Although we already noted that the relative amount of generalized rater bias was smaller than we had expected, an absolute difference of up to one full point on the LPFS is likely to have important practical consequences: For example, two raters might accurately perceive that one patient is more impaired than another patient, but rater A might assign LPFS scores of 1 and 2 to the patients, whereas rater B might assign scores of 2 and 3. As a consequence, the second rater would come to the conclusion that both patients have PDs (e.g., requiring treatment), whereas the first rater would diagnose the first patient as not having a PD. This way, rater biases might directly influence diagnostic decisions, and thus should be systematically addressed in training courses (e.g., by using case vignettes).

Maybe even more important, laypersons seem to underestimate the amount of impairment in patients’ personality functioning, at least when compared to expert ratings. Interestingly, this corresponds to previous findings that were obtained with the OPD–LSIA, suggesting that trained but clinically inexperienced raters generally perceive less structural impairment in patients than do experienced clinicians and OPD trainers (Benecke et al., 2009). The reason for this systematic discrepancy remains unclear for now. It might be due either to laypeople’s reluctance to identify other persons as “severely disturbed” (probably due to a lack of experience with what such persons are like), or to a tendency on the part of PD experts to pathologize people. If future research demonstrates that laypeople actually tend to overlook or downplay personality problems that people really have (e.g., as evidenced longitudinally by negative outcomes), then such avoidance would also have to be addressed in training courses.

Limitations

This study has several limitations. First, the sample of patients was small, and thus the precision of the empirical estimates presented is limited. However, we were able to recruit 22 raters, which is considerably more than in most studies on interrater reliability, and the precision of our ICC estimates was in the range of what was accepted in the DSM–5 field trials (Clarke et al., 2013). Second, our sample was restricted to female inpatients, and did not cover the full range of PD categories according to DSM–IV (e.g., patients with antisocial, narcissistic, histrionic, or schizotypal PDs were not included in the sample). This might limit the generalizability of our findings. It should be noted, however, that the specific PD categories that were missing in our sample are rarely found in most clinical samples (e.g., Zimmerman et al., 2005), making our sample rather typical in this regard. Third, the student raters did not conduct their own interviews with the patients, but were presented with the same video material. Although this is an appropriate design to assess interrater reliability, it does not answer the question of test–retest reliability; that is, whether raters agree after independently interviewing and assessing the same patients (Kraemer et al., 2012).³ Thus, our results cannot be directly compared with the results of the DSM–5 field trials, which used a test–retest design (Clarke et al., 2013). Finally, the SCID–II data that served as a criterion measure were somewhat incomplete, as they only pertained to the presence or absence of the 12 categorical DSM–IV PD diagnoses. It would have been desirable to also have access to ratings of the individual PD criteria, because that would have enabled us to compute a general severity index the way Hopwood et al. (2011) did, and to retrieve possible PDNOS diagnoses (Verheul & Widiger, 2004). Additionally, as the SCIDs were not audio- or videotaped, the reliability of the PD diagnoses in this sample remains unclear. Nevertheless, we found some indication for moderate associations between SCID–II ratings and students’ LPFS ratings, which is quite remarkable given that the two kinds of ratings were based on different data sources.
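
To make the reliability index discussed above concrete, the following is a minimal sketch of how a single-rater intraclass correlation can be computed from a patients × raters matrix. It assumes the two-way random-effects, absolute-agreement form ICC(2,1) described by Shrout and Fleiss (1979); this section does not restate which ICC variant was used in the study, so that choice is an assumption. The function name `icc_2_1` is illustrative, and the data are the worked example from Shrout and Fleiss (1979), not ratings from this study.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per target (e.g., patient) and
    one column per rater. Every rater must have rated every target.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters

    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
SHROUT_FLEISS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

print(round(icc_2_1(SHROUT_FLEISS), 2))  # prints 0.29
```

In a design like the one described here, the same function could be applied to a matrix with one row per patient and one column per student rater. Because rater mean differences enter the denominator, systematic leniency or severity of individual raters lowers this absolute-agreement coefficient.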

Future Directions

The major finding of this study is that assessing DSM–5 level of personality functioning in patients seems easier than many critics thought (including some of us). Specifically, it could be accomplished even when raters lack training, clinical expertise, or prolonged contact with the patient. However, the LPFS was obviously not constructed to be applied by untrained and inexperienced raters, so our findings might merely set a lower bound to its interrater reliability. An important next step will be to more systematically explore conditions that could enhance reliability: What exactly is gained, for example, by recruiting experienced clinicians as raters, by providing training courses with case vignettes prior to ratings, or by employing interview techniques that are explicitly tailored to assessing the LPFS domains? Note that it is also possible that future studies do not find any increases in reliability under some of these conditions, which would question the value of experience, training, or specific interview techniques in this regard. Moreover, future studies should test whether the multi-item version of the LPFS is a viable alternative to the single 5-point scale, even beyond research contexts (i.e., whether possible gains in reliability and clinical utility outweigh the greater effort associated with multiple ratings in clinical practice).

³Note that the conception of test–retest reliability adopted by the DSM–5 Task Force assumes that the interviews are conducted within a short interval during which the features of the patients are unlikely to change (Kraemer et al., 2012).

Maybe the most pressing research questions concerning the LPFS involve issues of construct validity (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). Using the LPFS as a single 5-point scale (as it is currently recommended in DSM–5 Section III) actually assumes that the underlying construct of personality functioning is well represented as a single latent dimension. This might or might not be the case. Although Morey et al. (2013) presented the first evidence that LPFS domain scores are correlated around .50 with the LPFS global score, they did not formally test this assumption, and our patient sample was far too small to do so. In fact, it seems conceivable that the latent structure of the LPFS is more complex, made up of two relatively distinct factors of self and interpersonal functioning (Berghuis, Kamphuis, Verheul, Larstone, & Livesley, in press). This could be illustrated using the patient presented in our case example, whose impairments in intimacy clearly seemed to be more severe than her impairments in identity or self-direction. Thus, only deciding whether this patient is “moderately” (2) or “severely” (3) impaired at a global level would imply losing clinically important information.

Regarding external correlates of the LPFS, we presented preliminary evidence for convergent associations with DSM–IV PDs, and even stronger associations with impairments in structural capacities according to OPD, speaking to some continuity or overlap with previous diagnostic systems. However, future studies should make use of a broader set of criterion measures, including performance in standardized self and interpersonal functioning tasks (e.g., Leising, Krause, Köhler, Hinsen, & Clifton, 2011; Taubner et al., 2013), or (future) success in major life tasks (e.g., Hopwood et al., 2011; Ullrich, Farrington, & Coid, 2007). Even more important, it needs to be tested whether such associations hold when controlling for unspecific measures of severity. Although we presented some evidence for such incremental utility of the LPFS by controlling for the number of DSM–IV Axis I disorders, further studies should replicate this finding with more commonly used measures of psychological distress and impairment that are unspecific to personality (e.g., the GAF scale). Note that such studies might also benefit from a clearer conceptual analysis of what is actually meant by “impaired functioning” (e.g., failure to succeed at major life tasks vs. “brokenness” of functional capacities that are prerequisites for succeeding at major life tasks; see Leising et al., 2009; Leising & Zimmermann, 2011; Livesley, 1998; Wakefield, 1992; Zachar & Kendler, 2010).
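
The idea of incremental utility can be illustrated with a small sketch. Assuming a linear model with one covariate (for example, an unspecific severity measure) and one LPFS score, the gain in explained variance from adding the LPFS can be computed from three correlations. The function names and the correlation values below are hypothetical and chosen only for illustration; they are not results from this study, and the analysis actually used to control for Axis I disorders is not described in this section.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a criterion regressed on two predictors.

    r_y1, r_y2: correlations of the criterion with predictor 1 and 2.
    r_12: correlation between the two predictors (must not be +/-1).
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_r2(r_y_cov, r_y_lpfs, r_cov_lpfs):
    """Gain in R^2 when the LPFS score is added to a covariate-only model.

    This equals the squared semipartial correlation of the LPFS score
    with the criterion, holding the covariate constant.
    """
    return r2_two_predictors(r_y_cov, r_y_lpfs, r_cov_lpfs) - r_y_cov ** 2


# Hypothetical correlations, for illustration only:
# criterion-covariate = .30, criterion-LPFS = .50, covariate-LPFS = .40
print(round(incremental_r2(0.30, 0.50, 0.40), 4))  # prints 0.1719
```

A value near zero would suggest that the LPFS adds little beyond the unspecific severity measure, whereas a larger value would be consistent with incremental utility.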

Finally, the issue of the LPFS’s construct validity should also be addressed in reference to the “maladaptive personality traits” that feature as the second major component in the new PD model in DSM–5 Section III (Krueger et al., 2011; Skodol, 2012). For instance, the interpersonal problems of the patient presented in our case example were identified as “severe impairments in intimacy” on the LPFS, but might also be easily coded as facets of “detachment” within the trait model (e.g., intimacy avoidance, suspiciousness, withdrawal). Thus, one might argue that there is considerable redundancy between the level and trait components of the DSM–5 model, as it might be necessary to code the same diagnostic information twice. This clearly calls for empirical studies assessing the discriminant or incremental validity of the LPFS above and beyond maladaptive personality traits (Zimmermann et al., 2013).

ACKNOWLEDGMENTS

This research was funded by grants from the OPD Task Force and the University of Kassel to Johannes Zimmermann. We thank Katharina Rek for helping with the data collection, and Lena Braun, Sabine Gluth, Sara Homburg, Doreen Jeske, Alan Krause, Annika Linder, Kerstin Schrage, Alina Urban, and all other student raters for their great commitment in conducting the LPFS ratings.

REFERENCES

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: Author.

Back, M. D., & Kenny, D. A. (2010). The social relations model: How to understand dyadic processes. Social and Personality Psychology Compass, 4, 855–870. doi:10.1111/j.1751-9004.2010.00303.x

Bender, D. S. (2012a). Level of Personality Functioning Scale: Training case vignettes. Unpublished manuscript, Department of Psychiatry, University of Arizona College of Medicine.

Bender, D. S. (2012b). Mirror, mirror on the wall: Reflecting on narcissism. Journal of Clinical Psychology: In Session, 68, 877–885. doi:10.1002/jclp.21892

Bender, D. S. (2013). An ecumenical approach to conceptualizing and studying the core of personality psychopathology: A commentary on Hopwood et al. Journal of Personality Disorders, 27, 311–319. doi:10.1521/pedi.2013.27.3.311

Bender, D. S., Morey, L. C., & Skodol, A. E. (2011). Toward a model for assessing level of personality functioning in DSM–5, Part I: A review of theory and methods. Journal of Personality Assessment, 93, 332–346. doi:10.1080/00223891.2011.583808

Benecke, C., Koschier, A., Peham, D., Bock, A., Dahlbender, R. W., Biebl, W., & Doering, S. (2009). Erste Ergebnisse zu Reliabilität und Validität der OPD–2 Strukturachse [First results on the reliability and validity of the OPD–2 axis structure]. Zeitschrift für Psychosomatische Medizin und Psychotherapie, 55, 84–96.

Benjamin, L. S. (1996). Interpersonal diagnosis and treatment of personality disorders. New York, NY: Guilford.

Berghuis, H., Kamphuis, J. H., Verheul, R., Larstone, R., & Livesley, J. (in press). The General Assessment of Personality Disorder (GAPD) as an instrument for assessing the core features of personality disorders. Clinical Psychology & Psychotherapy. doi:10.1002/cpp.1811

Bornstein, R. F. (1998). Reconceptualizing personality disorder diagnosis in the DSM–V: The discriminant validity challenge. Clinical Psychology: Science and Practice, 5, 333–343. doi:10.1111/j.1468-2850.1998.tb00153.x

Bornstein, R. F., & Huprich, S. K. (2011). Beyond dysfunction and threshold-based classification: A multidimensional model of personality disorder diagnosis. Journal of Personality Disorders, 25, 331–337. doi:10.1521/pedi.2011.25.3.331

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290. doi:10.1037/1040-3590.6.4.284

Clark, L. (2007). Assessment and diagnosis of personality disorder: Perennial issues and an emerging reconceptualization. Annual Review of Psychology, 58, 227–257. doi:10.1146/annurev.psych.57.102904.190200

Clarke, D. E., Narrow, W. E., Regier, D. A., Kuramoto, S. J., Kupfer, D. J., Kuhl, E. A., . . ., & Kraemer, H. C. (2013). DSM–5 field trials in the United States and Canada, Part I: Study design, sampling strategy, implementation, and analytic approaches. American Journal of Psychiatry, 170, 43–58. doi:10.1176/appi.ajp.2012.12070998

Clarkin, J. F., & Huprich, S. K. (2011). Do DSM–5 personality disorder proposals meet criteria for clinical utility? Journal of Personality Disorders, 25, 192–205. doi:10.1521/pedi.2011.25.2.192

Crawford, M. J., Koldobsky, N., Mulder, R., & Tyrer, P. (2011). Classifying personality disorder according to severity. Journal of Personality Disorders, 25, 321–330. doi:10.1521/pedi.2011.25.3.321

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi:10.1037/h0040957

First, M. B., Gibbon, M., Spitzer, R. L., Williams, J. B. W., & Benjamin, L. S. (1997). Structured Clinical Interview for DSM–IV Axis II personality disorders (SCID–II). Washington, DC: American Psychiatric Press.

First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (2002). Structured Clinical Interview for DSM–IV–TR Axis I disorders (SCID–I). New York, NY: Biometrics Research, New York State Psychiatric Institute.

Furr, R. M., & Funder, D. C. (2007). Behavioral observation. In R. Robins, C. Fraley, & R. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 273–291). New York, NY: Guilford.

Grande, T., Rudolf, G., & Oberbracht, C. (2000). Veränderungsmessung auf OPD-Basis: Schwierigkeiten und ein neues Konzept [OPD-based assessment of change: Difficulties and a new concept]. In W. Schneider & H. J. Freyberger (Eds.), Was leistet die OPD? Empirische Befunde und klinische Erfahrungen mit der Operationalisierten Psychodynamischen Diagnostik (pp. 148–161). Bern, Switzerland: Huber.

Gutiérrez, F., Navinés, R., Navarro, P., García-Esteve, L., Subira, S., Torrens, M., & Martín-Santos, R. (2008). What do all personality disorders have in common? Ineffectiveness and uncooperativeness. Comprehensive Psychiatry, 49, 570–580. doi:10.1016/j.comppsych.2008.04.007

Hopwood, C. J., Malone, J. C., Ansell, E. B., Sanislow, C. A., Grilo, C. M., McGlashan, T. H., . . ., & Morey, L. C. (2011). Personality assessment in DSM–5: Empirical support for rating severity, style, and traits. Journal of Personality Disorders, 25, 305–320. doi:10.1521/pedi.2011.25.3.305

Hopwood, C. J., Wright, A. G. C., Ansell, E. B., & Pincus, A. L. (2013). The interpersonal core of personality pathology. Journal of Personality Disorders, 27, 270–295. doi:10.1521/pedi.2013.27.3.270

Johnson, J. G., First, M. B., Cohen, P., & Kasen, S. (2008). Development and validation of a new procedure for the diagnostic assessment of personality disorder: The Multidimensional Personality Disorder Rating Scale (MPDRS). Journal of Personality Disorders, 22, 246–258. doi:10.1521/pedi.2008.22.3.246

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. New York, NY: Guilford.

Kenny, D. A., & Kashy, D. A. (2010). Dyadic data analysis using multilevel modeling. In J. Hox & J. K. Roberts (Eds.), The handbook of multilevel analysis (pp. 335–370). London, UK: Taylor & Francis.

Kenny, D. A., & West, T. V. (2010). Similarity and agreement in self and other perception: A meta-analysis. Personality and Social Psychology Review, 14, 196–213. doi:10.1177/1088868309353414

Kernberg, O. F. (2012). Overview and critique of the classification of personality disorders proposed for DSM–V. Swiss Archives of Neurology and Psychiatry, 163, 234–238.

Kernberg, O. F., & Caligor, E. (2005). A psychoanalytic theory of personality disorders. In M. F. Lenzenweger & J. F. Clarkin (Eds.), Major theories of personality disorder (2nd ed., pp. 114–156). New York, NY: Guilford.

Kraemer, H. C., Kupfer, D. J., Clarke, D. E., Narrow, W. E., & Regier, D. A. (2012). DSM–5: How reliable is reliable enough? American Journal of Psychiatry, 169, 13–15. doi:10.1176/appi.ajp.2011.11010050

Krueger, R. F., Eaton, N. R., Derringer, J., Markon, K. E., Watson, D., & Skodol, A. E. (2011). Personality in DSM–5: Helping delineate personality disorder content and framing the metastructure. Journal of Personality Assessment, 93, 325–331. doi:10.1080/00223891.2011.577478

Krueger, R. F., Skodol, A. E., Livesley, W., Shrout, P. E., & Yueqin, H. (2007). Synthesizing dimensional and categorical approaches to personality disorders: Refining the research agenda for DSM–V Axis II. International Journal of Methods in Psychiatric Research, 16, 65–73. doi:10.1002/mpr.212

Lahey, M. A., Downey, R. G., & Saal, F. E. (1983). Intraclass correlations: There’s more there than meets the eye. Psychological Bulletin, 93, 586–595. doi:10.1037/0033-2909.93.3.586

Landau, S., & Stahl, D. (2013). Sample size and power calculations for medical studies by simulation when closed form expressions are not available. Statistical Methods in Medical Research, 22, 324–345. doi:10.1177/0962280212439578

Leising, D., Krause, S., Köhler, D., Hinsen, K., & Clifton, A. (2011). Assessing interpersonal functioning: Views from within and without. Journal of Research in Personality, 45, 631–641. doi:10.1016/j.jrp.2011.08.011

Leising, D., Rogers, K., & Ostner, J. (2009). The undisordered personality: Normative assumptions underlying personality disorder diagnoses. Review of General Psychology, 13, 230–241. doi:10.1037/a0017139

Leising, D., & Zimmermann, J. (2011). An integrative conceptual framework for assessing personality and personality pathology. Review of General Psychology, 15, 317–330. doi:10.1037/a0025070

Livesley, W. J. (1998). Suggestions for a framework for an empirically based classification of personality disorder. The Canadian Journal of Psychiatry / La Revue canadienne de psychiatrie, 43, 137–147.

Livesley, W. J. (2012). Tradition versus empiricism in the current DSM–5 proposal for revising the classification of personality disorders. Criminal Behaviour and Mental Health, 22, 81–90. doi:10.1002/cbm.1826

Lobbestael, J., Leurgans, M., & Arntz, A. (2011). Inter-rater reliability of the Structured Clinical Interview for DSM–IV Axis I Disorders (SCID I) and Axis II Disorders (SCID II). Clinical Psychology and Psychotherapy, 18, 75–79. doi:10.1002/cpp.693

Lukowitsky, M. R., & Pincus, A. L. (2013). Interpersonal perception of pathological narcissism: A social relations analysis. Journal of Personality Assessment, 95, 261–273. doi:10.1080/00223891.2013.765881

Luyten, P., & Blatt, S. J. (2013). Interpersonal relatedness and self-definition in normal and disrupted personality development: Retrospect and prospect. American Psychologist, 68, 172–183. doi:10.1037/a0032243

Mahaffey, K. J., & Marcus, D. K. (2006). Interpersonal perception of psychopathy: A social relations analysis. Journal of Social and Clinical Psychology, 25, 53–74. doi:10.1521/jscp.2006.25.1.53

Markon, K. E., Chmielewski, M., & Miller, C. J. (2011). The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review. Psychological Bulletin, 137, 856–879. doi:10.1037/a0023678

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46. doi:10.1037/1082-989X.1.1.30

Meyer, B., & Pilkonis, P. A. (2005). An attachment model of personality disorders. In M. F. Lenzenweger & J. F. Clarkin (Eds.), Major theories of personality disorder (2nd ed., pp. 231–281). New York, NY: Guilford.

Morey, L. C., Bender, D. S., & Skodol, A. E. (2013). Validating the proposed Diagnostic and Statistical Manual of Mental Disorders, 5th edition, severity indicator for personality disorder. Journal of Nervous and Mental Disease, 201, 729–735. doi:10.1097/NMD.0b013e3182a20ea8

Morey, L. C., Berghuis, H., Bender, D. S., Verheul, R., Krueger, R. F., & Skodol, A. E. (2011). Toward a model for assessing level of personality functioning in DSM–5, Part II: Empirical articulation of a core dimension of personality pathology. Journal of Personality Assessment, 93, 347–353. doi:10.1080/00223891.2011.577853

OPD Task Force. (Eds.). (2008). Operationalized Psychodynamic Diagnosis OPD–2: Manual of diagnosis and treatment planning. Cambridge, MA: Hogrefe & Huber.

Parker, G., Both, L., Olley, A., Hadzi-Pavlovic, D., Irvine, P., & Jacobs, G. (2002). Defining disordered personality functioning. Journal of Personality Disorders, 16, 503–522. doi:10.1521/pedi.16.6.503.22139

Parker, G., Hadzi-Pavlovic, D., Both, L., Kumar, S., Wilhelm, K., & Olley, A. (2004). Measuring disordered personality functioning: To love and work reprised. Acta Psychiatrica Scandinavica, 110, 230–239. doi:10.1111/j.1600-0447.2004.00312.x

Pilkonis, P. A., Hallquist, M. N., Morse, J. Q., & Stepp, S. D. (2011). Striking the (im)proper balance between scientific advances and clinical utility: Commentary on the DSM–5 proposal for personality disorders. Personality Disorders: Theory, Research, and Treatment, 2, 68–82. doi:10.1037/a0022226

Pincus, A. L. (2011). Some comments on nomology, diagnostic process, and narcissistic personality disorder in the DSM–5 proposal for personality and personality disorders. Personality Disorders: Theory, Research, and Treatment, 2, 41–53. doi:10.1037/a0021191

R Core Team. (2012). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org

Schauenburg, H., & Grande, T. (2011). Interview measures of interpersonal functioning and quality of object relations. In L. M. Horowitz & S. Strack (Eds.), Handbook of interpersonal psychology: Theory, research, assessment, therapeutic interventions (pp. 343–358). Hoboken, NJ: Wiley.

Shedler, J., Beck, A., Fonagy, P., Gabbard, G. O., Gunderson, J., Kernberg, O., . . . , & Westen, D. (2010). Personality disorders in DSM–5. The American Journal of Psychiatry, 167, 1026–1028. doi:10.1176/appi.ajp.2010.10050746

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428. doi:10.1037/0033-2909.86.2.420

Skodol, A. E. (2012). Personality disorders in DSM–5. Annual Review of Clinical Psychology, 8, 317–344. doi:10.1146/annurev-clinpsy-032511-143131

Skodol, A. E., Bender, D. S., Oldham, J. M., Clark, L. A., Morey, L. C., Verheul, R., . . ., & Siever, L. J. (2011). Proposed changes in personality and personality disorder assessment and diagnosis for DSM–5 Part II: Clinical application. Personality Disorders: Theory, Research, and Treatment, 2, 23–40. doi:10.1037/a0021892

Stiles, W. B., Elliott, R., Llewelyn, S. P., Firth-Cozens, J. A., Margison, F. R., Shapiro, D. A., & Hardy, G. (1990). Assimilation of problematic experiences by clients in psychotherapy. Psychotherapy, 27, 411–420. doi:10.1037/0033-3204.27.3.411

Svrakic, D. M., Whitehead, C., Przybeck, T. R., & Cloninger, C. R. (1993). Differential diagnosis of personality disorders by the seven-factor model of temperament and character. Archives of General Psychiatry, 50, 991–999. doi:10.1001/archpsyc.1993.01820240075009

Taubner, S., Hörz, S., Fischer-Kern, M., Doering, S., Buchheim, A., & Zimmermann, J. (2013). Internal structure of the Reflective Functioning Scale. Psychological Assessment, 25, 127–135. doi:10.1037/a0029138

Turkheimer, E., Ford, D. C., & Oltmanns, T. F. (2008). Regional analysis of self-reported personality disorder criteria. Journal of Personality, 76, 1587–1622. doi:10.1111/j.1467-6494.2008.00532.x

Tyrer, P. (2012). Diagnostic and Statistical Manual of Mental Disorders: A classification of personality disorders that has had its day. Clinical Psychology and Psychotherapy, 19, 372–374. doi:10.1002/cpp.1810

Tyrer, P., Crawford, M., Mulder, R., Blashfield, R., Farnam, A., Fossati, A., . . . , & Reed, G. M. (2011). The rationale for the reclassification of personality disorder in the 11th revision of the International Classification of Diseases (ICD–11). Personality and Mental Health, 5, 246–259. doi:10.1002/pmh.190

Tyrer, P., & Johnson, T. (1996). Establishing the severity of personality disorder. American Journal of Psychiatry, 153, 1593–1597.

Ullrich, S., Farrington, D. P., & Coid, J. W. (2007). Dimensions of DSM–IV personality disorders and life-success. Journal of Personality Disorders, 21, 657–663. doi:10.1521/pedi.2007.21.6.657

Verheul, R., & Widiger, T. A. (2004). A meta-analysis of the prevalence and usage of personality disorder not otherwise specified (PDNOS). Journal of Personality Disorders, 18, 309–319. doi:10.1521/pedi.18.4.309.40350

Wakefield, J. C. (1992). The concept of mental disorder: On the boundary between biological facts and social values. American Psychologist, 47, 373–388. doi:10.1037/0003-066X.47.3.373

West, T. V., & Kenny, D. A. (2011). The truth and bias model of judgment. Psychological Review, 118, 357–378. doi:10.1037/a0022936

Westen, D., & Arkowitz-Westen, L. (1998). Limitations of Axis II in diagnosing personality pathology in clinical practice. American Journal of Psychiatry, 155, 1767–1771.

Westen, D., Gabbard, G. O., & Blagov, P. (2006). Back to the future. Personality structure as a context for psychopathology. In R. F. Krueger & J. L. Tackett (Eds.), Personality and psychopathology (pp. 335–384). New York, NY: Guilford.

Westen, D., & Muderrisoglu, S. (2003). Assessing personality disorders using a systematic clinical interview: Evaluation of an alternative to structured interviews. Journal of Personality Disorders, 17, 351–369. doi:10.1521/pedi.17.4.351.23967

Westen, D., & Weinberger, J. (2004). When clinical description becomes statistical prediction. American Psychologist, 59, 595–613. doi:10.1037/0003-066X.59.7.595

Widiger, T. A., & Samuel, D. B. (2005). Evidence-based assessment of personality disorders. Psychological Assessment, 17, 278–287. doi:10.1037/1040-3590.17.3.278

Widiger, T. A., & Trull, T. J. (2007). Plate tectonics in the classification of personality disorder: Shifting to a dimensional model. American Psychologist, 62, 71–83. doi:10.1037/0003-066X.62.2.71

Wright, A. G. C., & Zimmermann, J. (in press). At the nexus of science and practice: Answering basic clinical questions in personality disorder assessment and diagnosis with quantitative modeling techniques. In S. K. Huprich (Ed.), Personality disorders: Assessment, diagnosis, and research.

Zachar, P., & Kendler, K. (2010). Philosophical issues in the classification of psychopathology. In T. Millon, R. F. Krueger, & E. Simonsen (Eds.), Contemporary directions in psychopathology: Scientific foundations of the DSM–V and ICD–11 (pp. 127–148). New York, NY: Guilford.

Zimmerman, M., Rothschild, L., & Chelminski, I. (2005). The prevalence of DSM–IV personality disorders in psychiatric outpatients. The American Journal of Psychiatry, 162, 1911–1918. doi:10.1176/appi.ajp.162.10.1911

Zimmermann, J., Benecke, C., Bender, D. S., Skodol, A. E., Krueger, R. F., & Leising, D. (2013). Persönlichkeitsdiagnostik im DSM–5 [Personality assessment in DSM–5]. Psychotherapeut, 58, 455–465. doi:10.1007/s00278-013-1009-1

Zimmermann, J., Ehrenthal, J. C., Cierpka, M., Schauenburg, H., Doering, S., & Benecke, C. (2012). Assessing the level of structural integration using Operationalized Psychodynamic Diagnosis (OPD): Implications for DSM–5. Journal of Personality Assessment, 94, 522–532. doi:10.1080/00223891.2012.700664

Zimmermann, J., Stasch, M., Grande, T., Schauenburg, H., & Cierpka, M. (in press). Der Beziehungsmuster–Q–Sort (OPD–BQS): Ein Selbsteinschätzungsinstrument zur Erfassung von dysfunktionalen Beziehungsmustern auf Grundlage der Operationalisierten Psychodynamischen Diagnostik [Maladaptive Interpersonal Patterns Q–Sort (MIPQS): A self-report method for assessing maladaptive interpersonal patterns based on Operationalized Psychodynamic Diagnosis]. Zeitschrift für Psychiatrie, Psychologie und Psychotherapie.
