Multicriteria decision analysis methods with 1000Minds for developing systemic sclerosis...

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/authorsrights


Author's personal copy

Multicriteria decision analysis methods with 1000Minds for developing systemic sclerosis classification criteria

Sindhu R. Johnson a,b,c,*,1; Raymond P. Naden d; Jaap Fransen e,1; Frank van den Hoogen f,1; Janet E. Pope g,1; Murray Baron h,1; Alan Tyndall i,1; Marco Matucci-Cerinic j,k,l,1; Christopher P. Denton m,2; Oliver Distler n,2; Armando Gabrielli o,2; Jacob M. van Laar p,2; Maureen Mayes q,2; Virginia Steen r,2; James R. Seibold s,2; Phillip Clements t,2; Thomas A. Medsger Jr. u; Patricia E. Carreira v; Gabriela Riemekasten w; Lorinda Chung x,y; Barri J. Fessler z; Peter A. Merkel aa; Richard Silver bb; John Varga cc; Yannick Allanore dd; Ulf Mueller-Ladner ee; Madelon C. Vonk e; Ulrich A. Walker i; Susanna Cappelli j,k,l; Dinesh Khanna ff,1

a Division of Rheumatology, Department of Medicine, Toronto Western Hospital, University of Toronto, Ground Floor, East Wing, 399 Bathurst Street, Toronto, Ontario, Canada M5T 2S8
b Division of Rheumatology, Department of Medicine, Mount Sinai Hospital, University of Toronto, Toronto, Ontario, Canada
c Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
d Auckland City Hospital, Auckland, New Zealand
e Department of Rheumatic Diseases, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
f Rheumatology Centre, Sint Maartenskliniek, Nijmegen, The Netherlands
g Division of Rheumatology, Department of Medicine, St Joseph Health Care, University of Western Ontario, London, Ontario, Canada
h Division of Rheumatology, Department of Medicine, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
i Rheumatology Department, University of Basel, Basel, Switzerland
j Department of Rheumatology AVC, University of Florence, Firenze, Italy
k Department of BioMedicine, University of Florence, Firenze, Italy
l Division of Rheumatology AOUC, Department of Medicine & Denothe Centre, University of Florence, Firenze, Italy
m Centre for Rheumatology and Connective Tissue Diseases, Royal Free Hospital, London, UK
n Department of Rheumatology, University Hospital Zurich, Zurich, Switzerland
o Dipartimento di Scienze Cliniche e Molecolari, Clinica Medica, Università Politecnica delle Marche, Ancona, Italy
p Musculoskeletal Research Group, Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, UK
q The University of Texas Health Science Center at Houston, Houston, TX, USA
r Division of Rheumatology, Clinical Immunology and Allergy, Department of Medicine, Georgetown University School of Medicine, USA
s Scleroderma Research Consultants, Avon, CT, USA
t Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
u Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
v Servicio de Reumatología, Hospital Universitario 12 de Octubre, Madrid, Spain
w Department of Rheumatology, German Rheumatology Research Center, Leibniz Institute, Berlin, Germany
x Division of Immunology and Rheumatology, Department of Medicine, Stanford University, Stanford, CA, USA
y Department of Dermatology, Stanford University, Stanford, CA, USA
z Division of Clinical Immunology and Rheumatology, The University of Alabama at Birmingham, Birmingham, AL, USA
aa Division of Rheumatology, The University of Pennsylvania, Philadelphia, PA, USA
bb Division of Rheumatology & Immunology, Department of Medicine, Medical University of South Carolina, SC, USA
cc Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
dd Rheumatology A Department, Paris Descartes University, Cochin Hospital, France
ee Department of Rheumatology and Clinical Immunology, Justus-Liebig University Giessen, Kerckhoff Clinic, Bad Nauheim, Germany
ff Scleroderma Program, Division of Rheumatology, University of Michigan, Ann Arbor, MI, USA

Accepted 29 December 2013; Published online 8 April 2014

Funding: This research was supported by the American College of Rheumatology (ACR) Classification and Response Criteria Subcommittee of the Committee on Quality Measures and the European League Against Rheumatism (EULAR).

Conflict of interest: S.R.J. is supported by a Canadian Institutes of Health Research Clinician Scientist Award and the Norton-Evans Fund for Scleroderma Research. D.K. was supported by the Scleroderma Foundation (New Investigator Award) and a National Institutes of Health Award (NIAMS K24 AR063120). All the other authors report no conflicts.

1 Systemic sclerosis classification criteria steering committee member.
2 Expert panel member.
* Corresponding author. Tel.: +1-416-603-6417; fax: +1-416-603-4348.
E-mail address: [email protected] (S.R. Johnson).

0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jclinepi.2013.12.009

Journal of Clinical Epidemiology 67 (2014) 706–714


Abstract

Objectives: Classification criteria for systemic sclerosis (SSc) are being developed. The objectives were to develop an instrument for collating case data and evaluate its sensibility; use forced-choice methods to reduce and weight criteria; and explore agreement among experts on the probability that cases were classified as SSc.

Study Design and Setting: A standardized instrument was tested for sensibility. The instrument was applied to 20 cases covering a range of probabilities that each had SSc. Experts rank ordered cases from highest to lowest probability; reduced and weighted the criteria using forced-choice methods; and reranked the cases. Consistency in rankings was evaluated using intraclass correlation coefficients (ICCs).

Results: Experts endorsed clarity (83%), comprehensibility (100%), face and content validity (100%). Criteria were weighted (points): finger skin thickening (14–22), fingertip lesions (9–21), friction rubs (21), finger flexion contractures (16), pulmonary fibrosis (14), SSc-related antibodies (15), Raynaud phenomenon (13), calcinosis (12), pulmonary hypertension (11), renal crisis (11), telangiectasia (10), abnormal nailfold capillaries (10), esophageal dilation (7), and puffy fingers (5). The ICC across experts was 0.73 [95% confidence interval (CI): 0.58, 0.86] in the first ranking exercise and improved to 0.80 (95% CI: 0.68, 0.90) in the second.

Conclusions: Using a sensible instrument and forced-choice methods, the number of criteria was reduced by 39% (from 23 to 14) and the remaining criteria were weighted. Our methods reflect the rigors of measurement science and serve as a template for developing classification criteria. © 2014 Elsevier Inc. All rights reserved.

Keywords: Scleroderma; Systemic sclerosis; Decision analysis; Forced-choice; Classification criteria; Conjoint analysis; Sensibility

1. Introduction

Classification criteria for rheumatic diseases are important for research and practice. Previous iterations of classification criteria have been criticized for their lack of methodological rigor, inability to reflect the changing construct of disease, or inefficiency when applied in the real world [1–4]. Application of the preliminary criteria for classification of systemic sclerosis (SSc) [5–7] for recruitment into trials results in the exclusion of approximately 20% of patients with either early SSc or the limited subtype [8–10]. As a result, new classification criteria for SSc are being developed [11].

In phase 1, a total of 168 candidate criteria were generated through Delphi exercises [11]. The items were reduced to 23 criteria using another Delphi exercise and nominal group technique. The 23 criteria were validated using SSc and SSc-mimicking condition cohorts. The criteria were found to have good face, discriminant, and construct validity [12]. The next phase of criteria development requires further item reduction, weighting, and scaling. The 2010 rheumatoid arthritis classification criteria were successfully developed with a balanced use of expert-based and data-driven methods. Forced-choice methods (facilitated by 1000Minds software; http://www.1000Minds.com) allowed for item reduction and item weighting [13,14]. To use these methods in the development of SSc classification criteria, an SSc-specific instrument using a standardized format based on the 23 candidate items needed to be developed [15].

The aim of this study was to develop an SSc-specific instrument for use in the forced-choice phase of SSc criteria development and conduct a forced-choice study to reduce and weight the criteria. The sensibility of an instrument is critical to whether it is useful or not [15]. Attributes of sensibility include comprehensibility, clarity, face validity, content validity, and feasibility. If an instrument is not sensible, it does not warrant use in clinical research [15]. Therefore, the first objective was to develop and evaluate the sensibility of an SSc-specific instrument for use in a forced-choice study. The second objective was to use forced-choice methods to reduce the criteria and valuate relative weights for each criterion. The third objective was to explore the agreement among experts on which patients are considered to have SSc. Given the heterogeneity of SSc, we hypothesized that in the absence of distal and proximal skin thickening present at the same time, there would be variability in agreement among experts on which patients have SSc. If this hypothesis were true, it would provide justification for the need to apply standardized classification criteria for inclusion of patients into research studies.

2. Methods

2.1. Candidate criteria

Items were generated through consensus exercises resulting in 168 candidate criteria. A Delphi exercise and nominal group technique reduced the candidate criteria to 23:

1. antinuclear antibody;
2. anti-topoisomerase-I antibody;
3. anticentromere antibody or centromere pattern on antinuclear antibody test;
4. anti-RNA polymerase III antibody;
5. anti-PM-Scl antibody;
6. scleroderma;
7. puffy fingers;
8. finger flexion contractures;


What is new?

• The systemic sclerosis (SSc)-specific forms have demonstrable sensibility: comprehensibility, clarity, face validity, content validity, and feasibility.

• Forced-choice methods using 1000Minds software were successfully used to reduce the number of candidate criteria and indicate relative weights.

• The work described in this article defines a system of criteria development, which produces a measure of the relative probability that a particular case (combination of clinical features) has SSc.

• These methods reflect the rigors of modern measurement science and serve as a template for development of classification criteria for other diseases.

9. tendon or bursal friction rubs;
10. calcinosis;
11. telangiectasia;
12. abnormal nailfold capillary pattern;
13. Raynaud phenomenon;
14. fingertip ulcers and/or pitting scars;
15. digital pulp loss or acro-osteolysis;
16. renal crisis;
17. pulmonary arterial hypertension;
18. interstitial lung disease or pulmonary fibrosis;
19. reduced diffusion capacity for carbon monoxide (DLCO);
20. reduced forced vital capacity (FVC);
21. dysphagia for solid foods;
22. esophageal dilatation; and
23. persistent, recurrent gastroesophageal reflux disease (GERD) by history.

The response option was dichotomized as present or absent [11]. SSc subjects (n = 783) were compared with subjects with diseases similar to SSc (n = 1,071). The candidate items were found to have good face, discriminant, and construct validity [12].

2.2. SSc-specific instrument development

The purpose of the disease-specific instrument was to present SSc patient-based cases that differ in clinical characteristics in a standardized format [15]. Using Dillman methods [16], a standardized instrument was developed based on the 23 candidate criteria [11].

2.3. Evaluation of sensibility

Sensibility is critical to the usefulness of an instrument [17]. Attributes of sensibility include comprehensibility, clarity of instructions and response options, face validity, content validity, and feasibility [15]. The SSc-specific instrument was independently evaluated by four investigators for clarity, format, visual design, organization, and navigation through the form [15]. Members of the SSc classification criteria steering committee (n = 6) pilot tested six cases and the ranking procedure. This committee was composed of North American and European experts, each with experience in SSc care and research. The SSc-specific instrument and ranking procedure were evaluated for clarity of the instructions, comprehensibility of the response option, and time to completion. The cases and instructions were revised based on pilot testing results.

The SSc-specific instrument was applied to 34 cases from 20 Scleroderma Centers in North America and Europe. The cases were purposively sampled to reflect the spectrum of probability that each case could be classified as having SSc (high, intermediate, or low probability) by experts. The case mix was sampled to have a balance of patients with early disease (<2 years since the onset of the first non-Raynaud symptom) vs. late disease and those with the limited and diffuse subtypes. Cases had a spectrum of combinations of present and absent items from among the 23 criteria. Each case was assigned a unique architecturally based identifier. This method has been demonstrated to reduce potential bias from alphanumeric labeling and has been used in other criteria development studies [14]. Twenty of 34 cases were purposively sampled for the ranking procedure, and the remainder held in reserve (Table 1).

2.4. Forced-choice study

Four European and four North American SSc experts were invited to participate in the forced-choice phase [18]. Experts were selected based on geographic representation, clinical and research expertise, and availability. At least half of them had not participated significantly in previous phases of this criteria development [11,12].

2.5. First ranking exercise

The 20 cases, answer form, standardized definitions, and standardized instructions were sent electronically to the expert panel (Appendices A–C at www.jclinepi.com). To minimize the presence of ordering bias, the cases were sorted randomly. A randomly selected half of the experts (group A) were given one set of cases. The cases were again randomly sorted and given to the remaining experts (group B). Using a standardized response form, experts were asked to rank order the cases from 1 (highest probability of having SSc) to 20 (lowest probability of having SSc). Data from earlier phases of criteria development were not given to the experts to reduce the effect of anchoring bias [19]. A logic check was incorporated into the response form to reduce data errors. Experts were blinded to other experts’ rankings.
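The random case ordering and the response-form logic check described above can be pictured with a short sketch. The article does not describe how either was implemented, so the function names, the seeds, and the specific checks (every case ranked once, ranks 1 to n each used once) are assumptions made for illustration.

```python
import random


def shuffled_case_order(case_ids, seed):
    """Return a randomly ordered copy of the case identifiers.

    The seed is an assumption, used only so the sketch is reproducible.
    """
    rng = random.Random(seed)
    order = list(case_ids)
    rng.shuffle(order)
    return order


def ranking_errors(ranking, case_ids):
    """Hypothetical logic check for one expert's response form.

    `ranking` maps case identifier -> rank. Returns a list of problems;
    an empty list means the form passes the check.
    """
    errors = []
    n = len(case_ids)
    missing = set(case_ids) - set(ranking)
    unknown = set(ranking) - set(case_ids)
    if missing:
        errors.append(f"unranked cases: {sorted(missing)}")
    if unknown:
        errors.append(f"unknown cases: {sorted(unknown)}")
    if sorted(ranking.values()) != list(range(1, n + 1)):
        errors.append(f"ranks must be 1..{n}, each used exactly once")
    return errors


# Three of the case identifiers from Table 1, used here as a small example.
cases = ["Dome", "Colonnade", "Exedra"]
group_a_order = shuffled_case_order(cases, seed=1)
group_b_order = shuffled_case_order(cases, seed=2)
```

A form that assigns the same rank to two cases, or leaves a case unranked, would return a non-empty error list and could be sent back to the expert before analysis.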


2.6. Item reduction and weighting

To enumerate experts’ preferences for criteria contributing to the probability of a case being classified as SSc, a multicriteria decision analysis (forced-choice methods, conjoint analysis) was used. These methods are suited to the valuation of criteria that differ, but each may contribute to the overall construct of the classification of disease [20]. The experts met for a consensus meeting and were presented their rankings. The experts discussed the need for (1) entry criterion, (2) exclusion criterion, and (3) sufficient (or absolute) criterion for inclusion (Fig. 1). The criteria were discussed individually, to determine whether any could be grouped (if they were not independent, e.g., the SSc-related antibodies) or were essentially aspects of the same criterion (e.g., those related to lung involvement). The definitions of the criteria were examined to ensure that they could be interpreted and applied consistently (Appendix D at www.jclinepi.com).

The experts participated in a multicriteria decision analysis to determine the relative weighting of each of the criteria using 1000Minds software [12]. The panel was presented paired scenarios differing in two criteria (Fig. 2). For each pair, they anonymously (using touch keypads) selected the scenario they believed had a higher probability of being classified as having SSc. The distribution of opinions on each dilemma was presented to the group. When there was no agreement, the reasons for disagreement were discussed. Sometimes, the group revoted. Consensus was considered achieved when all experts indicated agreement or could accept the majority decision. Through iterative discrete pairwise choices, the decision analytical software was able to assign relative weights to the criteria [12]. During this consensus and weighting process, criteria were eliminated and revised. The weighting exercise was repeated when all the criteria had been agreed on. The resulting weights were presented to the expert panel. Our methods conform with good research practices for conjoint analysis [20].
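To make the link between pairwise choices and weights concrete, the sketch below treats each forced choice as an inequality between the additive scores of two scenarios and checks whether a set of weights respects every recorded choice. It is not the algorithm implemented in 1000Minds. The four weights are taken from Table 2, while the function names and the recorded choices are hypothetical and invented for illustration.

```python
# Weights (points) for four criteria as reported in Table 2.
WEIGHTS = {
    "raynaud_phenomenon": 13,
    "calcinosis": 12,
    "telangiectasia": 10,
    "puffy_fingers": 5,
}


def scenario_score(present, weights):
    """Additive score: sum of the weights of the criteria that are present."""
    return sum(weights[name] for name in present)


def choices_consistent(choices, weights):
    """Check that every (preferred, other) choice agrees with the scores.

    Each element of `choices` is a pair of sets of present criteria, with
    the scenario the panel preferred listed first.
    """
    return all(
        scenario_score(preferred, weights) > scenario_score(other, weights)
        for preferred, other in choices
    )


# Hypothetical panel answers, invented for illustration only.
HYPOTHETICAL_CHOICES = [
    ({"raynaud_phenomenon"}, {"puffy_fingers"}),
    ({"calcinosis"}, {"telangiectasia"}),
    ({"telangiectasia", "puffy_fingers"}, {"raynaud_phenomenon"}),
]

print(choices_consistent(HYPOTHETICAL_CHOICES, WEIGHTS))
```

Under this reading, a weighting exercise amounts to finding point values for which every consensus choice holds as an inequality; the check above only verifies a candidate set of weights and does not derive one.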

2.7. Second ranking exercise

The SSc-specific instrument and the initial 20 cases were revised, eliminating the criteria that had been discarded and refining terms and definitions. The revised cases and a standardized answer form were sent to the experts and steering committee (n = 13). Participants were asked to order the cases from 1 (highest probability of having SSc) to 20 (lowest probability of having SSc).

2.8. Analysis

Summary statistics were used to describe the data. Randomization was achieved using a computerized random number generator. The distribution of rankings for each case and each expert panel member was plotted. The consistency in rankings was evaluated using an intraclass correlation coefficient (ICC). Landis and Koch have characterized values of reliability coefficients as follows: slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect (0.81–1.00) [21]. The ICC [1,2] of Shrout and Fleiss was used to assess the reliability of k = 8 ratings, assuming that all 20 cases were rated by the same raters who are a subset of all possible raters [23]. With eight observations for each case and a hypothesized value of 0.4 for reliability, we had 80% power to reject a minimally acceptable level of reliability of 0.2 with α = 0.05 [22]. Analyses were performed using R (version 2.2.1; The R Foundation for Statistical Computing, Vienna, Austria).
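As a rough illustration of the reliability calculation, the sketch below computes a two-way random-effects, single-rater ICC from a cases-by-raters table. It assumes the coefficient meant in the text is the Shrout and Fleiss form commonly written ICC(2,1), which matches the stated design (all cases rated by the same raters, who are a sample of possible raters); the text's own notation is ambiguous. The authors used R, so this Python function and its name are illustrative only, and it does not compute confidence intervals.

```python
def icc_2_1(ratings):
    """Two-way random-effects, single-rater ICC (assumed to be ICC(2,1)).

    `ratings` is a list of rows, one per case, each holding one value
    per rater (n cases x k raters, no missing values).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_cases = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_cases - ss_raters

    bms = ss_cases / (n - 1)               # between-cases mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical ranks for 4 cases by 3 raters, invented for illustration.
example_ranks = [
    [1, 1, 2],
    [2, 3, 1],
    [3, 2, 3],
    [4, 4, 4],
]
print(round(icc_2_1(example_ranks), 2))
```

When every rater supplies a complete ranking of the same cases, all raters have the same mean rank, so the between-raters mean square is zero and the coefficient is driven by the between-cases and residual terms.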

Institutional research ethics board approval was obtainedfor collection of patient data.

3. Results

3.1. SSc-specific instrument testing

Participants endorsed clarity and ease of navigation of the form [83% (5 of 6)], clarity of the instructions [100% (6 of 6)], and clarity of the response option [100% (6 of 6)]. None of the participants endorsed insufficient face or content validity. The median time to completion was 10 minutes (range, 10–20 minutes).

3.2. SSc experts

All experts invited to participate consented. Five were men. The median number of years treating SSc patients was 30 (range, 13–40). Fifty percent of experts practice in Europe and 50% in North America. Thirty-eight percent (3 of 8) had some involvement in earlier iterations of criteria development. All the steering committee members (n = 8) participated in the ranking, so they had a full understanding of the methods, but their rankings were not included in the first ranking analysis to reduce bias.

3.3. First ranking exercise

Table 1 summarizes the case characteristics. Experts’ initial ranking of cases is presented in Fig. 3. For the first ranking exercise, the ICC for agreement across experts was 0.73 [95% confidence interval (CI): 0.58, 0.86]. The ICC for expert group A was 0.68 (95% CI: 0.48, 0.84), and the ICC for expert group B was 0.76 (95% CI: 0.60, 0.88). The overlapping confidence intervals that include both point estimates suggest there was no significant difference in rankings between expert groups.

3.4. Item reduction and weighting

Through consensus discussion, the experts agreed on further precision in the use of the criteria. They decided that ‘‘skin thickening sparing the fingers’’ would be an exclusion criterion. If this exclusion was present, the use of the SSc classification criteria should not proceed. The expert panel agreed that ‘‘skin thickening of the fingers extending proximal to the metacarpophalangeal joints’’ was a sufficient criterion for classification as SSc. If present, the patient could be classified as SSc, and use of the classification criteria is not required. After the weighting exercise, FVC, DLCO, antinuclear antibody, dysphagia, persistent GERD, and digital pulp loss were discarded because of their relatively low weights, difficulties in assessing them reliably in a clinical context, or low value in discriminating patients with high from low probability of SSc. Skin thickening was redefined as skin thickening of the fingers, with response options (1) distal to metacarpophalangeal joints or (2) distal to proximal interphalangeal joints. Finger lesions were consolidated under ‘‘fingertip lesions,’’ with the response options categorized as (1) pitting scars, (2) digital tip ulcers, or (3) clinical evidence of acro-osteolysis. Serology was consolidated under ‘‘scleroderma-related antibodies’’ and categorized as positive or negative. This process reduced candidate criteria from 23 to 14. The relative weightings are presented in Table 2. The ranking of the reduced weighted criteria was compared with the empirical ranking based on frequency of criteria in cases and controls using SSc cohorts [12]. The ICC for agreement was 0.44 (P = 0.045).
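The exclusion criterion, the sufficient criterion, and the weights reported in Table 2 can be read together as an additive points system. The sketch below is one possible reading, not the authors' implementation. The field names are hypothetical, and two rules are assumptions: that only the highest-weighted fingertip lesion is counted when several are present, and that the pulmonary hypertension points apply only when interstitial lung disease or pulmonary fibrosis is absent. No classification threshold is applied, because none is defined at this stage.

```python
# Points from Table 2 for criteria without subcriteria.
SIMPLE_WEIGHTS = {
    "finger_flexion_contractures": 16,
    "telangiectasia": 10,
    "abnormal_nailfold_capillaries": 10,
    "puffy_fingers": 5,
    "calcinosis": 12,
    "raynaud_phenomenon": 13,
    "friction_rubs": 21,
    "ild_or_pf": 14,
    "renal_crisis": 11,
    "esophageal_dilatation": 7,
    "ssc_related_antibodies": 15,
}
FINGER_SKIN = {"distal_to_pip": 14, "whole_finger_distal_to_mcp": 22}
FINGERTIP = {"digital_tip_ulcers": 9, "pitting_scars": 16, "acro_osteolysis": 21}
PH_WITHOUT_ILD = 11


def relative_score(case):
    """Return (status, points) for a case described by a dict of findings."""
    if case.get("skin_thickening_sparing_fingers"):
        return ("excluded", None)      # criteria should not be applied
    if case.get("skin_thickening_proximal_to_mcp"):
        return ("sufficient", None)    # classified without scoring
    points = sum(w for name, w in SIMPLE_WEIGHTS.items() if case.get(name))
    # Count only one finger skin subcriterion.
    points += FINGER_SKIN.get(case.get("finger_skin"), 0)
    # Assumption: count the highest-weighted fingertip lesion present.
    lesions = case.get("fingertip_lesions", [])
    points += max((FINGERTIP[name] for name in lesions), default=0)
    # Assumption: these points apply only when ILD/PF is absent.
    if case.get("pulmonary_hypertension") and not case.get("ild_or_pf"):
        points += PH_WITHOUT_ILD
    return ("scored", points)


# Hypothetical case, invented for illustration.
example_case = {
    "raynaud_phenomenon": True,
    "puffy_fingers": True,
    "ssc_related_antibodies": True,
    "finger_skin": "distal_to_pip",
    "fingertip_lesions": ["digital_tip_ulcers", "pitting_scars"],
}
print(relative_score(example_case))
```

Returning a status alongside the points keeps the exclusion and sufficient criteria separate from the weighted sum, mirroring the order in which the panel said the criteria should be applied.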

Fig. 1. Conceptual framework for systemic sclerosis classification criteria development.

Fig. 2. Example of how questions were posed to participants in the discrete choice experiment. GERD, gastroesophageal reflux disease; SSc, systemic sclerosis.

Table 1. Summary of case characteristics

| Alternative | Skin thickening | Abnormal nailfold capillaries | Calcinosis | Digital pulp loss or acro-osteolysis | Dysphagia for solids | Esophageal dilation | Finger flexion contracture | Fingertip ulcers or pitting scars | Puffy fingers | ILD or pulmonary fibrosis | PAH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dome | – | Yes | – | – | – | – | – | – | – | – | – |
| Colonnade | – | Yes | – | – | – | – | – | – | – | Yes | – |
| Exedra | – | Yes | Yes | – | Yes | – | – | – | – | Yes | Yes |
| Cupola | – | – | Yes | – | – | – | – | – | Yes | – | – |
| Sima | – | – | – | – | – | – | – | – | Yes | – | Yes |
| Frieze | – | – | – | – | – | – | – | Yes | Yes | Yes | Yes |
| Nave | – | – | – | – | – | – | – | – | Yes | – | – |
| Portico | – | Yes | – | – | – | – | – | – | – | – | – |
| Naos | – | Yes | – | – | Yes | Yes | – | – | – | – | – |
| Finial | – | Yes | – | – | – | Yes | – | – | – | – | – |
| Keystone | – | – | – | – | – | Yes | – | – | – | – | – |
| Cornice | – | – | – | – | – | – | – | – | – | – | Yes |
| Pediment | Present but not fingers | – | – | – | – | – | – | – | – | – | Yes |
| Gutta | Present but not fingers | Yes | – | – | – | Yes | – | – | – | Yes | – |
| Column | – | Yes | – | – | – | – | – | – | – | – | – |
| Diocletian | Present with fingers | – | – | – | – | Yes | – | Yes | Yes | Yes | – |
| Metope | Present with fingers | – | – | – | – | Yes | – | Yes | – | Yes | – |
| Peristyle | Present with fingers | – | – | – | – | – | – | – | Yes | – | – |
| Pilaster | Present with fingers | Yes | Yes | Yes | – | – | – | Yes | Yes | – | – |
| Triglyph | – | – | – | – | – | – | – | – | – | – | Yes |

Abbreviations: ILD, interstitial lung disease; PAH, pulmonary arterial hypertension; GERD, gastroesophageal reflux disease; Ab, antibody; DLCO, diffusion capacity for carbon monoxide; FVC, forced vital capacity.


3.5. Second ranking exercise

Experts’ second ranking of cases is presented in Fig. 4. For the second ranking exercise done after the face-to-face meeting, the ICC for agreement across experts was 0.80 (95% CI: 0.68, 0.90).
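To place the reported coefficients on the Landis and Koch scale quoted in the Analysis section, a small lookup such as the one below can be used. The function name is hypothetical, and treating each band's upper limit as an inclusive cut point is an assumption, since the published bands are stated to two decimal places.

```python
# Upper limits of the Landis and Koch bands quoted in the Analysis section.
LANDIS_KOCH_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch_label(value):
    """Map a reliability coefficient in [0, 1] to its descriptive band."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("coefficient must lie between 0 and 1")
    for upper, label in LANDIS_KOCH_BANDS:
        if value <= upper:
            return label


# ICCs reported for the first and second ranking exercises.
for icc in (0.73, 0.80):
    print(icc, landis_koch_label(icc))
```

On this reading both the first (0.73) and second (0.80) ranking exercises fall in the substantial band, with the second sitting at its upper limit.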

4. Discussion

Methodological rigor is necessary for quality data and valid inferences [24], and this is particularly true as we enter an era of disease classification criteria redevelopment [1]. In this article, we outline the methods used to develop an SSc-specific instrument and strategies used to reduce the effect of potential biases. Often, methodological detail is excluded, thereby limiting critical evaluation of the methods used. Our explicit methodological detail in criteria development may inform other groups, particularly if they plan to use forced-choice methods endorsed by the American College of Rheumatology (ACR) for the development of rheumatoid arthritis classification criteria [14]. The methodologies outlined conform to modern standards of measurement science and the recommendations of ACR and European League Against Rheumatism [2,25].

Our methods incorporated strategies to reduce potential biases resulting from lack of clarity in the instructions, misunderstanding the response option, lack of clarity of the instrument, alphanumeric labeling of cases, data entry errors, and ordering bias [15,16,26]. All these potential sources of bias have been shown to reduce the validity and/or reliability of the data [19]. Our forms have demonstrable clarity and feasibility. In addition, blinding was used, so no respondent was ‘‘singled out’’ as an outlier. Although the SSc classification criteria steering committee participated in the process, their rankings were not included in this analysis. This was done to reduce potential bias as the committee members have been involved in most of the process to date.

Fig. 3. Experts’ rankings of the relative probability that each of the 20 cases has systemic sclerosis in first ranking exercise. Cases are presented in the order of the median rank (x-axis). The ranks (y-axis) for each case are those assigned by each expert (A–G). Some points shown represent multiple coinciding data points.

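Table 1. Summary of case characteristics (continued)

| Alternative | GERD | Raynaud phenomenon | Renal crisis | Telangiectasia | Tendon or bursal friction rubs | Antinuclear Ab | Anticentromere Ab | Anti-topoisomerase-I Ab | Anti-PM-Scl Ab | Anti-RNA polymerase III Ab | DLCO <80% predicted | FVC <80% predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dome | Yes | Yes | Yes | – | – | Pos | Pos | Neg | Neg | Neg | – | – |
| Colonnade | Yes | Yes | – | – | – | Pos | Neg | Pos | Neg | Neg | Yes | Yes |
| Exedra | Yes | Yes | – | No | – | Pos | Neg | Neg | Neg | Neg | Yes | Yes |
| Cupola | Yes | – | – | – | – | Pos | Pos | Neg | Neg | Neg | – | – |
| Sima | – | – | – | – | – | Pos | Pos | Neg | Neg | Neg | – | – |
| Frieze | – | – | – | – | – | Pos | Neg | Neg | Neg | Neg | Yes | Yes |
| Nave | – | – | – | – | – | Pos | Neg | Pos | Neg | Neg | – | – |
| Portico | Yes | Yes | – | – | – | Pos | Neg | Pos | Neg | Neg | – | – |
| Naos | Yes | Yes | – | – | – | Pos | Neg | Neg | Neg | Neg | Yes | Yes |
| Finial | Yes | Yes | – | – | – | Pos | Neg | Neg | Neg | Neg | – | – |
| Keystone | Yes | – | – | – | – | Pos | Neg | Pos | Neg | Neg | – | – |
| Cornice | – | Yes | – | – | – | Pos | Neg | Neg | Neg | Neg | Yes | – |
| Pediment | Yes | – | – | – | – | Pos | Neg | Neg | Neg | Neg | – | – |
| Gutta | Yes | Yes | – | – | – | Pos | Neg | Neg | Pos | Neg | – | Yes |
| Column | Yes | – | – | – | – | Pos | Neg | Neg | Neg | Neg | – | – |
| Diocletian | Yes | Yes | – | – | – | Pos | Neg | Pos | Neg | Neg | Yes | Yes |
| Metope | Yes | Yes | – | Yes | – | Pos | Neg | Neg | Neg | Neg | Yes | – |
| Peristyle | Yes | Yes | – | Yes | – | Pos | Neg | Neg | Neg | Pos | – | – |
| Pilaster | Yes | Yes | – | Yes | – | Pos | Pos | Neg | Neg | Neg | – | – |
| Triglyph | – | Yes | – | – | – | Pos | Neg | Neg | Neg | Neg | – | – |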


Circularity of reasoning is a bias that can occur when the same experts who contribute patients are the ones who contribute to criteria development [1]. This was a critical bias in previous iterations of classification criteria development [1,3]. We have ensured that some experts have not been involved in previous phases of the current criteria development to avoid the bias introduced by circular reasoning. Together, our evaluation of instrument design, implementation methods, and pilot testing are quality indicators of good research practice [20].

We have detailed the methodological steps in the item reduction phase, leading up to and including the forced-choice experiment. The ranking of cases representing the spectrum of probability of having SSc, the group discussion among experts regarding possible reasons for lack of agreement in case rankings, and the discarding of criteria that lack reliability or discriminant validity were necessary prerequisites to the forced-choice experiment. Our use of psychometric properties of the candidate criteria as a means of item reduction led to a more manageable number of criteria and informed discussion during the forced-choice experiment.

A strength of this study is the use of forced-choicemethods. Forced-choice methodology allowed us to iden-tify relatively weak criteria and determine a hierarchyamong a reduced number of criteria that are influential inSSc classification. This is a more robust, scientific approachthan retaining numerous criteria that are all equallyweighted (and not necessarily reflective of the constructof the disease). The use of forced-choice methods to reduceitems and ascertain relative weights, instead of additionalDelphi exercises with/without nominal group technique,conferred a number of advantages. 1000Minds softwarefacilitated a systematic process for a points-based prioriti-zation system based on clinical cases. Using experts’knowledge and preferences, this method seeks to rank allhypothetically possible patients representable by a givenpoints system. From the overall ranking, it will derive thepoint values for the point stem, thereby matching the expertpanels’ knowledge and preferences. The overall ranking ofall hypothetically possible patients is arrived at by askingthe expert panel to make trade-offs between two criteriaat a time. The pairwise ranking of just two alternatives iscognitively less burdensome to derive the weights [14]. Us-ing the property of transitivity (if a O b and b O c, thena O c), tens of thousands of cases can be automaticallyand incontrovertibly ranked. This method is more efficientthan others because any pairwise decisions in which oneoption clearly has a high probability or in which consensus

Table 2. Relative weighting of reduced classification criteria

Criteria Subcriteria Weight

Skin thickening of the fingers (count only one of these two) Distal to PIP only 14Whole finger, distal to MCP 22

Fingertip lesions (count only one of these three) Digital tip ulcers 9Pitting scars 16Clinical evidence of acro-osteolysis 21

Finger flexion contractures 16Telangiectasia 10Abnormal nailfold capillaries 10Puffy fingers 5Calcinosis 12Raynaud phenomenon 13Tendon or bursal friction rubs 21Interstitial lung disease (ILD) or pulmonary fibrosis (PF) 14Pulmonary hypertension (without ILD/PF) 11Renal crisis 11Esophageal dilatation 7Scleroderma-related antibodies (any of anticentromere,

antietopoisomerase I [anti-Scl 70], anti-RNA polymerase III)15

Total score

Abbreviations: PIP, proximal interphalangeal; MCP, metacarpophalangeal.‘‘Skin thickening sparing the fingers’’ is an exclusion criterion. If present, the use of the SSc classification criteria should not proceed further.‘‘Skin thickening of the fingers and proximal to the metacarpophalangeal joints’’ is an absolute criterion. If present, the patient could be clas-

sified as SSc, and further use of the classification criteria is not required.

Fig. 4. Experts’ rankings of the relative probability that each of the 20 cases has systemic sclerosis in the second ranking exercise. Cases are presented in the order of the median rank (x-axis). The ranks (y-axis) for each case are those assigned by each expert (A–K). Some points shown represent multiple coinciding data points.



has been achieved are not presented. This method can be administered over the Internet, and individual criteria can be modified, such as when new information becomes available, and the weightings recalculated without disturbing the validity of the method or the previous consensus decisions made [14]. This methodology has been successfully used in other areas [27,28].
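The role of transitivity described above can be sketched in a few lines of Python. This is an illustrative toy, not the algorithm used by 1000Minds: the function name and the case labels A to D are hypothetical, and the sketch only shows how a few explicit pairwise decisions imply further rankings that never need to be put to the panel.

```python
from itertools import product


def transitive_closure(pairs):
    """Return every (higher, lower) pair implied by the explicit pairs.

    Each pair (a, b) means "a is ranked above b". A ValueError is raised
    if the decisions are inconsistent (they imply that a case outranks itself).
    """
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                if a == d:
                    raise ValueError(f"inconsistent rankings involving {a!r}")
                closure.add((a, d))
                changed = True
    return closure


# Three explicit decisions by a hypothetical panel: A > B, B > C, C > D.
explicit = {("A", "B"), ("B", "C"), ("C", "D")}
implied = transitive_closure(explicit) - explicit
print(sorted(implied))
# [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

Here three explicit decisions settle all six possible comparisons among four cases, which is the sense in which pairs already determined need not be presented.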

A potential limitation of this study is the relatively small number of cases used in the ranking procedure and the small number of experts involved. The inclusion of additional cases would result in a greater representation of the diversity of clinical characteristics. In contrast, strengths of our approach are that all the criteria were included in each case as present or absent and that the combination of positive and negative items differed in every case. Furthermore, previous work has shown that participants have difficulty performing relative rankings of more than about 20 cases. The inclusion of additional cases would therefore reduce the validity and reliability of the rankings, thereby affecting the internal validity of the study findings.

In conclusion, our SSc-specific forms have demonstrable sensibility. Forced-choice methods were successfully used to reduce criteria and indicate relative weights. The work described in this article defines a system of criteria development, which produces a measure of the relative probability that a particular case (combination of clinical features) has SSc. The next phase of SSc classification criteria development will determine the need for further item reduction, possible reweighting, and scaling [29,30]. The appropriate threshold to classify a patient as having SSc will need to be identified. The impact of criteria elimination, criteria clustering, and threshold on sensitivity and specificity of the criteria will need to be tested using data-driven methods. Our methods reflect the rigors of modern measurement science and may serve as a template for other groups developing classification criteria.
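To make the dependence of sensitivity and specificity on the threshold concrete, here is a minimal Python sketch. It is not part of the study: the scores and reference-standard labels are invented purely for illustration, the function name is hypothetical, and the rule "classify as SSc when the score is at or above the threshold" is an assumption.

```python
def sensitivity_specificity(scores, has_ssc, threshold):
    """Return (sensitivity, specificity) for a given score threshold.

    A case is classified as SSc when its score is >= threshold (assumption).
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_ssc):
        predicted = score >= threshold
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one SSc and one non-SSc case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: five cases with total scores and reference-standard labels.
scores = [60, 45, 30, 20, 10]
has_ssc = [True, True, False, True, False]

for threshold in (15, 25, 50):
    sens, spec = sensitivity_specificity(scores, has_ssc, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the threshold in this toy example trades sensitivity for specificity, which is why the threshold has to be chosen against real patient data.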

Appendix

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2013.12.009.

References

[1] Felson DT, Anderson JJ. Methodological and statistical approaches to criteria development in rheumatic diseases. Baillieres Clin Rheumatol 1995;9(2):253–66.

[2] Singh JA, Solomon DH, Dougados M, Felson D, Hawker G, Katz P, et al. Development of classification and response criteria for rheumatic diseases. Arthritis Rheum 2006;55:348–52.

[3] Johnson SR, Goek ON, Singh-Grewal D, Vlad SC, Feldman BM, Felson DT, et al. Classification criteria in rheumatic diseases: a review of methodologic properties. Arthritis Rheum 2007;57:1119–33.

[4] Hudson M, Taillefer S, Steele R, Dunne J, Johnson SR, Jones N, et al. Improving the sensitivity of the American College of Rheumatology classification criteria for systemic sclerosis. Clin Exp Rheumatol 2007;25:754–7.

[5] Preliminary criteria for the classification of systemic sclerosis (scleroderma). Bull Rheum Dis 1981;31(1):1–6.

[6] Masi AT, Medsger TA Jr, Rodnan GP, Fries JF, Altman RD, Brown BW, et al. Methods and preliminary results of the scleroderma criteria cooperative study of the American Rheumatology Association. Clin Rheum Dis 1979;5(1):27–48.

[7] Preliminary criteria for the classification of systemic sclerosis (scleroderma). Subcommittee for scleroderma criteria of the American Rheumatism Association Diagnostic and Therapeutic Criteria Committee. Arthritis Rheum 1980;23:581–90.

[8] Wollheim FA. Classification of systemic sclerosis. Visions and reality. Rheumatology (Oxford) 2005;44(10):1212–6.

[9] Lonzetti LS, Joyal F, Raynauld JP, Roussin A, Goulet JR, Rich E, et al. Updating the American College of Rheumatology preliminary classification criteria for systemic sclerosis: addition of severe nailfold capillaroscopy abnormalities markedly increases the sensitivity for limited scleroderma. Arthritis Rheum 2001;44:735–6.

[10] Hachulla E, Launay D. Diagnosis and classification of systemic sclerosis. Clin Rev Allergy Immunol 2011;40(2):78–83.

[11] Fransen J, Johnson SR, van den Hoogen F, Baron M, Allanore Y, Carreira PE, et al. Items for developing revised classification criteria in systemic sclerosis: results of a consensus exercise. Arthritis Care Res (Hoboken) 2012;64(3):351–7.

[12] Johnson SR, Fransen J, Khanna D, Baron M, van den Hoogen F, Medsger TA Jr, et al. Validation of potential classification criteria for systemic sclerosis. Arthritis Care Res (Hoboken) 2012;64(3):358–67.

[13] Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO 3rd, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010;62:2569–81.

[14] Neogi T, Aletaha D, Silman AJ, Naden RL, Felson DT, Aggarwal R, et al. The 2010 American College of Rheumatology/European League Against Rheumatism classification criteria for rheumatoid arthritis: phase 2 methodological report. Arthritis Rheum 2010;62:2582–91.

[15] Feinstein A. The theory and evaluation of sensibility. In: Feinstein A, editor. Clinimetrics. New Haven, CT: Yale University Press; 1987:141–65.

[16] Dillman DA, Smyth JD, Christian LM. Internet, mail, and mixed-mode surveys. The tailored design method. Hoboken, NJ: John Wiley & Sons, Inc; 2009.

[17] Wright JG, McLeod RS, Lossing A, Walters BC, Hu X. Measurement in surgical clinical research. Surgery 1996;119:241–4.

[18] Hansen P, Ombler F. A new method for scoring multi-attribute value models using pairwise rankings of alternatives. J Multi-Crit Decis Anal 2009;15:87–107.

[19] Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Feldman BM. Methods to elicit beliefs for Bayesian priors: a systematic review. J Clin Epidemiol 2010;63:355–69.

[20] Bridges JFP, Hauber B, Marshall D, Lloyd A, Prosser LA, Regier DA, et al. A checklist for conjoint analysis applications in health: report of the ISPOR conjoint analysis good research practices task force. Value Health 2011;14:403–13.

[21] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.

[22] Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med 1987;6:441–8.

[23] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.

[24] Feinstein AR. Clinical epidemiology. The architecture of clinical research. Philadelphia, PA: W. B. Saunders Company; 1985.


[25] Dougados M, Gossec L. Classification criteria for rheumatic diseases: why and how? Arthritis Rheum 2007;57:1112–5.

[26] Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. 4th ed. Oxford, UK: Oxford University Press; 2008.

[27] Stewart R, Barber A, Hamer A, Mahon B, Ruygrok P, Kang N, et al. Comparison of a clinical score with individual clinician judgement for assigning priority for heart valve surgery. Eur Heart J 2010;31:71.

[28] Fitzgerald A, De Coster C, Naden R, Noseworthy T, Western Canada Wait List Project Rheumatology Clinical Panel. Priority-setting for referrals from primary care physicians to rheumatologists. Arthritis Rheum 2008;58(Suppl):S884.

[29] van den Hoogen F, Khanna D, Fransen J, Johnson SR, Baron M, Tyndall A, et al. 2013 Classification criteria for systemic sclerosis: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2013;65:2737–47.

[30] van den Hoogen F, Khanna D, Fransen J, Johnson SR, Baron M, Tyndall A, et al. 2013 Classification criteria for systemic sclerosis: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Ann Rheum Dis 2013;72(11):1747–55.
