Quarterly Journal of Medicine, 1993; 86:447-458

The BILAG index: a reliable and valid instrument for measuring clinical disease activity in systemic lupus erythematosus

E.M. HAY, P.A. BACON, C. GORDON, D.A. ISENBERG, P. MADDISON, M.L. SNAITH, D.P.M. SYMMONS, N. VINER and A. ZOMA

From the ARC Epidemiology Research Unit, Stopford Building (University of Manchester), Oxford Road, Manchester M13 9PT

Received 15 February 1993; Accepted 13 March 1993

Summary

The British Isles Lupus Assessment Group (BILAG) index is a computerized index for measuring clinical disease activity in systemic lupus erythematosus (SLE), which was developed according to the principle of the physician's 'intention to treat'. The index allocates separate alphabetic scores to each of eight organ-based systems; a total score is not calculated. This study demonstrated good between-rater reliability for the BILAG index for each organ-based system. There was no evidence of bias between observers. The BILAG index had good overall sensitivity (87%) and specificity (99%) when compared with the 'gold standard' criterion (starting or increasing disease-modifying therapy). There were high positive predictive values overall (80%), and for each organ-based system, with the exception of the neurological system.

Introduction

SLE is a complex multi-system disease; this complexity makes the disease difficult to monitor. In particular, there are problems in quantifying disease activity across systems, and in differentiating potentially reversible organ dysfunction (due to active disease) from irreversible organ damage. Defining the term 'activity' is also a problem. Poor correlations were found between physicians' scores when they assessed patients using a semi-quantitative clinical rating scale.¹ In order to enable physicians to communicate using a common language, standardized quantitative measures of SLE clinical disease activity are required. Such scales would be of value in clinical trials, in longitudinal studies of outcome, or in the evaluation of immunological markers.

More than 60 scales for measuring clinical disease activity in SLE have been developed and used in a variety of studies.²,³ Only recently, however, have scales been tested for reliability and validity.¹⁻⁹ A scale should give the same result when two or more physicians assess the same patient (between-rater reliability), and when one physician assesses a patient with a steady clinical state at different points in time (within-rater reliability). Variability may result from either systematic (bias) or random measurement errors. Within-rater reliability is difficult to test because raters may remember the previous score if the time between assessments is short or, if the assessments are a long time apart, the patient's clinical condition may have changed. However, because within-rater reliability contributes to between-rater reliability, if between-rater reliability is good then within-rater reliability can also be assumed to be good.¹⁰

The validity of clinical activity scales can be assessed in several ways.¹¹ First, a scale should appear to measure disease activity, and the individual variables should be measured in a generally accepted way (face validity). Second, a scale should include an adequate number of variables, chosen in an appropriate manner (content validity). Third, a scale should agree with an external criterion, the 'gold standard', considered to be a superior measure of disease activity (criterion validity). Fourth, a scale should correlate positively with other similar scales, or with laboratory markers of disease activity, and should differentiate between patient groups (e.g. in-patients versus out-patients) who would be expected to have differing levels of disease activity (construct validity).

Oxford University Press 1993

Downloaded by guest on November 25, 2014

448 E.M. Hay et al.

The BILAG index

In 1984, rheumatologists from five centres in the United Kingdom developed a new index for measuring disease activity in SLE.⁶ The British Isles Lupus Assessment Group (BILAG) index was developed according to the principle of the physician's 'intention to treat' on the premise that, whilst physicians might not agree about the significance of individual clinical features or laboratory tests, there was broad agreement about when to treat lupus patients with disease-modifying therapy such as high-dose corticosteroids or immunosuppressants. The index allocates separate alphabetic scores to each of eight organ-based systems (general, mucocutaneous, neurological, musculoskeletal, cardiorespiratory, vasculitis and thrombosis, renal, haematological). A total score is not calculated. The BILAG index was designed specifically for use as a transitional index, and scores both newly presenting features and changes in previously reported features. A computerized version was developed to combine the advantages of a comprehensive and flexible database with easy entry, recall and analysis of data.

Early development of the BILAG index

The BILAG index (version 1) was tested for between-rater agreement, face validity and construct validity by Symmons et al. Following this study, the BILAG index was modified (version 2) because of poor face validity; for example, it was too sensitive to minor fluctuations in blood pressure.

Liang et al.⁵ tested the reliability and construct validity of six systems. The Systemic Lupus Activity Measure (SLAM),³ the SLE Disease Activity Index (SLEDAI)⁴ and the BILAG index (version 2) had high between-rater and within-rater reliability. There were high correlations between the instruments when used cross-sectionally, but less similarity between instruments when used to measure change in disease activity between visits. BILAG was the most sensitive instrument for measuring change in activity following the commencement of specific therapy, followed by SLAM; SLEDAI was the least sensitive. The BILAG system was the most difficult to complete, partly because of a lack of definitions and rules, and partly because (in this study) the score was calculated manually. High correlations between BILAG, SLAM and SLEDAI were also demonstrated in a cross-sectional study by Gladman et al.¹

The results from a multicentre study in the UK, however, demonstrated both systematic and random between-rater variability in scoring musculoskeletal and vasculitic involvement using BILAG (version 2).¹² Additional problems with the index, for example, large numbers of patients scoring 'B' in the neurological system because of migraine headaches, and the inability of the index to differentiate between patients with mild stable involvement and those with past involvement of a system, were also reported in this study. The index was modified, therefore, with the aim of improving its reliability and validity.

Modification of the BILAG index

The BILAG index (version 3) was produced using a nominal consensus technique (Appendix 1). Modifications included the clarification of questions to avoid ambiguous terms, the provision of a glossary (Appendix 2), definition of a time scale, the standardization of the method used to calculate BILAG scores, and the addition of a new category (E). The BILAG computer programme was updated and refined. Details of the new scoring system are shown in Table 1.

Studies of between-rater reliability and validity of BILAG (version 3) are reported below.

Table 1 Scoring system for the BILAG index (version 3)

Category A  Denotes disease thought to be sufficiently active to require disease-modifying treatment (prednisolone > 20 mg daily or immunosuppressants)
Category B  Denotes disease which is less active than in 'A'; mild reversible problems requiring only symptomatic therapy such as antimalarials, non-steroidal anti-inflammatory drugs or prednisolone < 20 mg/day
Category C  Indicates stable mild disease
Category D  System previously affected but currently inactive
Category E  Indicates system never involved


The BILAG index 449

The objectives of this study were:
1. To determine the between-rater reliability of the BILAG index.
2. To determine the criterion validity of the BILAG index by testing for agreement between BILAG score 'A' and the 'gold standard' criterion, defined as the commencement or increase of disease-modifying therapy (DMT).
3. To determine the construct validity of the BILAG index by testing whether the BILAG index identified as 'active' patients who, based on other evidence, would be expected to have active SLE.

Patients and methods

All patients fulfilled the 1982 revised ARA criteria for the classification of SLE,¹⁴ and gave verbal consent to participate in the study.

Between-rater reliability study

This was a multicentre study conducted in out-patient clinics. All 82 patients with SLE who attended each of the participating centres on the study days were included. Two rheumatologists, experienced in using the BILAG index, assessed each patient and completed a BILAG questionnaire (Appendix 1). In each centre one rater knew the case history (and had access to the case notes), whilst the other did not. Whenever possible, the order of assessment was alternated. The renal and haematological systems were not scored because they use only laboratory results and are not subject to between-rater measurement error.

Validity study

This was a multicentre prospective study which included 353 patients with SLE attending four specialist lupus clinics. A complete history and physical examination were performed on all patients to document past organ involvement and demographic variables. Data were checked, and supplemented, by case-note review. Patients were reviewed at intervals (dictated by their clinical state) over a 12-month period: at least two BILAG assessments were performed on each patient during the study. Patients admitted to hospital during the study were assessed at the beginning of their stay. To avoid potential bias resulting from a small number of patients contributing a large number of assessments, no assessments performed less than 1 month apart were included. Treatment decisions were recorded at each assessment and blood was taken for estimation of haemoglobin, differential white cell count and platelets, erythrocyte sedimentation rate (ESR) (Westergren), serum creatinine and IgG antibodies to dsDNA (measured by ELISA).¹⁵ Urine was examined for protein and active sediment. Other tests were performed when indicated clinically. Computer data for each centre were pooled for analysis.
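The paper does not say exactly how the "no assessments less than 1 month apart" rule was applied. The sketch below assumes one plausible reading: for each patient, an assessment is kept only if it falls at least 30 days after the previously kept assessment. The 30-day threshold, the greedy rule and the function name `filter_min_interval` are illustrative assumptions, not the study's documented procedure.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: "1 month" is approximated as 30 days.
MIN_GAP = timedelta(days=30)


def filter_min_interval(assessments, min_gap=MIN_GAP):
    """Keep, per patient, only assessments at least `min_gap` after the
    previously kept one (a hypothetical greedy reading of the rule).

    `assessments` is an iterable of (patient_id, assessment_date) pairs.
    Returns the kept pairs, ordered by patient id and then by date.
    """
    by_patient = defaultdict(list)
    for patient_id, when in assessments:
        by_patient[patient_id].append(when)

    kept = []
    for patient_id in sorted(by_patient):
        last_kept = None
        for when in sorted(by_patient[patient_id]):
            # The earliest assessment is always kept; later ones only if
            # they are far enough from the last assessment that was kept.
            if last_kept is None or when - last_kept >= min_gap:
                kept.append((patient_id, when))
                last_kept = when
    return kept


if __name__ == "__main__":
    visits = [
        ("p1", date(1991, 1, 10)),
        ("p1", date(1991, 1, 24)),  # 14 days after the first visit: dropped
        ("p1", date(1991, 3, 1)),
        ("p2", date(1991, 2, 5)),
    ]
    for row in filter_min_interval(visits):
        print(row)
```

A fixed calendar grid (e.g. one assessment per calendar month) would be an equally defensible reading; the greedy rule is used here only because it never keeps two assessments closer than the threshold.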

Analysis

Reliability study

The BILAG scores were calculated in two ways: as if for a 'first' assessment, and as if for a 'second' or subsequent assessment (Appendix 1). Data from the five centres were pooled and tested for between-rater agreement using weighted and unweighted κ.¹⁶ The weights used for calculating the weighted κ are shown in Table 2. These weights were chosen to emphasize possible disagreement between raters who scored patients as having active disease (BILAG 'A' and 'B') and those who scored patients as having inactive disease (BILAG 'C', 'D' or 'E'). Possible bias between raters was investigated using the test statistic C for marginal homogeneity.¹⁰,¹⁷
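The formula behind the marginal-homogeneity statistic C is not given here. One standard test of marginal homogeneity for a square rater-by-rater table is the Stuart-Maxwell test, sketched below with NumPy purely as an illustration; it is not necessarily the statistic the authors computed, and the function name and example table are hypothetical.

```python
import numpy as np


def stuart_maxwell(table):
    """Stuart-Maxwell chi-square statistic for marginal homogeneity.

    `table[i][j]` counts patients scored category i by rater 1 and
    category j by rater 2. Returns (statistic, degrees_of_freedom).
    Under the null hypothesis that both raters have the same marginal
    distribution, the statistic is approximately chi-square on k - 1 df.
    """
    n = np.asarray(table, dtype=float)
    k = n.shape[0]
    if k < 2 or n.shape != (k, k):
        raise ValueError("table must be a square k x k array with k >= 2")

    rows = n.sum(axis=1)  # rater 1 marginal totals
    cols = n.sum(axis=0)  # rater 2 marginal totals
    # Differences in marginal totals; the last one is dropped because
    # the k differences always sum to zero.
    d = (rows - cols)[: k - 1]

    # Covariance matrix of d: off-diagonal -(n_ij + n_ji),
    # diagonal n_i. + n_.i - 2 n_ii.
    s = -(n + n.T)
    np.fill_diagonal(s, rows + cols - 2 * np.diag(n))
    s = s[: k - 1, : k - 1]

    # Solve S x = d rather than inverting S; raises LinAlgError when S is
    # singular (for example, when there are no disagreements at all).
    statistic = float(d @ np.linalg.solve(s, d))
    return statistic, k - 1


if __name__ == "__main__":
    stat, df = stuart_maxwell([[10, 2, 1], [4, 8, 2], [1, 3, 9]])
    print(round(stat, 3), df)  # 0.538 2
```

For a 2 x 2 table the statistic reduces to McNemar's test without continuity correction, (n₁₂ - n₂₁)² / (n₁₂ + n₂₁).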

Criterion validity

The BILAG score was compared with the 'gold standard' (defined as commencement or increase in disease-modifying therapy, i.e. prednisolone > 20 mg daily or immunosuppressants), for each centre individually and for the four centres combined. The sensitivity, specificity, and positive predictive value for 'A' in any system were calculated. Pooled data were used to calculate positive predictive values for BILAG 'A' in each organ-based system.
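The three measures above come from a 2 x 2 cross-classification of assessments: BILAG 'A' in any system (test positive or negative) against the gold standard (DMT started or increased, or not). A minimal sketch with hypothetical function names; only the PPV example uses figures reported in this paper, namely the Table 7 totals of 89 'A' assessments with increased DMT and 22 without.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of gold-standard positives (DMT started or increased)
    that scored BILAG 'A' in at least one system."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of gold-standard negatives (no DMT change) that did not
    score BILAG 'A' in any system."""
    return tn / (tn + fp)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: proportion of BILAG 'A' assessments that
    were accompanied by commencement or increase of DMT."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Table 7 totals: 89 'A' assessments with increased DMT, 22 without.
    print(f"PPV, all systems: {ppv(89, 22):.1%}")  # 80.2%
```

Sensitivity and specificity additionally need the counts of gold-standard positives without an 'A' and of negatives without an 'A', which this section does not report.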

Construct validity

Data collected on the Manchester patients were used to test the BILAG index for construct validity. The following hypotheses were tested: lupus patients with high ESRs will score more BILAG 'A's than patients with normal ESRs; patients with high titres of antibodies to dsDNA will score more BILAG 'A's (particularly in the renal system) than patients without elevated levels of anti-dsDNA; in-patients will score more 'A's than out-patients. To avoid selection bias only one assessment per patient (arbitrarily chosen as the first BILAG assessment performed in 1991) was analysed.

Table 2 Weights employed for calculating the weighted kappa statistic

                 Rater 2
Rater 1      A       B       C       D/E
A            1.0     0.75    0.25    0.0
B            0.75    1.0     0.5     0.25
C            0.25    0.5     1.0     0.75
D/E          0.0     0.25    0.75    1.0
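As a concrete illustration of the weighted and unweighted κ described in the analysis, the sketch below computes Cohen's kappa, κ = (pₒ - pₑ) / (1 - pₑ), from a rater-by-rater table of counts, with observed and chance agreement both weighted by the agreement weights of Table 2. The function name and the 20-patient example table are hypothetical, not data from the study.

```python
# Agreement weights from Table 2; categories in order A, B, C, D/E.
TABLE2_WEIGHTS = [
    [1.0, 0.75, 0.25, 0.0],
    [0.75, 1.0, 0.5, 0.25],
    [0.25, 0.5, 1.0, 0.75],
    [0.0, 0.25, 0.75, 1.0],
]


def kappa(table, weights=None):
    """Cohen's kappa for a k x k table of counts (rows: rater 1,
    columns: rater 2). With weights=None the identity matrix is used,
    giving unweighted kappa; otherwise weights[i][j] is the credit given
    when rater 1 scores category i and rater 2 scores category j."""
    k = len(table)
    if weights is None:
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / total for j in range(k)]

    # Weighted observed agreement and weighted agreement expected by chance
    # (from the product of the two raters' marginal proportions).
    observed = sum(weights[i][j] * table[i][j] / total
                   for i in range(k) for j in range(k))
    expected = sum(weights[i][j] * row_p[i] * col_p[j]
                   for i in range(k) for j in range(k))
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical 20-patient table with two one-category disagreements
    # (one A/B, one C/D-E).
    example = [
        [2, 1, 0, 0],
        [0, 3, 0, 0],
        [0, 0, 4, 1],
        [0, 0, 0, 9],
    ]
    print(f"unweighted kappa = {kappa(example):.3f}")
    print(f"weighted kappa   = {kappa(example, TABLE2_WEIGHTS):.3f}")
```

Both disagreements in the example fall between categories that Table 2 weights at 0.75, so the weighted value (about 0.936) is higher than the unweighted one (about 0.853).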

Associations between BILAG 'A' scores and ESR, anti-dsDNA levels, and in- or out-patient status were tested using the χ² test. Mean ESR and anti-dsDNA values, with 95% confidence intervals (CI), were compared in patients scoring 'A' in any system with those scoring no higher than 'B' in any system.
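The paper does not state whether a continuity correction was applied to the χ² tests or how the 95% confidence intervals were computed. The sketch below assumes an uncorrected Pearson χ² on a contingency table and a large-sample (normal-approximation) interval for a difference in means with unpooled standard errors; a pooled-variance t interval would be an equally standard choice. Function names and example numbers are hypothetical.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an
    r x c contingency table of counts. Returns (statistic, df)."""
    r, c = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    stat = 0.0
    for i in range(r):
        for j in range(c):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (r - 1) * (c - 1)


def mean_difference_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Difference in means (group 1 minus group 2) with a normal-
    approximation confidence interval (z = 1.96 gives 95%), using
    unpooled standard errors. Returns (difference, lower, upper)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    stat, df = pearson_chi_square([[10, 20], [30, 40]])
    print(f"chi-square = {stat:.3f} on {df} df")  # chi-square = 0.794 on 1 df
    diff, lo, hi = mean_difference_ci(50.0, 10.0, 25, 40.0, 10.0, 25)
    print(f"difference = {diff:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    # difference = 10.00 (95% CI 4.46 to 15.54)
```

The group sizes behind the reported ESR and anti-dsDNA comparisons are not given in this section, so the example does not attempt to reproduce the paper's intervals.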

Results

Between-rater reliability

The results are shown in Table 3. The high values of κ indicate very good between-rater agreement for each of the organ-based systems. There was no significant bias between observers. Overall, 'A' was scored eight times by one or other rater. For seven of these assessments the two raters were in agreement.

Validity studies

Of the 353 patients, 331 (94%) were female, with a median age of 41 years (range 12-80) and median disease duration 9.5 years (range 0-58) (Table 4). The cumulative incidence of each of the 1982 revised criteria for the classification of SLE¹⁴ among patients from each centre is shown in Table 5.

Table 3 Between-rater reliability for the BILAG index; pooled data from five centres. BILAG scores have been calculated as if for the first assessment and as if for a subsequent assessment.

                          Gen     Muc     NS      Msk     Car     Vas
1st assessment
  % of agreement          86%     88%     93%     88%     97%     88%
  Unweighted κ            0.78    0.77    0.78    0.82    0.94    0.79
  Weighted κ              0.81    0.87    0.84    0.84    0.97    0.79
  Test statistic C        4.8     5.6     4.1     1.4     2.7     1.1
Subsequent assessment
  % of agreement          86%     88%     84%     88%     98%     85%
  Unweighted κ            0.78    0.80    0.76    0.82    0.96    0.76
  Weighted κ              0.79    0.80    0.72    0.85    0.97    0.76
  Test statistic C        4.8     4.2     4.1     1.9     2.7     1.7

Gen = general; Muc = mucocutaneous; NS = neurological; Msk = musculoskeletal; Car = cardiorespiratory; Vas = vasculitis and thrombosis

Criterion validity

The results from 1139 BILAG assessments are shown in Table 6; 111 assessments scored 'A' in at least one system. The overall sensitivity of the index was 87%, the specificity 99%, and the positive predictive value of a BILAG 'A' in any system 80%. The positive predictive values for BILAG 'A' score for each organ-based system are shown in Table 7.

Construct validity

Fifty-two per cent of patients with an ESR of greater than 40 mm/h scored 'A' in one or more systems, compared with 10% of those whose ESR was less than 20 mm/h (χ² 33.01; p < 0.001). Mean ESR was 43.6 mm/h (SD ± 26.0) in those scoring 'A' compared with 21 mm/h (SD ± 18.8) in those not scoring 'A' (difference = 22.6 mm/h; 95% CI 13.2-32.0). Fifty-six per cent of patients with IgG anti-dsDNA antibody concentrations above 30 iu/l scored 'A' in one or more systems compared with 13% of those whose IgG anti-dsDNA level was below 30 iu/l (χ² 27.3; p < 0.001). Five of the six patients with an IgG anti-dsDNA antibody level greater than 100 iu/l scored 'A': three in the renal system and two in the vasculitis system because of systemic vasculitis. Mean levels of IgG anti-dsDNA antibodies were 84.5 iu/l (SD ± 17.5) in patients scoring BILAG 'A' in any system, and 7.81 iu/l (SD ± 18.5) in those not scoring 'A' (difference = 77.0 iu/l; 95% CI 41.6-112.0).

Nineteen patients were admitted to hospital during the study; 18 of their assessments scored 'A' in one or more systems compared with six of the out-patient assessments performed on the analysis day (χ² 83.1; p < 0.001). Two of the in-patients died during their admission. BILAG assessments performed immediately after admission on the remaining 17 patients were compared with those performed at the patients' first out-patient review after discharge. Only one of these 17 patients scored 'A' (in the neurological system) at review.

Discussion

These studies have demonstrated very good between-rater reliability for the BILAG index. There was very little difference between the values for κ, whether it was calculated as unweighted or weighted since, when raters were in disagreement over the BILAG score, it was usually by only one point. There were only two occasions, both in the vasculitis system, where one rater scored 'B' and the other scored 'D'. Both resulted from one rater failing to record the presence of minor cutaneous vasculitis.

Table 4 Demographic variables of 353 lupus patients

Demographic variables            Manchester   Birmingham   Bloomsbury   Bath         Total
Number of patients               127          117          55           54           353
Females (%)                      118 (93%)    110 (94%)    52 (94%)     51 (94%)     331 (94%)
Age (median, range), years       43 (19-72)   40 (12-75)   40 (21-63)   31 (17-80)   41 (12-80)
Disease duration
  (median, range), years         9 (0-58)     10 (1-40)    9 (2-18)     10 (3-32)    9.5 (0-58)
Ethnic group (n, %)
  Afro-Caribbean                 6 (5%)       12 (10%)     8 (15%)      0 (0%)       24 (7%)
  Asian                          5 (4%)       13 (11%)     5 (9%)       0 (0%)       24 (7%)
  Caucasian                      113 (88%)    91 (77%)     35 (64%)     52 (96%)     291 (83%)
  Oriental                       1 (0%)       0 (0%)       3 (5%)       1 (0%)       5 (1%)
  Other                          2 (1%)       0 (0%)       4 (7%)       1 (0%)       7 (2%)

Table 5 Number (and %) of 353 patients with SLE who had met each of the eleven 1982 revised criteria for the classification of SLE at some time during the course of their illness (cumulative incidence)

Classification criteria   Manchester   Birmingham   Bloomsbury   Bath       Total
                          (n = 127)    (n = 117)    (n = 55)     (n = 54)   (n = 353)
Malar rash                39 (30%)     61 (52%)     24 (61%)     24 (43%)   148 (42%)
Discoid rash              11 (8%)      11 (9%)      18 (34%)     5 (8%)     45 (13%)
Photosensitivity          51 (40%)     65 (55%)     31 (56%)     24 (43%)   171 (48%)
Oral ulcers               11 (8%)      53 (45%)     29 (52%)     20 (36%)   113 (32%)
Arthritis                 108 (85%)    113 (97%)    52 (95%)     40 (72%)   313 (87%)
Serositis                 39 (31%)     71 (61%)     33 (60%)     20 (36%)   163 (46%)
Renal disease             20 (16%)     32 (27%)     21 (27%)     5 (8%)     78 (22%)
Neurological              41 (32%)     33 (28%)     22 (27%)     7 (11%)    101 (35%)
Haematological            91 (71%)     84 (72%)     35 (63%)     35 (63%)   245 (69%)
Immunological             77 (61%)     75 (64%)     34 (62%)     33 (60%)   219 (62%)
Anti-nuclear antibody     121 (95%)    112 (96%)    53 (98%)     51 (93%)   337 (95%)

Table 6 Criterion validity of the BILAG index (version 3). BILAG 'A' score is compared with the 'gold standard' criterion (the commencement or increase of disease-modifying therapy)

Centre         Number of   Number of     Sensitivity   Specificity   Positive
               patients    assessments                               predictive value
Manchester     127         450           89%           99%           89%
Birmingham     117         491           95%           97%           75%
Bloomsbury     55          113           64%           98%           81%
Bath           54          85            75%           96%           60%
Pooled data    353         1139          87%           99%           80%

One limitation of this study, conducted in out-patients, was that despite the large number of patients included, most had inactive or mildly active SLE. This made it difficult to test the scale throughout its range: only eight patients scored 'A' in one or more systems, and the clinicians agreed about the score seven times. Between-rater reliability for measurement scales has been tested in other studies, most of which have included small numbers of patients and observers.¹,⁵,⁶,¹⁸ These studies have used two approaches: either a number of patients have been assessed in a randomized order by two raters,⁶,¹⁸ or small numbers of patients have each been assessed by a number of observers according to a Latin or Youden square design.¹,⁵

Table 7 Positive predictive value of a BILAG 'A' score for each of the organ-based systems

Organ-based system    Number of    Number of 'A's   Number of 'A's      PPV of
                      BILAG 'A'    with increased   without increased   BILAG 'A'
                      scores       DMT              DMT
General               6            5                1                   83%
Mucocutaneous         29           24               5                   82%
Nervous system        10           3                7                   30%
Musculoskeletal       21           17               4                   81%
Cardiorespiratory     *            *                0                   100%
Vasculitis            *            *                0                   100%
Renal                 *            *                0                   100%
Haematology           4            2                2                   50%
'A' in > 1 system     18           15               3                   83%
Total                 111          89               22                  80%

* Together these three systems account for 23 'A' scores, all with increased DMT.
Data shown for assessments with a single 'A' score, and for those with 'A' scores in more than one system. DMT, disease-modifying therapy.

For a number of reasons, these study designs may result in artificially high between-rater agreement; for example, a succinct case summary is usually provided, there are no interruptions or distractions, and patients are usually selected for their ability to give a clear, comprehensive history. The alternative approach, which we used, is to test the index under conditions as near as possible to real life: in the usual setting (routine clinic, ward or research clinic), by usual observers (specialist or trainee) and on representative samples of patients, with the usual problems of time constraints and interruptions. In our study, one rater at each centre knew the case history and had access to the case notes (whilst the other did not), and all patients who attended the clinics on the study days were assessed. These factors would be expected to increase between-rater error. Hence the finding of good between-rater reliability is particularly reassuring.

The BILAG index was shown to be a sensitive and specific measure of disease activity. There were high positive predictive values for each organ-based system, except the neurological system. Only three of 10 patients with BILAG 'A' in the neurological system were treated with disease-modifying therapy. This finding appeared to relate to two problems. First, there was difficulty amongst physicians in deciding whether neurological features were caused by SLE or were coincidental findings. Second, neurological disease was often treated with specific therapy, for example, strokes with anticoagulants and epilepsy with anticonvulsants, rather than with corticosteroids or immunosuppressants. Our lack of understanding about the aetiopathogenesis of neurological SLE, and about how to diagnose and treat it, emphasizes the need for further studies in this important area.

Amongst the other assessments scoring 'A', but where disease-modifying therapy was not commenced or increased, were 11 first assessments. In contrast to follow-up assessments, where only same, worse or new features contribute to the score, for a first assessment, symptoms rated as improving also contribute. For a first assessment, therefore, active disease which is improving will score BILAG 'A', although such patients are unlikely to have their therapy increased, thus explaining the lack of agreement between BILAG 'A' and the 'gold standard' in these cases.
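The difference between first and follow-up scoring described above amounts to a filter on the answer codes of the Appendix 1 form (1 improving, 2 same, 3 worse, 4 new). The sketch below shows only that filter, assuming each recorded item is stored as one of those codes; how the contributing items are then converted into an A-E category for each system is not described in this section and is not modelled. `Answer` and `contributing_items` are hypothetical names.

```python
from enum import IntEnum


class Answer(IntEnum):
    """Answer codes printed on the Appendix 1 assessment form."""
    IMPROVING = 1
    SAME = 2
    WORSE = 3
    NEW = 4


def contributing_items(answers, first_assessment):
    """Return, in sorted order, the items whose answers feed into the score.

    `answers` maps an item label to an Answer; unrecorded items are simply
    absent. At a first assessment every recorded feature counts, including
    improving ones; at a follow-up only same, worse or new features count.
    """
    if first_assessment:
        allowed = set(Answer)
    else:
        allowed = {Answer.SAME, Answer.WORSE, Answer.NEW}
    return sorted(item for item, ans in answers.items() if ans in allowed)


if __name__ == "__main__":
    answers = {
        "Pyrexia": Answer.IMPROVING,
        "Fatigue": Answer.SAME,
        "Weight loss": Answer.NEW,
    }
    print(contributing_items(answers, first_assessment=True))
    # ['Fatigue', 'Pyrexia', 'Weight loss']
    print(contributing_items(answers, first_assessment=False))
    # ['Fatigue', 'Weight loss']
```

The improving pyrexia counts only at the first assessment, which is the mechanism the paragraph gives for first assessments scoring 'A' without a change in therapy.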

There was some variability in the sensitivity and positive predictive values obtained by the four centres. The results from Manchester and Birmingham were similar, but the sensitivity was low in Bloomsbury (64%), and the overall positive predictive value low in Bath (60%). These latter results are based on small numbers of patients, however, and may not be stable estimates. In this study, the same physician who completed the BILAG assessment also made the decision regarding the patient's treatment. If the BILAG score were influenced by the treatment decision (criterion contamination) this could have resulted in an artificially high level of agreement.¹¹ However, the method for calculating the BILAG score is complicated, and not easily predictable from the answers given to individual questions. None of the assessors entered data directly onto the computer and, hence, patients' BILAG scores were not available until after the treatment decisions had been made. The potential for criterion contamination in this study is probably less, therefore, than in studies which use the physician's global assessment of disease activity as the gold standard. The global assessment, often a 10 cm visual analogue scale with no activity at one end and worst possible activity at the other, is usually completed at the same time as the disease activity scale, and by the same assessor.¹,⁵

Construct validity of the BILAG index was tested by determining whether the index identified patients who, based on other evidence such as raised ESR or anti-dsDNA levels, would be expected to have active SLE. Some disagreement between the BILAG score and these laboratory markers of active SLE was expected because, although some groups of patients with active SLE do usually have high ESR and elevated antibody titres to dsDNA (e.g. those with renal disease), there are others in whom clinical and laboratory evidence of disease activity do not correlate. Nevertheless, the proportion of assessments which scored BILAG 'A' was significantly higher in patients with elevated ESR and anti-dsDNA levels than in those without these markers. The BILAG index also discriminated between in-patients and out-patients, both cross-sectionally and longitudinally.

In summary, there are a number of valid and reliable instruments for measuring clinical disease activity in SLE.¹,²,⁴,⁶,⁷ The physician's choice of instrument will depend, in part, on its proposed use. The BILAG index is a comprehensive and flexible instrument designed specifically for use as a transitional index, and was quick and easy to use. It is the only instrument which allocates separate scores to different organ-based systems. The final version of this computerized index is now available for use by other centres in clinical studies.

Acknowledgements

The authors wish to acknowledge Drs Bernstein, Emery and Holt for allowing us to study their patients, Professor B Bresnihan for his assistance in the development of the neurological assessment and the Arthritis and Rheumatism Council of Great Britain for funding this research.

References

1. Gladman DD, Goldsmith CH, Urowitz MB, et al. Crosscultural validation and reliability of 3 disease activity indices in systemic lupus erythematosus. J Rheumatol 1992; 19:608-11.

2. Liang MH, Stern S, Esdaile J. Systemic lupus erythematosus activity. An operational definition. Rheum Dis Clin N Am 1988; 14:57-66.

3. Liang MH, Socher SA, Roberts WN, Esdaile JM. Measurement of systemic lupus erythematosus activity in clinical research. Arthritis Rheum 1988; 31:817-25.

4. Bombardier C, Gladman DD, Urowitz MB, Caron D, Chang CH, and the Committee on Prognosis Studies in SLE. Derivation of SLEDAI: a disease activity index for lupus patients. Arthritis Rheum 1992; 35:630-40.

5. Liang MH, Socher SA, Larson MG, Schur PH. Reliability and validity of six systems for the clinical assessment of disease activity in systemic lupus erythematosus. Arthritis Rheum 1989; 32:1107-18.

6. Symmons DPM, Coppock JS, Bacon PA, et al. Development of a computerised index of clinical disease activity in systemic lupus erythematosus. Q J Med 1988; 69:927-37.

7. Petri M, Genovese M, Engle E, Hochberg M. Definition, incidence and clinical description of a flare in systemic lupus erythematosus. Arthritis Rheum 1991; 34:937-44.

8. Petri M, Hellman D, Hochberg M. Validity and reliability of lupus activity measures in the routine clinic setting. J Rheumatol 1992; 19:53-9.

9. Bencivelli W, Vitali C, Isenberg DA, Smolen JS, Snaith ML, Sciuto M, Bombardieri S, and the European Consensus Study Group for Disease Activity in SLE. Disease activity in systemic lupus erythematosus: report of the Consensus Study Group of the European Workshop for Rheumatology Research. III. Development of a computerised clinical chart and its application to the comparison of different indices of disease activity. Clin Exp Rheumatol 1992; 10:549-54.

10. Brennan P, Silman A. Statistical methods for assessing variability in clinical measures. Br Med J 1992; 304:1491-94.

11. Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. Oxford Medical Publications 1989.

12. Hay EM, Symmons DPM for the British Isles Lupus Assessment Group. The BILAG index has good inter-observer agreement when used in a routine out-patient clinic. Br J Rheumatol 1991; Suppl 30:23.

13. Fink A, Kosecoff J, Chassin MR, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health 1984; 74:979-83.

14. Tan EM, Cohen AS, Fries JF, et al. The 1982 revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum 1982; 25:1271-7.

15. Klotz JL, Minami RM, Teplitz RL. An enzyme linked immunosorbent assay for antibodies to native and denatured DNA. J Immunol Methods 1979; 48:155-65.

16. Altman DG. Practical statistics for medical research. London, Chapman and Hall 1991.

17. Schilling RSF, Hughes JPW, Dingwall-Fordyce I. Disagreement between observers in an epidemiological study of respiratory disease. Br Med J 1955; 1:65-8.

18. Bellamy N, Anastassiades TP, Buchanan WW, et al. Rheumatoid arthritis antirheumatic drug trials. I. Effects of standardisation procedures on observer dependent outcome measures. J Rheumatol 1991; 18:1893-900.

Appendix 1: BILAG assessment form (version 3)

(All events refer to the previous month unless noted otherwise)

Patient
Hospital number
Date of assessment
Treatment
Maximum dose in last month or since last visit

GENERAL
Answer: 1) Improving 2) Same 3) Worse 4) New
1. Pyrexia (documented) ( )
2. Weight loss - unintentional > 5% ( )
3. Lymphadenopathy / splenomegaly ( )
4. Fatigue / malaise / lethargy ( )
5. Anorexia / nausea / vomiting ( )

MUCOCUTANEOUS
Answer: 1) Improving 2) Same 3) Worse 4) New
6. Maculopapular rash - severe, active (discoid / bullous) ( )
7. Maculopapular eruption - mild ( )
8. Active discoid lesions - generalized, extensive ( )
9. Active discoid lesions - local, inc. lupus profundus ( )
10. Alopecia - severe, active ( )
11. Alopecia - mild ( )
12. Severe panniculitis ( )
13. Angio-oedema ( )
14. Extensive mucosal ulceration ( )
15. Small mucosal ulcers ( )
16. Malar erythema ( )
17. Subcutaneous nodules ( )
18. Perniotic skin lesions ( )
19. Peri-ungual erythema ( )
20. Swollen fingers Y/N
21. Sclerodactyly Y/N
22. Calcinosis Y/N
23. Telangiectasia Y/N

NEUROLOGICAL
Answer: 1) Improving 2) Same 3) Worse 4) New
24. Deteriorating level of consciousness ( )
25. Acute psychosis or delirium or confusional state ( )
26. Seizures ( )
27. Stroke or stroke syndrome ( )
28. Aseptic meningitis ( )
29. Mononeuritis multiplex ( )
30. Ascending or transverse myelitis ( )
31. Peripheral or cranial neuropathy ( )
32. Disc swelling/cytoid bodies ( )
33. Chorea ( )
34. Cerebellar ataxia ( )
35. Headaches - severe unremitting ( )
36. Organic depressive illness ( )
37. Organic brain syndrome inc. pseudotumour cerebri ( )
38. Episodic migrainous headaches ( )

MUSCULOSKELETAL
Answer: 1) Improving 2) Same 3) Worse 4) New
39. Definite myositis (Bohan and Peter) ( )
40. Severe polyarthritis - with loss of function ( )
41. Arthritis ( )
42. Tendonitis ( )
43. Mild chronic myositis ( )
44. Arthralgia ( )
45. Myalgia ( )
46. Tendon contractures and fixed deformity ( )
47. Aseptic necrosis ( )

CARDIOVASCULAR AND RESPIRATORY
Answer: 1) Improving 2) Same 3) Worse 4) New
48. Pleuropericardial pain ( )
49. Dyspnoea ( )
50. Cardiac failure ( )
51. Friction rub ( )
52. Effusion (pericardial or pleural) ( )
53. Mild or intermittent chest pain ( )
54. Progressive CXR changes - lungs Y/N
55. Progressive CXR changes - heart Y/N
56. ECG evidence of pericarditis or myocarditis Y/N
57. Cardiac arrhythmias including tachycardia > 100 in absence of fever ( )
58. Pulmonary function fall by > 20% Y/N
59. Cyto-histological evidence of inflammatory lung disease ( )

VASCULITIS
Answer: 1) Improving 2) Same 3) Worse 4) New
60. Major cutaneous vasculitis including ulcers ( )
61. Major abdominal crisis due to vasculitis ( )
62. Recurrent thromboembolism (excluding stroke) ( )
63. Raynaud's ( )
64. Livedo reticularis ( )
65. Superficial phlebitis ( )
66. Minor cutaneous vasculitis (nailfold, digital, purpura, ulcers) ( )
67. Thromboembolism (excluding stroke) 1st episode ( )

RENAL
Answer with number (value) or Y/N
68. Systolic BP mmHg ( )
69. Diastolic BP (5th phase) ( )
70. Accelerated hypertension Y/N
71. Dipstick (+ = 1, ++ = 2, +++ = 3) ( )
72. 24 h urine protein (g) ( )
73. Newly documented proteinuria of > 1 g/24 h ( )
74. Nephrotic syndrome Y/N
75. Creatinine (plasma/serum) ( )
76. Creatinine clearance/GFR (ml/min) ( )
77. Active urinary sediment Y/N
78. Histological evidence of active nephritis (within 3 months) ( )

HAEMATOLOGY
Answer with number (value) or Y/N
79. Haemoglobin (g/dl) ( )
80. Total white cell count × 10⁹/l ( )
81. Neutrophils × 10⁹/l ( )
82. Lymphocytes × 10⁹/l ( )
83. Platelets × 10⁹/l ( )
84. Evidence of active haemolysis ( )
85. Coombs' test positive ( )
86. Evidence of circulating anticoagulant ( )

SCORE

Gen Muc NS Msk Car Vas Ren Hae

Scoring system for the BILAG index (version 3)

It is implicit in this scoring system that all features scored are thought to be due to active lupus. The questionnaire asks whether features are improving, the same, worse or new. If a new feature has developed in the last month (or since the last assessment if less than a month ago) it should be scored as new (i.e. 4), even if it has subsequently improved or resolved. For the first assessment any response will register the feature as a criterion. For subsequent assessments, features will only contribute to the score if they are the same, worse or new. These different grades have been used so that BILAG can be used to identify all patients who have developed a particular feature for the first time and also to document the response of particular features to treatment. In the renal and haematological assessments (which include laboratory tests) the assessor must confirm that abnormal results are due to active lupus (rather than drug side-effects, for example).

When assessing a patient for the first time there may be no data to enter for a particular system. In this circumstance the patient should be assigned to either category D or E for that particular system. 'D' should be entered if the patient has ever had any involvement of that system and 'E' if there has never been previous involvement. Once a patient has scored an A, B, C or D in a particular system they will always score at least a D in the future. The score E implies no involvement of the system ever.
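The status and fallback rules above could be sketched in Python as follows. This is a minimal illustration of one reading of the text, not the BILAG software. The names `Status`, `counts_toward_score`, `was_ever_involved` and `final_category` are hypothetical. The sketch assumes each feature's form response is available as a code from 1 to 4, and that any earlier score of A to D counts as previous involvement.

```python
from enum import IntEnum
from typing import List, Optional


class Status(IntEnum):
    """Response codes used on the assessment form (hypothetical class name)."""
    IMPROVING = 1
    SAME = 2
    WORSE = 3
    NEW = 4


def counts_toward_score(status: Status, first_assessment: bool) -> bool:
    """At the first assessment any response registers the feature;
    afterwards only "same", "worse" or "new" features contribute."""
    if first_assessment:
        return True
    return status in (Status.SAME, Status.WORSE, Status.NEW)


def was_ever_involved(previous_scores: List[str], history_of_involvement: bool) -> bool:
    """True if the system was ever involved, either from the clinical history
    or because an earlier assessment scored A, B, C or D."""
    return history_of_involvement or any(s in {"A", "B", "C", "D"} for s in previous_scores)


def final_category(active_category: Optional[str], ever_involved: bool) -> str:
    """Keep A, B or C if the active features met one; otherwise fall back
    to D (previous involvement) or E (never involved)."""
    if active_category in ("A", "B", "C"):
        return active_category
    return "D" if ever_involved else "E"
```

Take a system that scored C previously and has no active features now. `final_category(None, was_ever_involved(["C"], False))` then gives "D", which matches the rule that a system once scored A to D never returns to E.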

1. General non-specific manifestations

1. Pyrexia
2. Weight loss: unintentional weight loss > 5% in 1 month
3. Lymphadenopathy
4. Fatigue/malaise/weakness
5. Anorexia/nausea/vomiting

Category A

Pyrexia plus 2 other criteria

Category B

Pyrexia or 2 other criteria

Category C

Any other criterion

Category D

Previous involvement

Category E

No involvement
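The general-system rules above could be expressed as a short function. This sketch makes three assumptions:

- The feature labels are hypothetical strings, not part of the form.
- The features passed in have already passed the status rule.
- "2 other" means two or more other criteria.

```python
from typing import Iterable

# Hypothetical labels for the general features other than pyrexia.
OTHER_GENERAL_FEATURES = {"weight_loss", "lymphadenopathy", "fatigue", "anorexia"}


def score_general(active: Iterable[str], ever_involved: bool) -> str:
    """Score the general system from features that already count under the
    status rule. Assumes "2 other" means two or more other criteria."""
    features = set(active)
    pyrexia = "pyrexia" in features
    others = len(features & OTHER_GENERAL_FEATURES)
    if pyrexia and others >= 2:
        return "A"
    if pyrexia or others >= 2:
        return "B"
    if others >= 1:  # exactly one other criterion and no pyrexia
        return "C"
    return "D" if ever_involved else "E"
```

For example, `score_general({"pyrexia", "fatigue"}, ever_involved=False)` returns "B". Pyrexia with only one other feature falls short of category A.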

2. Mucocutaneous

Category A

Any one of:
1. Severe maculopapular, discoid or bullous eruption, i.e. active facial and/or extensive (> 2/9 body surface), scarring or causing disability.
2. Angio-oedema
3. Extensive mucosal ulceration

Category B

Any one of:
1. Malar erythema
2. Mild maculopapular eruption
3. Panniculitis
4. Localised active discoid lesions including lupus profundus
5. Severe active alopecia
6. Subcutaneous nodules
7. Perniotic skin lesions

Category C

Any one of:
1. Peri-ungual erythema
2. Swollen fingers
3. Sclerodactyly
4. Calcinosis
5. Telangiectasia
6. Mild alopecia
7. Small mucosal ulceration

Category D

Previous involvement

Category E

No previous involvement
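The mucocutaneous system uses "any one of" lists, so the score is the highest category with at least one feature present. A minimal sketch of that lookup follows. The feature labels are hypothetical shorthand for the items listed above, and the features passed in are assumed to have already passed the status rule.

```python
from typing import Dict, Iterable

# Hypothetical shorthand labels for the mucocutaneous criteria listed above.
MUCOCUTANEOUS_TIERS: Dict[str, str] = {
    "severe_eruption": "A",
    "angio_oedema": "A",
    "extensive_mucosal_ulceration": "A",
    "malar_erythema": "B",
    "mild_maculopapular_eruption": "B",
    "panniculitis": "B",
    "localised_discoid_lesions": "B",
    "severe_active_alopecia": "B",
    "subcutaneous_nodules": "B",
    "perniotic_lesions": "B",
    "periungual_erythema": "C",
    "swollen_fingers": "C",
    "sclerodactyly": "C",
    "calcinosis": "C",
    "telangiectasia": "C",
    "mild_alopecia": "C",
    "small_mucosal_ulceration": "C",
}


def score_mucocutaneous(active: Iterable[str], ever_involved: bool) -> str:
    """Return the highest category that has at least one active feature."""
    # An unknown label raises KeyError rather than being silently ignored.
    tiers = {MUCOCUTANEOUS_TIERS[feature] for feature in active}
    for category in ("A", "B", "C"):
        if category in tiers:
            return category
    return "D" if ever_involved else "E"
```

The same "highest listed category wins" lookup also appears to fit the musculoskeletal, vasculitis and first-assessment nervous system lists, each with its own feature table.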

3. Nervous system (first assessment)

Category A

Any one of:
1. Impaired level of consciousness
2. Psychosis or delirium or confusional state
3. Grand mal seizure
4. Stroke or stroke syndrome
5. Aseptic meningitis
6. Mononeuritis multiplex
7. Ascending or transverse myelitis
8. Peripheral or cranial neuropathy
9. Chorea
10. Cerebellar ataxia

Category B

Any one of:
1. Headache (severe unremitting)
2. Organic depressive illness
3. Chronic brain syndrome including pseudotumour cerebri
4. Disc swelling or cytoid bodies

Category C

Episodic migrainous headaches

Category D

Previous involvement

Category E

No previous involvement

NS disease (subsequent assessments)

Category A

Any one of the following scored "worse" or "new":
1. Impaired level of consciousness
2. Psychosis or delirium or confusional state
3. Grand mal seizure
4. Stroke or stroke syndrome
5. Aseptic meningitis
6. Mononeuritis multiplex
7. Ascending or transverse myelitis
8. Peripheral or cranial neuropathy
9. Chorea
10. Cerebellar ataxia

Category B

Any one of the following scored "new" or "worse":
1. Headache (severe unremitting)
2. Organic depressive illness
3. Chronic brain syndrome including pseudotumour cerebri
4. Disc swelling or cytoid bodies

Or any one of the following scored "same" or "improving":
5. Impaired level of consciousness
6. Psychosis, delirium or confusional state
7. Grand mal seizure

Category C

1. Episodic migrainous headaches, or any of category "A" items 4-10 or category "B" items 1-4 scored "same" or "improving"

Category D

Previous involvement

Category E

No previous involvement
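At subsequent assessments the nervous system score depends on each feature's status as well as its presence. The sketch below encodes one reading of the rules above. The feature labels are hypothetical. It also assumes that episodic migrainous headaches count under the general rule, that is, when scored "same", "worse" or "new". This is a sketch under those assumptions, not a reference implementation.

```python
from enum import IntEnum
from typing import Dict, Iterable, Set


class Status(IntEnum):
    IMPROVING = 1
    SAME = 2
    WORSE = 3
    NEW = 4


# Hypothetical labels for category A items 1-10 and category B items 1-4, in order.
NS_A = ["impaired_consciousness", "psychosis", "grand_mal_seizure",
        "stroke", "aseptic_meningitis", "mononeuritis_multiplex",
        "myelitis", "neuropathy", "chorea", "cerebellar_ataxia"]
NS_B = ["severe_headache", "organic_depression", "chronic_brain_syndrome",
        "disc_swelling"]

ACTIVE: Set[Status] = {Status.WORSE, Status.NEW}
PERSISTING: Set[Status] = {Status.SAME, Status.IMPROVING}


def score_ns_subsequent(recorded: Dict[str, Status], ever_involved: bool) -> str:
    def any_with(names: Iterable[str], statuses: Set[Status]) -> bool:
        return any(recorded.get(name) in statuses for name in names)

    if any_with(NS_A, ACTIVE):
        return "A"
    # B: any B item new/worse, or A items 1-3 same/improving
    if any_with(NS_B, ACTIVE) or any_with(NS_A[:3], PERSISTING):
        return "B"
    # C: migraine (assumed general status rule), or A 4-10 / B 1-4 same/improving
    migraine = recorded.get("episodic_migraine") in {Status.SAME, Status.WORSE, Status.NEW}
    if migraine or any_with(NS_A[3:], PERSISTING) or any_with(NS_B, PERSISTING):
        return "C"
    return "D" if ever_involved else "E"
```

Slicing `NS_A[:3]` and `NS_A[3:]` mirrors the text's split between items 1-3 and items 4-10 of category A.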

4. Musculoskeletal

Category A

One or more of:
1. Myositis
2. Severe polyarthritis with loss of function (not responsive to steroids < 20 mg/day, antimalarials, NSAIDs)

Category B

One or more of:
1. Arthritis (definite synovitis)
2. Tendonitis

Category C

1. Arthralgia
2. Myalgia
3. Tendon contractures and fixed deformity
4. Aseptic necrosis
5. Mild chronic myositis

Category D

Previous involvement

Category E

No previous involvement

5. Cardiovascular/respiratory

Category A

Cardiac failure or symptomatic effusion plus two other criteria, or four from:

1. Pleuropericardial pain
2. Dyspnoea
3. Friction rub
4. Progressive CXR changes: lung fields
5. Progressive CXR changes: heart size
6. ECG evidence of pericarditis or myocarditis
7. Cardiac arrhythmias including tachycardia > 100 in absence of fever
8. Deteriorating lung function: < 20% of expected or > 20% fall
9. Cytohistological evidence of inflammatory lung disease

Category B

Any two criteria listed under A

Category C

Mild intermittent chest pain or one other criterion

Category D

Previous involvement

Category E

No previous involvement
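The cardiovascular/respiratory categories are count-based, and can be sketched as follows. The labels are hypothetical. One assumption is not stated in the text: cardiac failure or symptomatic effusion also counts as one "criterion" when categories B and C are applied. Without it, cardiac failure alone would score no higher than D.

```python
from typing import Iterable

# Hypothetical labels for criteria 1-9 listed under category A.
CVR_CRITERIA = {
    "pleuropericardial_pain", "dyspnoea", "friction_rub",
    "cxr_lung_changes", "cxr_heart_changes", "ecg_pericarditis_myocarditis",
    "arrhythmia", "deteriorating_lung_function", "inflammatory_lung_histology",
}
MAJOR_FEATURES = {"cardiac_failure", "symptomatic_effusion"}


def score_cardiorespiratory(active: Iterable[str], ever_involved: bool) -> str:
    features = set(active)
    major = bool(features & MAJOR_FEATURES)
    n = len(features & CVR_CRITERIA)
    if (major and n >= 2) or n >= 4:
        return "A"
    # Assumption: the major feature also counts as a criterion for B and C.
    total = n + (1 if major else 0)
    if total >= 2:
        return "B"
    if total >= 1 or "mild_chest_pain" in features:
        return "C"
    return "D" if ever_involved else "E"
```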

6. Vasculitis

Category A

Any one of the following:
1. Major cutaneous vasculitis (including ulcers) accompanied by infarction occurring in previous month
2. Major abdominal crisis due to vasculitis
3. Recurrent thromboembolism (excluding strokes)

Category B

Any one of the following:
1. Minor cutaneous vasculitis (nailfold vasculitis, digital vasculitis, purpura, urticaria)
2. Superficial phlebitis
3. Thromboembolism (excluding strokes): first episode

Category C

Any one of the following:
1. Raynaud's phenomenon
2. Livedo reticularis

Category D

Previous involvement

Category E

No involvement

7. Renal (first assessment)

Category A

Two or more of the following, provided that 1, 4 or 5 is included:

1. Proteinuria, defined as > 1 g/24 h or 3+ or 4+ dipstick
2. Accelerated hypertension
3. Creatinine clearance < 50 ml/min
4. Active urinary sediment (on an uncentrifuged specimen): pyuria (> 5 wc/hpf), haematuria (> 5 rbc/hpf) or red cell casts in the absence of infection
5. Histological evidence of active nephritis within the last 3 months (or since the previous assessment if seen less than 3 months ago)

Category B

One of the following:
1. One of the category A criteria
2. Urinary dipstick 2+ or more
3. 24 h urinary protein > 0.5 g but < 1 g

Category C

One of the following:
1. Urinary dipstick 1+
2. Blood pressure > 140/90 (5th phase)
3. Creatinine > 130 µmol/l

Category D

Previous renal involvement

Category E

No previous renal involvement
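The first-assessment renal rules above could be sketched as follows. The field names are hypothetical. Abnormal values are assumed to have already been confirmed as due to lupus. "Blood pressure > 140/90" is read as systolic above 140 or diastolic above 90, which is an interpretation rather than something the text states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RenalFirst:
    # Hypothetical field names; values assumed to be attributable to lupus.
    dipstick: int = 0                              # 0 = negative, 1 = +, 2 = ++, ...
    protein_g_24h: Optional[float] = None
    accelerated_hypertension: bool = False
    creatinine_clearance: Optional[float] = None   # ml/min
    active_sediment: bool = False
    active_nephritis_biopsy: bool = False
    systolic: Optional[float] = None               # mmHg
    diastolic: Optional[float] = None              # mmHg (5th phase)
    creatinine_umol_l: Optional[float] = None


def score_renal_first(r: RenalFirst, ever_involved: bool) -> str:
    c1 = r.dipstick >= 3 or (r.protein_g_24h is not None and r.protein_g_24h > 1)
    c2 = r.accelerated_hypertension
    c3 = r.creatinine_clearance is not None and r.creatinine_clearance < 50
    c4 = r.active_sediment
    c5 = r.active_nephritis_biopsy
    met = [c1, c2, c3, c4, c5]
    # A: two or more criteria, at least one of which is 1, 4 or 5
    if sum(met) >= 2 and (c1 or c4 or c5):
        return "A"
    protein = r.protein_g_24h
    if any(met) or r.dipstick >= 2 or (protein is not None and 0.5 < protein < 1):
        return "B"
    high_bp = (r.systolic or 0) > 140 or (r.diastolic or 0) > 90  # assumed reading
    high_creatinine = r.creatinine_umol_l is not None and r.creatinine_umol_l > 130
    if r.dipstick >= 1 or high_bp or high_creatinine:
        return "C"
    return "D" if ever_involved else "E"
```

Accelerated hypertension plus a low creatinine clearance gives two criteria, but neither is 1, 4 or 5. That combination therefore scores B, not A.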

Renal (subsequent assessments)

Category A

Two or more of the following, provided 1, 4 or 5 is included:

1. Proteinuria, defined as (a) urinary dipstick increased by 2 or more levels; or (b) 24 hour urinary protein rising from > 0.20 g to > 1 g; or (c) 24 hour urinary protein rising from > 1 g by 100%; or (d) newly documented proteinuria of > 1 g
2. Accelerated hypertension
3. Deteriorating renal function, defined as (a) plasma creatinine > 130 µmol/l and having risen to > 130% of previous value; or (b) creatinine clearance having fallen to < 67% of previous value; or (c) creatinine clearance < 50 ml/min, and last time was > 50 ml/min or was not measured
4. Active urinary sediment (as defined above)
5. Histological evidence of active nephritis (as defined above)

Category B

One of the following:
1. One of the category A criteria
2. (a) urinary dipstick of 2+ or more or (b) 24 h urinary protein rising from > 1 g by > 50% but < 100%
3. Plasma creatinine > 130 µmol/l or having risen to 115% of previous value

Category C

One of the following:
1. 24 h urinary protein > 0.25 g
2. Urinary dipstick 1+ or more
3. Rising blood pressure, defined as (a) systolic rise of > 30 mmHg or (b) diastolic rise of > 15 mmHg (providing the recorded values are > 140/90)

Category D

Previous renal disease

Category E

No previous renal disease
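At subsequent renal assessments the creatinine criteria depend on change relative to the previous value. The two helpers below are hypothetical and cover only the renal-function parts: category A criterion 3 and category B criterion 3. They read "> 130% of previous value" as current > 1.30 × previous. B criterion 3 is taken literally as an "or" of its two conditions. Creatinine is in µmol/l, clearance is in ml/min, and `None` means not measured.

```python
from typing import Optional


def deteriorating_renal_function(
    creat_now: Optional[float], creat_prev: Optional[float],
    ccl_now: Optional[float], ccl_prev: Optional[float],
) -> bool:
    """Category A criterion 3 at subsequent assessments (one reading)."""
    # (a) creatinine > 130 and risen to > 130% of previous value
    if creat_now is not None and creat_prev is not None:
        if creat_now > 130 and creat_now > 1.30 * creat_prev:
            return True
    # (b) clearance fallen to < 67% of previous value
    if ccl_now is not None and ccl_prev is not None and ccl_now < 0.67 * ccl_prev:
        return True
    # (c) clearance now < 50 and previously > 50 or not measured
    if ccl_now is not None and ccl_now < 50 and (ccl_prev is None or ccl_prev > 50):
        return True
    return False


def creatinine_criterion_b(creat_now: Optional[float], creat_prev: Optional[float]) -> bool:
    """Category B criterion 3, read literally as '> 130 or risen to 115%'."""
    if creat_now is None:
        return False
    if creat_now > 130:
        return True
    return creat_prev is not None and creat_now >= 1.15 * creat_prev
```

For example, a creatinine rising from 140 to 150 µmol/l does not meet criterion A3(a), since 150 is below 130% of 140. It does meet B3, because 150 is above 130 µmol/l.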

8. Haematological

Category A

One of the following:
1. wcc < 1000
2. platelet count < 25
3. haemoglobin < 8

Category B

One of the following:
1. wcc < 2500
2. platelet count < 100
3. haemoglobin < 11
4. evidence of active haemolysis (raised bilirubin ± reticulocytes and positive Coombs' test)

Category C

One of the following:
1. wcc < 4000
2. lymphocyte count < 1500
3. platelet count < 150
4. Coombs' test positive but no evidence of active haemolysis
5. Evidence of circulating lupus anticoagulant detected by functional assays

Category D

Previous involvement

Category E

No previous involvement
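The haematological thresholds above can be checked mechanically. This sketch takes values in the units recorded on the assessment form: haemoglobin in g/dl, and cell counts in × 10⁹/l. It assumes the white-cell and lymphocyte thresholds (1000, 2500, 4000, 1500) are per mm³, which corresponds to 1.0, 2.5, 4.0 and 1.5 × 10⁹/l. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Haem:
    # Hypothetical field names; units as on the assessment form.
    haemoglobin: Optional[float] = None   # g/dl
    wcc: Optional[float] = None           # x 10^9/l
    lymphocytes: Optional[float] = None   # x 10^9/l
    platelets: Optional[float] = None     # x 10^9/l
    active_haemolysis: bool = False
    coombs_positive: bool = False
    lupus_anticoagulant: bool = False


def below(value: Optional[float], limit: float) -> bool:
    """True only when a value was measured and is under the limit."""
    return value is not None and value < limit


def score_haematological(h: Haem, ever_involved: bool) -> str:
    if below(h.wcc, 1.0) or below(h.platelets, 25) or below(h.haemoglobin, 8):
        return "A"
    if (below(h.wcc, 2.5) or below(h.platelets, 100)
            or below(h.haemoglobin, 11) or h.active_haemolysis):
        return "B"
    if (below(h.wcc, 4.0) or below(h.lymphocytes, 1.5) or below(h.platelets, 150)
            or (h.coombs_positive and not h.active_haemolysis)
            or h.lupus_anticoagulant):
        return "C"
    return "D" if ever_involved else "E"
```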

Appendix 2: Glossary

General

1. Pyrexia: documented temperature > 37.5°C.

3. Lymphadenopathy: palpable lymph nodes more than 1 cm in diameter.

Mucocutaneous

6. Maculopapular eruption - severe: active discoid, bullous or maculopapular eruption; severe, facial and/or extensive (> 2/9 of body surface area), scarring or causing disability.

9. Lupus profundus: erythematous elevated plaques with an overlying discoid skin lesion.

10. Alopecia - severe, active: abnormal diffuse hair loss which is clinically detectable with scalp inflammation.

12. Panniculitis: extensive, painful, erythematous subcutaneous nodules associated with fat necrosis which resolve with scarring.

13. Angio-oedema: with stridor, or affecting tongue or lips.

14. Extensive mucosal ulceration: severe, deep, disabling ulcers.

Neurological

25. Acute psychosis or delirium or confusional state: severe disturbance in the perception of reality characterised by delusions, hallucinations, incoherence, marked illogical thinking, bizarre or catatonic behaviour.

35. Headache, severe unremitting: continuous headache not relieved by non-narcotic analgesia.

36. Organic depressive illness: associated with somatic symptoms and severe enough to merit treatment with anti-depressive medication.

37. Organic brain syndrome: impaired orientation, memory or other intellectual function in the absence of metabolic, psychiatric or pharmacological causes. Clinical features develop over a short period (usually hours to days) and tend to fluctuate over the course of the day:
(a) clouding of consciousness with reduced capacity to focus and sustain attention to environment;
(b) i. perceptual disturbance: misinterpretations, illusions or hallucinations
ii. incoherent speech
iii. insomnia or daytime drowsiness
iv. increased or decreased psychomotor activity;
(c) disorientation and recent memory impairment.

Musculoskeletal

39. Myositis: at least three of proximal muscle weakness, elevated muscle enzymes, positive muscle biopsy and abnormal EMG.

40. Polyarthritis with loss of function: active joint inflammation with clinically significant loss of the functional range of movement of the involved joints.

41. Arthritis: active joint inflammation.

Cardiovascular and respiratory

48. Pleuropericardial pain: localised sharp or dull pain in the chest aggravated by respiration.

49. Dyspnoea: on exercise (not orthopnoea alone).

53. Mild intermittent chest pain: nonspecific (not clearly pleuritic, pericardial, musculoskeletal or angina).

58. Pulmonary function fall by > 20%: < 20% of expected (predicted for height, weight, sex and age) or > 20% fall in total lung capacity (forced vital capacity) and/or DLCO.

Vasculitis

60. Major cutaneous vasculitis including ulcers: extensive gangrene and/or ulceration.

66. Minor cutaneous vasculitis: e.g. digital vasculitis with nailfold infarcts.

Renal

70. Accelerated hypertension: BP rising to > 170/110 (5th phase) within one month, if accompanied by grade IV retinal changes (i.e. haemorrhages, exudates).

77. Active urinary sediment: on uncentrifuged specimen. Pyuria (> 5 wc/hpf), haematuria (> 5 rbc/hpf) or red cell casts in the absence of infection.

78. Histological evidence of active nephritis, according to WHO criteria. Sclerosis alone (without inflammation) will not be regarded as evidence of active nephritis.
