The endoscopic assessment of esophagitis: A progress report on observer agreement


GASTROENTEROLOGY 1996;111:85–92 (Vol. 111, No. 1, July 1996)

DAVID ARMSTRONG,* JOHN R. BENNETT,‡ ANDRÉ L. BLUM,§ JOHN DENT,‖ F. TIMOTHY DE DOMBAL†,¶ JEAN–PAUL GALMICHE,# LARS LUNDELL,** MARIANA MARGULIES,¶ JOEL E. RICHTER,‡‡ STUART J. SPECHLER,§§ GUIDO N. J. TYTGAT,‖‖ and LENE WALLIN¶¶

*Department of Medicine, Division of Gastroenterology, McMaster University Medical Center, Hamilton, Ontario, Canada; ‡Remenham House, West Ella, Hull, England; §Division de Gastro-entérologie, Centre Hospitalier, Universitaire Vaudois, Lausanne, Switzerland; ‖Gastroenterology Unit, Royal Adelaide Hospital, Adelaide, Australia; ¶Clinical Information Science Unit, Leeds, England; #Centre Hospitalier Regional et Universitaire de Nantes, Hôpital G & R Laënnee, Nantes, France; **Department of Surgery, University of Gothenburg, Sahlgren’s Hospital, Gothenburg, Sweden; ‡‡Division of Gastroenterology, Department of Medicine, Cleveland Clinic Foundation, Cleveland, Ohio; §§Department of Medicine, Beth Israel Hospital, Boston, Massachusetts; ‖‖Academisch Medisch Centrum, Darm-Enleverzeiken, Amsterdam, The Netherlands; and ¶¶Department of Surgical Gastroenterology, KAS, Glostrup, Denmark

Background & Aims: The study and management of reflux esophagitis require an endoscopic classification system founded on esophageal lesions that can be reproducibly identified. The aim of this study was to investigate interobserver agreement for the identification of endoscopic lesions typical of reflux esophagitis. Methods: Paired comparisons of observers’ descriptions were obtained. Seventeen endoscopists assessed 100 still images, and 42 endoscopists, including 13 endoscopists in training, assessed 23 endoscopic video recordings. In a third, ancillary study, using a simpler evaluation sheet, 219 gastroenterologists recorded their assessments of 20 still images. Results: The agreement between endoscopists was similar for still images and video recordings. Agreement between experienced endoscopists was acceptable to good for recognition of minimal changes (erythema, friability, mucosal edema; κ = 0.46 to κ = 0.8), mucosal breaks (discretely demarcated areas of slough or erythema; κ = 0.84), and complications (ulceration, κ = 0.92; stricturing, κ = 0.80; columnar metaplasia, κ = 0.81), although there was poor agreement when the circumferential extent and number of mucosal breaks were assessed. However, total circumferential extent of the mucosal break had a κ value of 0.59. Agreement between inexperienced endoscopists was poor for recognition of minimal changes but was good for recognition of complications (κ, 0.70–0.90). Conclusions: Endoscopists can identify mucosal breaks confined to a mucosal fold and lesions that extend throughout the esophageal circumference. Complications of reflux disease can be reproducibly recorded. Criteria for assessing the number of mucosal breaks and their radial extent must be defined more clearly, as must the features of minimal change esophagitis.

†Deceased.

Abbreviation used in this paper: GERD, gastroesophageal reflux disease.

© 1996 by the American Gastroenterological Association 0016-5085/96/$3.00

Gastroesophageal reflux disease (GERD) is difficult to define because of its extreme heterogeneity. Peptic lesions of the squamous epithelium have a characteristic endoscopic appearance, but these lesions can occur transiently or even be absent in severely symptomatic patients.1–5 Therefore, the endoscopic appearance has a high specificity, but lack of sensitivity, for GERD. Despite these limitations, it is quite clear that endoscopy remains the investigation of choice for making the diagnosis of reflux esophagitis and grading its severity.

The spectrum of endoscopic findings in reflux disease ranges from minimal, equivocal, mucosal changes through erosions to deep ulcerations, strictures, and columnar-lined esophagus. Data indicate that the clinical response to treatment and the subsequent prognosis are dependent on the severity of the mucosal lesions observed endoscopically.6–12 Therefore, it is important to accurately classify the severity with which the peptic lesions affect the esophagus. With respect to esophagitis, the present situation is not that we lack endoscopic classification systems for severity but that a large number of such classifications exists.13–21 A survey of existing work indicated that there are more than 30 such systems documented in the literature, none of which is universally accepted, each with individual proponents, many with overlapping criteria, and some mutually incompatible. This chaotic area is not likely to be helped by adapting an existing system or by developing another new system with the expectation that previous systems will be abandoned. Well-founded criticism has been directed against many existing classification systems, and difficulties are recognized in recommending any of them unconditionally. One important reason for this is that virtually all existing systems are based on visual observations that have not yet been assessed for reproducibility. To be widely used and accepted as a standard in clinical practice and also as a scientific tool, a system must fulfil a number of basic requirements. One of these is that the proposed criteria must be submitted to a validation program before they can be generally accepted for widespread clinical use. Our international working group has been engaged in problems related to the definition and evaluation of an endoscopic classification system for esophagitis, with the ultimate aim of its being universally accepted and found clinically useful. The initial aim was therefore to identify those endoscopic esophageal mucosal lesions that can be reproducibly described. We investigated the interobserver variation in the identification and description of various lesions thought to characterize uncomplicated esophagitis, as well as complications of the disease, by an analysis of still images and video images.

Materials and Methods

Three interlinked, but separate, studies of observer agreement in the visual appearances of GERD at endoscopy were performed.

Still Image Analyses

Seventeen observers from three countries assessed the reproducibility in recordings of specific endoscopic appearances from a series of color slides. Each participant was given a set of 100 slides from those in the possession of one of the authors (G.N.J.T.), showing visual endoscopic features of GERD ranging from minimal change to severe morphological damage. Data from each patient were recorded on a predesigned work sheet (Figure 1).

Video Image Analyses

The video image study involved 42 observers from seven countries. The objective was to assess whether the differences shown in the first study were influenced by the artificiality of still images used in the slide presentations. Each participant was given video tapes, recorded using Olympus video endoscopes and processed on Super VHS or NTS Video Recording Systems, showing endoscopic appearances from 23 different patients with GERD of varying severity. The recording from each patient lasted approximately 30 seconds, and the same work sheet was used as in the previous study (Figure 1). The experience of the observers varied from endoscopists under training (who had performed fewer than 500 upper gastrointestinal endoscopies) to experts (who had performed more than 3000 upper gastrointestinal endoscopies). Of the 42 participants, 29 were experts.

Mucosal Break Analysis

A proposal for a new endoscopic classification, with emphasis on mucosal breaks, was presented to a symposium at the World Congress of Gastroenterology in Los Angeles in 1994. Mucosal breaks were graded as A, B, C, and D according to the extent (Table 1). After an initial presentation of the rationale for and content of the proposed endoscopic classification, 219 delegates recorded their impression of 20 still images.

Analysis of Data

A total of 103,000 data points were recorded from the still image and video image studies. In terms of comparison between observers, a total of 1994 pairs of observers were potentially available from the two studies alone. However, because comparisons were made within groups (e.g., each expert was compared with each other expert but not with each endoscopist under training), the total number of two-way comparisons made was 872 between pairs of observers.

The analyses of the large volume of data generated by these studies were performed in Leeds, England, using the facilities of the Clinical Information Science Unit. The 103,000 data points were analyzed using a 386 desk-top computer via the dBASE IV program. Statistical analyses were performed based on the κ statistics of Cohen.22 For those unfamiliar with the possible range of this statistic, Figure 2 shows the interval within which this κ statistic usually falls, along with the customary interpretation of the κ findings. For each of the appearances in Figure 1, the κ values between each pair of observers were calculated and the results were expressed in Figures 3–6 as mean κ values and an interquartile range.

Data from the (mucosal break) study at the Los Angeles World Congress were analyzed using different (simpler) methods. Similar (κ) analyses would have resulted in about 48,000 two-way comparisons between widely disparate observers. The results are therefore presented as the proportion of still images in which 90% of those delegates observing the image agreed that (according to the criteria in Table 1) the appearance was or was not present. For details of statistical analyses and further statistical comment, see the Appendix.

Results

This section refers to the still image and video image studies. A subsequent added note concerns the study conducted (under less controlled conditions) at the Los Angeles World Congress.

Severe Complications

Severe complications of reflux disease (ulcer, stricture, or columnar-lined mucosa)23,24 were noted by the majority of observers in approximately one quarter of the observations made: ulcer craters were claimed to be present in 10% of the responses concerning video images and 15% of the responses concerning still images. These findings of the studies involving video images were broadly similar. These changes were reliably reported (Figure 3). Among experienced endoscopists reviewing still images, κ values were high for ulcer (0.92), stricture (0.8), or columnar-lined esophagus, with comparable κ values (0.81, 0.83, and 0.84, respectively) for experienced endoscopists’ observations from the series of video images. Observations made by endoscopists under training were slightly less homogeneous but still well within limits of acceptability, given the conventional interpretation of κ statistics (ulcer, 0.9; stricture, 0.7; and columnar-lined esophagus, 0.7).

Figure 1. Sample of work sheet used in still image and video image analyses.

Mucosal Breaks

The presence or absence of a mucosal break on a particular still or video image (Figure 4) was reliably recorded by all grades of endoscopists (mean κ values, 0.81–0.84). In terms of the extent of mucosal breaks, the data from these studies indicated less agreement.


Table 1. The Los Angeles Classification of Esophagitis

Grade A One or more mucosal breaks confined to the mucosal folds, each no longer than 5 mm

Grade B At least one mucosal break more than 5 mm long confined to the mucosal folds but not continuous between the tops of two mucosal folds

Grade C At least one mucosal break continuous between the tops of two or more mucosal folds but not circumferential

Grade D Circumferential mucosal break
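
As an illustration only, the grading rules in Table 1 can be read as an ordered set of checks, from the most extensive finding down to the least. The sketch below is not part of the study: the `MucosalBreak` record, its field names, and the `los_angeles_grade` function are hypothetical, and it assumes that the endoscopist has already recorded, for each break, its length and whether it bridges fold tops or is circumferential.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MucosalBreak:
    """One recorded mucosal break (hypothetical record layout)."""
    length_mm: float        # longest extent of the break, in millimetres
    bridges_folds: bool     # continuous between the tops of two or more folds
    circumferential: bool   # involves the whole esophageal circumference


def los_angeles_grade(breaks: List[MucosalBreak]) -> Optional[str]:
    """Return 'A' to 'D' following Table 1, or None if no break was recorded."""
    if not breaks:
        return None
    # Check the most extensive criterion first, so that one severe break
    # determines the grade regardless of any smaller breaks also present.
    if any(b.circumferential for b in breaks):
        return "D"
    if any(b.bridges_folds for b in breaks):
        return "C"
    if any(b.length_mm > 5 for b in breaks):
        return "B"
    # All breaks are confined to folds and are no longer than 5 mm.
    return "A"


if __name__ == "__main__":
    example = [MucosalBreak(3, False, False), MucosalBreak(7, False, False)]
    print(los_angeles_grade(example))
```

Any such encoding inherits the observer variation reported in this paper: the inputs are themselves endoscopic judgments, and agreement was weakest for the intermediate grades.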

Figure 3. κ values for comparisons made among experienced endoscopists in recognizing severe esophagitis changes (ulcer, stricture, and columnar-lined mucosa). In Figures 3–6, horizontal bars represent mean of all two-way interobserver comparisons and columns represent interquartile range. Separate symbols denote video and slides.

Comparisons between experienced endoscopists evaluating video images resulted in κ values of about 0.4 for most aspects concerning the length of break and the presence or absence of yellow exudate (see B1 to B4 in Figure 1). Similar values emerged from observations of still images. One value was unusual: feature B2 (small mucosal breaks with yellow exudate over all or part of the lesion) was recorded with excellent agreement between observers. However, it should be noted that this lesion was infrequently seen (only noted in 3%).

In terms of circumference, two categories were reliably identified. These were categories D1 (total involvement was equal to or less than one fold) and F1 (circumferential involvement), with κ values of 0.84 and 0.59, respectively. Distinction between intermediate categories of break proved difficult (κ < 0.4).

Finally, attempts to calculate the number of mucosal breaks (C1 to C4 in Figure 1) resulted in completely unreliable figures, ranging (in the same individual patient) from 1 to 15 on several occasions. These data were so obviously variable that they were not subjected to formal analysis.

Minimal Changes

A different pattern emerged from analyses of minimal changes either in the distal esophagus or at the squamocolumnar junction (section A in Figure 1) (Figures 5 and 6). Distinct differences were shown between the reproducibility of observations made by experienced endoscopists and endoscopists under training. Virtually all minimal distal and mucosal junction changes (Figure 1) were reproducibly recorded by experienced endoscopists, with κ values around 0.8 for increased vascularity, local erythema, and friability. By contrast, corresponding values for endoscopists under training were significantly lower, with the highest κ value of 0.39 for increase in vascularity and with values between 0.19 and 0.22 for blurring of the mucosal junction, friability, and edema.

Los Angeles Study

A study was performed at the symposium of the World Congress of Gastroenterology in Los Angeles examining the agreement between 219 observers grading 20 exposed slide images. Small mucosal breaks (<5 mm in extent) graded as A were reliably recorded, with a κ value of 0.65. There was also a good agreement about the presence or absence of circumferential breaks (grade D) attaining a κ of 0.55. In contrast, there was less agreement concerning intermediate grading of breaks (grade B and C, 0.30 and 0.10, respectively). Comparing these findings and the κ values from the more formal studies (Figures 3–6) showed astonishingly similar outcomes.
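
The Analysis of Data section states that the Los Angeles results were summarized as the proportion of still images on which 90% of the delegates agreed that an appearance was or was not present. A minimal sketch of that summary follows, assuming each delegate's response for one appearance is coded as a boolean. The function name, the data layout, and the example votes are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def proportion_with_consensus(votes_per_image: Sequence[Sequence[bool]],
                              threshold: float = 0.9) -> float:
    """Fraction of images on which at least `threshold` of the delegates
    gave the same answer (either "present" or "not present")."""
    if not votes_per_image:
        raise ValueError("at least one image is required")
    images_with_consensus = 0
    for votes in votes_per_image:
        n = len(votes)
        present = sum(votes)                  # delegates answering "present"
        majority = max(present, n - present)  # size of the larger camp
        if majority / n >= threshold:
            images_with_consensus += 1
    return images_with_consensus / len(votes_per_image)


if __name__ == "__main__":
    # Three made-up images, each rated by 20 delegates.
    votes: List[List[bool]] = [
        [True] * 19 + [False],        # 95% say "present"
        [True] * 10 + [False] * 10,   # evenly split
        [False] * 20,                 # unanimous "not present"
    ]
    print(proportion_with_consensus(votes))
```

Unlike κ, this proportion is not corrected for chance agreement, so it should be read as a descriptive summary rather than as an equivalent of the κ values from the still image and video image studies.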

Figure 2. Explanation of κ statistics of Cohen22 showing possible rates of values and conventional interpretation of results. Po is the observed proportion of agreement and Pc is the expected (chance) agreement in the relevant contingency table.

Figure 4. Agreement concerning mucosal breaks (MB) assessed on video images by experienced endoscopists and endoscopists under training.
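
For readers who want to see the arithmetic behind the κ statistic, the following sketch computes Cohen's κ for one pair of observers as κ = (Po − Pc)/(1 − Pc) and then pools all pairwise values into a mean and an interquartile range, in the manner described for Figures 3–6. It assumes that each observer's record is a list of 0/1 entries (feature absent or present) over the same images. The function names and this data layout are assumptions, not a reconstruction of the original dBASE IV analysis. The sketch visits each unordered pair once, whereas the Appendix counts ordered comparisons (20 observers give 380); because κ is symmetric the mean is unaffected, although quartiles may differ slightly depending on the interpolation rule.

```python
from itertools import combinations
from statistics import mean, quantiles
from typing import List, Optional, Sequence, Tuple


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> Optional[float]:
    """Cohen's kappa for two observers rating the same items as 0 or 1.

    Returns None when chance agreement Pc equals 1 (both observers always
    give the same single answer), because kappa is then undefined.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both observers must rate the same non-empty set of items")
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement, Po
    pa = sum(a) / n                               # observer A: share rated "present"
    pb = sum(b) / n                               # observer B: share rated "present"
    pc = pa * pb + (1 - pa) * (1 - pb)            # agreement expected by chance, Pc
    if pc == 1:
        return None
    return (po - pc) / (1 - pc)


def ordered_comparisons(n_observers: int) -> int:
    """Number of two-way comparisons when every observer is compared with
    every other observer, counted in both directions."""
    return n_observers * (n_observers - 1)


def pooled_kappa(ratings: List[Sequence[int]]) -> Tuple[float, float, float]:
    """Mean kappa and lower/upper quartiles over all pairs of observers."""
    values = [cohen_kappa(a, b) for a, b in combinations(ratings, 2)]
    defined = [k for k in values if k is not None]
    if len(defined) < 2:
        raise ValueError("need at least two defined pairwise kappa values")
    q1, _median, q3 = quantiles(defined, n=4)
    return mean(defined), q1, q3


if __name__ == "__main__":
    obs_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    obs_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    print(cohen_kappa(obs_a, obs_b))
    print(ordered_comparisons(20))
```

Returning `None` for the undefined case is one possible design choice. It keeps pairs in which a feature was never (or always) recorded out of the pooled mean, which relates to the pitfalls of κ "in the absence of an item" that the Discussion mentions.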

Figure 5. Analysis concerning minimal changes at mucosal junction on video images evaluated by experienced endoscopists and endoscopists under training.

Figure 6. Same analysis as in Figure 3 concerning minimal changes in distal esophagus for comparisons between experienced endoscopists and endoscopists under training.

Discussion

A useful endoscopic classification of reflux disease must fulfil a number of criteria. It should identify lesions specific for GERD with high accuracy and minimal interobserver variability. The identification of lesions should not be dependent on subtle nuances of fiberoptic endoscopic technology, and the classification should minimize the problems of misinterpreting the extent of the lesion(s) either longitudinally or horizontally. Furthermore, it should describe each lesion exactly, irrespective of the coexistence of other lesions. An important feature is to allow a practical collection of data and description of all concomitantly existing lesions, and each grade of esophagitis should be distinct enough to avoid overlap. It is also important that the endoscopic classification can be easily memorized and provide a standardized, generally accepted, and comprehensive description of the disease-specific features while differentiating clearly among minimal changes, mucosal breaks, and complications. The system should also distinguish reversible from irreversible lesions.

The term mucosal break was introduced with the hope of avoiding confusion in the use of the terms erosion and ulceration. These latter terms are primarily histological rather than visual and imply a greater degree of certainty about the judged depths of a mucosal lesion than may be possible through an endoscope. Furthermore, the terms erosion and ulceration seem to be used in different ways by different observers.6,13–21 Yet, for any assessment of extent, a clear and pragmatic working definition of the endoscopic appearance of the mucosal break is mandatory. In the working group, some agreement was achieved on the definition of a mucosal break: an area of slough or an area of erythema with a discrete lined demarcation from the adjacent or normal looking mucosa. We decided that the peaks of mucosal folds should be reference points, and we considered that these could be identified during partial air inflation of the esophagus. The scoring of extent was designed to be as simple as possible but with adequate discrimination between the degrees of severity of esophagitis for clinical and research purposes, regardless of esophageal length. It is obviously important to determine the extent of mucosal lesions independent of other measures of severity because the extent of esophagitis may be the single most important measure to define severity and the need for subsequent therapy.6,7,12,14,25–28 It could be argued that the smallest mucosal break that can be identified is a relatively minor abnormality, not carrying any significant risk for the patient. A similar lesion extending to more than 5 mm might be associated with a greater amount of acid exposure of the squamous epithelium. Similarly, the radial extent of mucosal breaks might also be related to severity of reflux.29–31

This study, using both still and video images to assess the interobserver variability with regard to disease-specific lesions affecting the esophageal mucosa, is unique primarily because of the magnitude of the study and the huge number of comparisons made. The agreement between observers was acceptable when determining the presence or absence of a mucosal break confined to a mucosal fold. In addition, extensive lesions, involving the entire circumference of the esophagus, could be determined with high accuracy. Furthermore, complications of the disease, such as the presence of strictures, ulcers, and Barrett’s columnar-lined mucosa, were assessed with extremely good agreement between observers, essentially irrespective of whether they were experienced endoscopists or under training. Other studies have reported markedly lower κ values for the overall endoscopic grading of esophagitis.15,32,33 Differences in design and size of the studies may explain much of the apparent differences in outcome. In general terms, by using video recordings, it is possible that an underestimation of the true κ values will occur. Video recordings are different from standard endoscopic procedures, especially because the observers do not have the benefit of moving the endoscope through the organ.15 From our results, it can be concluded that single mucosal breaks confined to the mucosal fold, as well as mucosal breaks involving most of the esophageal circumference, can be accurately and reproducibly assessed. The question remains how to interpret the low κ values for mucosal breaks involving two or more folds and the valleys in between. It is pertinent to emphasize that the design of the work sheet used in the present evaluation program was aimed at gaining basic information on different endoscopic features rather than specifically focusing on the accuracy by which the radial extent of mucosal breaks could be assessed. At this stage, it is essential to design further studies specifically focusing on the radial extent of esophagitis using refined video images and a work sheet individually designed to address this issue.

Figure 7. (A) Mucosal breaks confined to the mucosal fold, each no longer than 5 mm. (B) At least one mucosal break longer than 5 mm confined to the mucosal fold but not continuous between two folds. (C) Mucosal breaks that are continuous between the tops of mucosal folds but not circumferential. (D) Circumferential mucosal break with one portion being of significant depth.

The inclusion of "minimal lesions" in the endoscopic diagnosis of GERD has been found to improve the sensitivity of endoscopy compared with that of other diagnostic tests.13,34 The improvement in sensitivity, however, can only be achieved at the expense of a markedly decreased specificity. In fact, studies have previously found no correlation between the subtle endoscopic changes of "grade 1 esophagitis" and histological changes implying GERD.5,13 Patients with mild or minimal lesions also have a weaker association between symptoms and documented reflux episodes (as assessed by esophageal pH-metry) than those with more clear-cut evidence of esophagitis.1,13,34,35

It might be concluded from reviewing the literature that these "minor" endoscopic stigmata represent disputable endoscopic lesions, resulting in great discrepancies in the role of minimal changes in the diagnosis of the disease. In the present study, we obtained unexpectedly high κ values among experienced endoscopists with respect to minimal changes. This observation is also unexpected in view of previous data reporting a low level of agreement when similar patients were concomitantly assessed by different investigators during "live endoscopy."15 However, previous studies did not measure κ values for each individual endoscopic finding comprising the entity "minimal changes," whereas the present study has done so. The drawbacks and pitfalls of κ statistics in, for instance, the absence of an item have also to be recognized.36 We nevertheless consider it important to obtain additional information on interobserver variability when this minimal change issue is specifically addressed in the next phase of this evaluation process. Perhaps modern endoscopic technologies will allow more refined video images to be presented, offering the potential for higher accuracy and resolution in the assessment.

A proposal for a new endoscopic classification system with emphasis on mucosal breaks, graded A to D according to the extent, was presented to a symposium at the World Congress of Gastroenterology in Los Angeles in 1994 (Table 1 and Figure 7). At this symposium, 219 delegates recorded their impression of 20 still images. The comparison between their results and the κ values from the more formal studies showed good agreement despite the different manner of conducting them. Small breaks (grade A) were reliably recorded in both studies. Equally, there was a good agreement in both studies about the presence or absence of circumferential breaks. In contrast, there was less agreement concerning intermediate grades. The results of the set of studies thus strongly support the potential usefulness of grades A and D with some additional intermediate category. The precise details and definition of this intermediate category need to be further developed.

It must be emphasized that this endoscopic classification system, which still represents a working model, is not yet ready for clinical application. Adaptation of an incompletely developed scheme would result in its abandonment, which has happened to many existing systems. Forthcoming studies, already in progress, have to determine the interobserver agreement on the radial extent of mucosal breaks. Studies are also under way to determine the usefulness of the proposed endoscopic classification in clinical practice. In this context, the proposed endoscopic classification will be compared with the commonly used Savary–Miller classification.19,20 In addition, the predictive value of the proposed grading in the subsequent short- and long-term outcome on medical therapy in GERD has to be determined.

Appendix

Statistical Analysis

Despite the recognition for at least 40 years that observer variation is important in clinical medicine, there is still unfortunately no absolute criterion by which observer variation in clinical medicine can be measured. The most widely used coefficient of agreement in clinical studies is the κ statistic of Cohen.22

For the purposes of this study, for each individual characteristic in Figure 1, we analyzed data from a large number of observers as a multiple of comparisons between each pair of observers. That is, if 20 observers recorded the presence or absence of a particular feature, κ statistics were calculated for a total of n × (n − 1) = 20 × 19 = 380 comparisons. In this way, each observer is compared with each other observer.

We used the κ statistic in its original version, κ = (Po − Pc)/(1 − Pc). Po is the observed proportion of agreement and Pc is the expected agreement by chance in the relevant contingency table.

The range of possible values for κ and the conventional interpretation of the κ statistic are shown in Figure 2. In Figures 3–6, the data from the present study are described in a simple fashion. For each attribute, the pooled observer agreement in recording that attribute is described by way of a mean κ value (illustrated by a bar on the Figure) for all two-way interobserver comparisons together with an upper and lower quartile band (illustrated by a column). Where appropriate, observer subgroups are specified in the text and/or legends to Figures.

Finally, it needs to be stressed that the study represents an interobserver study, not an intraobserver study.

References

1. Galmiche JP, Bruley des Varannes S. Symptoms and diseases severity in gastro-oesophageal reflux disease. Scand J Gastroenterol 1994;29(Suppl 201):62–68.

2. Pace F, Santalucia F, Bianchi Porro G. Natural history of gastro-oesophageal reflux disease without oesophagitis. Gut 1992;32:858–848.

3. Schindlbeck NE, Klauser AG, Berghammer G, Londong W, Muller-Lissner SA. Three year follow up of patients with gastro-oesophageal reflux disease. Gut 1992;33:1016–1019.

4. Spechler SJ. Epidemiology and natural history of gastro-oesophageal reflux disease. Digestion 1992;51(Suppl 1):24–29.

5. Armstrong D, Emde C, Inauen W, Blum AL. Diagnostic assessment of gastroesophageal reflux disease: what is possible vs what is practical? Hepatogastroenterology 1992;39:3–13.


6. Hetzel DJ, Dent J, Reed WD, et al. Healing and relapse of severe peptic esophagitis after treatment with omeprazole. Gastroenterology 1988;95:903–912.
7. Sandmark S, Carlsson R, Fausa O, Lundell L. Omeprazole or ranitidine in the treatment of reflux esophagitis. Results of double-blind, randomized, Scandinavian multicentre trial. Scand J Gastroenterol 1988;23:625–632.
8. Tytgat GN, Nio CY, Schotborgh RH. Reflux esophagitis. Scand J Gastroenterol 1990;25(Suppl 175):1–12.
9. Koelz HR, Birchler R, Bretholz A, et al. Healing and relapse of reflux oesophagitis during treatment with ranitidine. Gastroenterology 1986;91:1198–1205.
10. Tytgat GNJ, Hansen OJA, Carling L, Degroot GH, Geldof H, Glise H, Elfskind P, Elsborg L, Karvonen AL, Ohlin B, Solhaug OH, Vermeersch B. Effect of cisapride on relapse of reflux oesophagitis, healed with an antisecretory drug. Scand J Gastroenterol 1992;27:175–183.
11. Lundell L, Backman L, Ekstrom T, et al. Prevention of relapse of reflux esophagitis after endoscopic healing: the efficacy and safety of omeprazole compared with ranitidine. Scand J Gastroenterol 1991;26:248–256.
12. Dent J, Yeomans ND, Mackinnon M, Reed W, Narielvala FM, Hetzel DJ, Solcia E, Shearman DJS. Omeprazole versus ranitidine for prevention of relapse in reflux oesophagitis. A controlled double-blind trial of the efficacy and safety. Gut 1994;35:590–598.
13. Armstrong D, Monnier P, Nicolet M, Blum AL, Savary M. Endoscopic assessment of oesophagitis. Gullet 1991;1:63–67.
14. American Society of Gastrointestinal Endoscopy. The role of endoscopy in the management of esophagitis: guidelines for clinical application. Gastrointest Endosc 1988;34(Suppl 9s).
15. Bytzer P, Havelund T, Moller Hansen J. Interobserver variation in the endoscopic diagnosis of reflux esophagitis. Scand J Gastroenterol 1993;28:119–125.
16. Day DW. Infective, physical and chemical oesophagitis. In: Whitehead R, ed. Gastrointestinal and oesophageal pathology. Edinburgh: Churchill Livingstone, 1989:377–383.
17. Giuli R, Savary M, Skinner DB. Esophagitis and Barrett’s esophagus. In: Collaborative studies on esophagitis and esophageal dysplasia. Paris: Organisation Internationale d’Etudes Statistiques pour les Maladies de l’Oesophage, 1989.
18. Makuuchi H, Mitomi T. A new endoscopic criterion and classification of reflux esophagitis. Abstracts from the 42nd Congress of the Japan Gastroenterological Endoscopy Society (abstr). Dig Endosc 1992;4:289.
19. Monnier P, Savary M. Contribution of endoscopy to gastro-oesophageal reflux disease (abstr). Scand J Gastroenterol 1984;106:26.
20. Ollyo JB, Gontollier C, Brossard E, Lang F. La nouvelle classification de Savary des oesophagites de reflux. Acta Endosc 1992;22:307.
21. Tytgat GN, Molenaar CBH. Grading endoscopic severity in gastro-esophageal reflux disease. Rom J Gastroenterol 1994;3:91–93.
22. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measurement 1960;20:37–46.
23. Spechler SJ. Complications of gastroesophageal reflux disease. In: Castell DO, ed. The esophagus. Boston: Little Brown, 1992:543–556.
24. Dent J, Bremner CG, Collens MJ, et al. Barrett’s oesophagus. Working party report to the World Congress of Gastroenterology, Sydney 1990. J Gastroenterol Hepatol 1991;6:1–22.
25. Bennett JR, Chapman RWG, Harvey RF, Heading RC, Morris AI, Stevens RM. Management guidelines for gastro-oesophageal reflux disease. The European Digestive Disease Week, Amsterdam, 1991:46–54.
26. Lundell L. Long-term treatment of gastro-oesophageal reflux disease with omeprazole. Scand J Gastroenterol 1994;29(Suppl 201):74–78.
27. Blum AL, Adami B, Bouzo MH, Brandstatter G, Fumagalli I, Galmiche JP, Hebbelin H, Heutschel E, Huttemann W, Schutz E, Verlinden M, Italian Eurocis Trialists. Effect of cisapride on relapse of esophagitis. A multinational placebo-controlled trial in patients healed with an antisecretory drug. Dig Dis Sci 1993;38:551–560.
28. Kuster E, Ros E, Toledo-Pimentel V, Pujol A, Bordas JM, Grande J, Pera C. Predictive factors of the long-term outcome in gastro-oesophageal reflux disease: six year follow up of 107 patients. Gut 1994;35:8–14.
29. Johnsson F, Joelsson B, Isberg P-E. Ambulatory 24 hour intraesophageal pH-monitoring in the diagnosis of gastroesophageal reflux disease. Gut 1987;28:1145–1150.
30. Singh S, Taylor RH, Colin-Jones DG. Simultaneous two level oesophageal pH monitoring in healthy controls and patients with oesophagitis: comparison between two positions. Gut 1994;35:304–308.
31. Baldi F, Ferrarini F, Langanesi A, Ragazzini M, Barbava L. Acid gastro-oesophageal reflux and symptom occurrence. Analyses of some factors influencing their association. Dig Dis Sci 1989;34:1890–1893.
32. Gustavsson S, Bergstrom R, Erwall C, Krog M, Lindholm C-E, Nyren O. Reflux esophagitis: assessment of therapy effects and observer variation by video documentation of endoscopy findings. Scand J Gastroenterol 1987;22:585–591.
33. Kling PA, Edin K, Domellof L. Observer variability in upper gastrointestinal fiberendoscopy. Scand J Gastroenterol 1985;20:462–465.
34. Schule A, Brandli H, Belloni S, Koelz HR, Pirozynski WJ, Blum AL. Endoskopische Diagnose der Oesophagitis. Wo liegt die Grenze zum Normalen? Deutsche Medizinische Wochenschrift 1977;102:606–609.
35. Havelund T, Laursen LS, Lauritsen K. Efficacy of omeprazole in lower grades of gastro-oesophageal reflux disease. Scand J Gastroenterol 1994;29(Suppl 201):69–73.
36. Lind T, Havelund T, Carlsson R, Glise H, Hernqvist H, Junghard O, Lauritsen K, Lundell L, Pedersen SA. The effect of omeprazole (OME) 20 mg daily on heartburn in patients with endoscopy negative reflux disease (ENRD) (abstr). Scand J Gastroenterol 1995;108:A151.
37. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491–1494.

Received October 25, 1995. Accepted February 26, 1996.

Address requests for reprints to: Lars Lundell, M.D., Department of Surgery, Sahlgren’s Hospital, S-413 45 Gothenburg, Sweden. Fax: (46) 31-41-18-82.

This paper is dedicated to Professor F. T. de Dombal, who sadly and suddenly died concomitant with the final preparation of this report.