ORIGINAL ARTICLE

Concordance between four European centres of PET reporting criteria designed for use in multicentre trials in Hodgkin lymphoma

Sally F. Barrington & Wendi Qian & Edward J. Somer & Antonella Franceschetto &

Bruno Bagni & Eva Brun & Helén Almquist & Annika Loft & Liselotte Højgaard &

Massimo Federico & Andrea Gallamini & Paul Smith & Peter Johnson & John Radford &

Michael J. O’Doherty

Received: 13 January 2010 / Accepted: 27 April 2010 / Published online: 27 May 2010
© Springer-Verlag 2010

Abstract

Purpose To determine if PET reporting criteria for the Response Adapted Treatment in Hodgkin Lymphoma (RATHL) trial could enable satisfactory agreement to be reached between ‘core’ laboratories operating in different countries.

Methods Four centres reported scans from 50 patients with stage II–IV HL, acquired before and after two cycles of Adriamycin/bleomycin/vinblastine/dacarbazine. A five-point scale was used to score response scans using ‘normal’ mediastinum and liver as reference levels. Centres read scans independently of each other. The level of agreement between centres was determined assuming (1) that uptake in sites involved at diagnosis that was higher than liver uptake represented disease (conservative reading), and (2) that uptake in sites involved at diagnosis that was higher than mediastinal uptake represented disease (sensitive reading).

Results There was agreement that the response scan was ‘positive’ or ‘negative’ for lymphoma in 44 patients with a conservative reading and in 41 patients with a sensitive reading. Kappa was 0.85 (95% CI 0.74–0.96) for conservative reading and 0.79 (95% CI 0.67–0.90) for sensitive reading. Agreement was reached in 46 and 44 patients after discussion for the conservative and sensitive readings, respectively.

Conclusion The criteria developed for reporting in the RATHL trial are sufficiently robust to be used in a multicentre setting.

Keywords Positron emission tomography . Hodgkin lymphoma . Quality control/quality assurance . Clinical trial

S. F. Barrington (*) : E. J. Somer : M. J. O’Doherty
PET Imaging Centre at St Thomas’, King’s College London Division of Imaging, Lambeth Palace Road, London SE1 7EH, UK
e-mail: [email protected]

W. Qian
MRC Clinical Trials Unit, London, UK

A. Franceschetto : B. Bagni
Department of Nuclear Medicine, University of Modena and Reggio Emilia, Modena, Italy

E. Brun : H. Almquist
Departments of Oncology and Clinical Physiology, Lund University Hospital, Lund, Sweden

A. Loft : L. Højgaard
PET and Cyclotron Unit, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark

M. Federico
Department of Haematology and Oncology, University of Modena and Reggio Emilia, Modena, Italy

A. Gallamini
Hematology Department, Azienda Ospedaliera S. Croce e Carle, Cuneo, Italy

P. Smith
Cancer Research UK and UCL Cancer Trials Centre, London, UK

P. Johnson
Cancer Research UK Clinical Centre, Southampton, UK

J. Radford
The Christie NHS Foundation Trust and the University of Manchester, Manchester, UK

Eur J Nucl Med Mol Imaging (2010) 37:1824–1833
DOI 10.1007/s00259-010-1490-5

Introduction

Positron emission tomography (PET) is a powerful prognostic indicator in Hodgkin lymphoma (HL) and aggressive non-Hodgkin lymphoma (NHL) [1]. Recent reports indicate that interim PET performed after two cycles of chemotherapy is a better predictor of response in advanced HL than the International Prognostic Score [2]. The challenge is to determine whether response-adapted therapy using PET can improve patient outcomes. Multicentre trials are underway to test whether PET can be used to safely de-escalate therapy in PET ‘responders’ to reduce toxicity without adversely affecting cure rates and to intensify treatment in PET ‘nonresponders’ to improve survival. One such trial is the Response-Adapted Therapy in Hodgkin Lymphoma (RATHL) trial (www.cancer.gov/clinicaltrials, reference NCT00678327).

An important aspect of designing multicentre trials involving PET is to ensure that images of high quality are obtained and reported consistently irrespective of the location at which the patient was scanned. Ideally reporting criteria should be flexible enough to allow subgroup analyses of individuals with uptake above background, as ‘minimal residual uptake’ in sites of disease appears to have different implications with regard to patient outcome according to lymphoma type and stage [3, 4] and possibly according to the treatment given. Reporting criteria previously referred to as the ‘London criteria’ [5] have been developed for use in the RATHL trial. The aim of this study was to test in advance of the trial opening whether the criteria would be sufficiently robust to enable satisfactory agreement to be reached between ‘core’ laboratories reading scans in different European countries.

Methods

Four European PET centres took part in the reporting exercise. The test dataset was obtained from 50 consecutive patients with HL who had a baseline PET/CT scan and a response PET/CT scan acquired after two cycles of ABVD (Adriamycin, bleomycin, vinblastine, dacarbazine) between September 2005 and March 2007 at the St Thomas’ PET Centre, London. Patients who met the eligibility criteria for the RATHL trial (i.e. >18 years of age with stage IIB–IV classical HL or stage IIA with adverse features) were identified retrospectively from the patient database held at the PET Centre. All scans were acquired after a 6-h fast on either a Discovery VCT or a Discovery ST camera after administration of 350–400 MBq 18F-fluorodeoxyglucose (FDG). The uptake period of the response scans was always within 10 min of the uptake period of the baseline scans and both scans were acquired on the same camera. Half-body scans were acquired from the upper thigh to the skull base with separate head and neck views as necessary. Images were reconstructed using ordered-subset expectation maximization (OSEM). The CT scans were acquired at 120 kVp and 65 mAs without administration of oral or intravenous contrast agent. Scans were anonymized and saved in DICOM format for loading onto the usual reporting workstation at each centre. Each centre used its standard reporting software (Hermes Medical, GE-Xeleris or Siemens-Leonardo). Centres B and D used the same reporting software.

The scans were reported using a five-point scale. This scale was developed at a meeting held in London attended by invited international PET experts. The presence or absence of uptake above background in regions involved by lymphoma at diagnosis was scored using the mediastinum and the liver as reference organs (Table 1). Reporters were not ‘trained’ in advance of this exercise, which was performed prior to the opening of the RATHL trial to determine if the reporting system would be reproducible across national core laboratories undertaking central review.

Scans were read by two experienced reporters at each PET centre. The reviewers at each centre scored the scans without knowledge of the clinical history or patient outcome and independently from the other centres taking part. The scores of this ‘independent read’ were collated. Scans where there was disagreement were discussed to determine if consensus could be reached.

The hypothesis of the RATHL trial is that FDG PET imaging can be reproducibly and effectively applied in the early assessment of response to chemotherapy for a risk-adapted treatment strategy in advanced HL. The design of the RATHL trial is shown in Fig. 1.

The study aims to assess:

1. Whether ABVD and AVD produce equivalent outcomes in patients with good prognosis who become PET-negative after two cycles.

2. The 2-year progression-free survival (PFS) in patients who remain PET-positive after two cycles of ABVD and have escalation of chemotherapy to BEACOPP-14 or escalated BEACOPP.

Based on 1,200 patients entering the study and a 75% response rate with PET, the study will exclude 3-year PFS being 3–4% worse with AVD than ABVD with 90% power (assuming 3-year PFS with ABVD is 95%). With a total of 300 patients receiving intensified chemotherapy, the 2-year PFS in this group of patients would be reliably estimated with a standard error of <3% and the outcome compared with earlier series.
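The arithmetic behind the second figure can be checked with a short sketch. It assumes a simple binomial standard error for a proportion, which is our simplification and not necessarily the method used in the trial's sample size calculation; the function and variable names are illustrative only.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n patients (binomial approximation)."""
    return math.sqrt(p * (1.0 - p) / n)


n_total = 1200            # patients entering the study (from the text)
pet_response_rate = 0.75  # "75% response rate with PET" (from the text)

# Patients expected to remain PET-positive and receive intensified chemotherapy.
n_escalated = round(n_total * (1 - pet_response_rate))

# The binomial standard error is largest at p = 0.5, so this is the worst case.
worst_case_se = binomial_se(0.5, n_escalated)

print(n_escalated, round(worst_case_se, 4))
```

Under this assumption the worst-case standard error for 300 patients is about 2.9%, consistent with the stated bound of <3% whatever the true 2-year PFS turns out to be.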

The group of PET experts felt confident that residual uptake higher than normal liver uptake was likely to represent lymphoma and this was chosen as the criterion for a positive scan in the RATHL trial, where a score of 4 or 5 is regarded as a ‘positive’ PET scan and results in escalation of therapy. The group were confident that uptake equal to or lower than normal mediastinum was unlikely to represent disease (score 1 or 2). However, there was uncertainty about the nature of the uptake with intensity between that of the mediastinum and the liver (score 3). Therefore levels of agreement were measured regarding score 3 as ‘negative’ (conservative reading) and score 3 as ‘positive’ (sensitive reading).

Statistical analysis

The levels of agreement between the four centres were analysed using non-weighted kappa statistics [6]. Kappa statistics were calculated for a PET/CT scan classified as ‘negative’ (based on a score of 1, 2 or 3) or ‘positive’ (score 4 or 5; conservative reading) and ‘negative’ (based on a score of 1 or 2) or ‘positive’ (score 3, 4 or 5; sensitive reading). Score ‘X’, which denotes new uptake unlikely to represent lymphoma, was not included in the analysis. Kappa values between 0.81 and 1.00 indicate very good agreement, between 0.61 and 0.80 good agreement, between 0.41 and 0.60 moderate agreement, and between 0.21 and 0.40 fair agreement [7]. Analyses were performed using Stata version 10.
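The two dichotomizations and the agreement bands can be written down compactly. The following is a minimal sketch rather than the analysis code used in the study (which was run in Stata); the names `is_positive` and `agreement_band` are hypothetical, and values below 0.21 are left unlabelled because the text does not quote a band for them.

```python
# Lowest score counted as 'positive' under each reading (from the text).
READING_CUTOFF = {"conservative": 4, "sensitive": 3}


def is_positive(score, reading):
    """Dichotomize a five-point score; score 'X' is excluded from the analysis."""
    if score == "X":
        return None
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected score: {score!r}")
    return score >= READING_CUTOFF[reading]


def agreement_band(kappa):
    """Map a kappa value to the descriptive bands quoted from reference [7]."""
    for lower_bound, label in (
        (0.81, "very good"),
        (0.61, "good"),
        (0.41, "moderate"),
        (0.21, "fair"),
    ):
        if kappa >= lower_bound:
            return label
    return "below the bands quoted in the text"


print(is_positive(3, "conservative"), is_positive(3, "sensitive"))
print(agreement_band(0.85), agreement_band(0.79))
```

Only score 3 changes class between the two readings, which is why the comparison isolates the effect of the uncertain uptake range between mediastinum and liver.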

Results

At baseline, 23 patients had stage II disease, 17 patients stage III disease and 10 patients stage IV disease. There was agreement that the response scan was ‘positive’ (13 scans) or ‘negative’ (31 scans) for lymphoma in 44 of 50 patients at all four centres when scans were reported using the conservative reading. Consensus was reached in 46 patients after discussion, with one scan deemed ‘positive’ and one deemed ‘negative’. There was agreement that the response scan was ‘positive’ (16 scans) or ‘negative’ (25 scans) at all four centres in 41 of 50 patients when scans were reported using the sensitive reading. Consensus was reached in 44 of 50 patients after discussion. Scans where there was disagreement are shown in Table 2 (using the conservative reading) and Table 3 (using the sensitive reading).

Table 1 Five-point scoring system developed for the RATHL trial

Score  PET/CT scan result
1      No uptake above background
2      Uptake ≤ mediastinum
3      Uptake > mediastinum but ≤ liver
4      Uptake moderately increased compared to the liver at any site
5      Uptake markedly increased compared to the liver at any site
X      New areas of uptake unlikely to be related to lymphoma

Note: If mediastinal blood pool activity is equal to or greater than activity in the liver, then the uptake within the lesion should be compared with that in the liver (uptake lesion < liver score 2; lesion = liver score 3)

Fig. 1 Design of the RATHL trial. Flow diagram: PET-CT 1 (staging); 2 cycles ABVD (full dose, on schedule); PET-CT 2. If PET negative: randomize to 4 cycles ABVD or 4 cycles AVD (no bleomycin), then follow-up. If PET positive: 4 cycles BEACOPP-14 or 3 cycles escBEACOPP, then PET-CT 3; if PET negative, 2 cycles BEACOPP-14 or 1 cycle escBEACOPP; if PET positive, XRT or salvage

The issues over which the four centres could not come to a consensus were:

1. Distinguishing pathological from physiological or inflammatory uptake in the cervical region (patients 15, 22, 29 and 41; Fig. 2).

2. Difficulty interpreting hilar uptake in the context of resolution of uptake at other sites (patients 16 and 43; Fig. 3).

3. Interpreting low uptake in the axilla in a patient scanned with arms in different positions on the baseline and response scans (patient 36; Fig. 4).

Consensus was reached after discussion in the following patients (Tables 2 and 3):

1. Residual uptake in a left pelvic side wall node was overlooked at one centre (patient 10).

2. Physiological uptake in the myocardium was initially considered at one centre to represent misregistered uptake within adjacent subcarinal nodes, not myocardial uptake (patient 19).

3. Residual uptake in a neck node was overlooked at one centre and at another centre the intensity of uptake in the node was considered to be equal to the intensity of the liver rather than higher than liver uptake (patient 28).

4. Uptake initially considered to be within residual neck nodes at one centre was considered more likely to represent prominent uptake in brown fat (patient 29).

Kappa was 0.85 (95% CI 0.74–0.96) for conservative reading (very good agreement) when score 4 or 5 was regarded as ‘positive’, and 0.79 (95% CI 0.67–0.90) for sensitive reading (good agreement) when score 3, 4 or 5 was regarded as ‘positive’ [7].
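As a worked illustration, the point estimates can be recomputed from the counts given above together with the independent-read scores in Tables 2 and 3. The sketch assumes that the multi-rater kappa is the Fleiss-type generalization for a fixed number of raters; the paper states only that non-weighted kappa statistics were calculated in Stata, so this choice is our assumption, and the confidence intervals are not reproduced.

```python
CUTOFF = {"conservative": 4, "sensitive": 3}

# Scans on which all four centres agreed: (all 'positive', all 'negative'), from Results.
CONCORDANT = {"conservative": (13, 31), "sensitive": (16, 25)}

# Independent-read scores (centres A, B, C, D) for discordant patients, Tables 2 and 3.
DISCORDANT = {
    "conservative": {
        10: (4, 3, 5, 5), 15: (5, 5, 1, 4), 16: (4, 5, 1, 5),
        28: (3, 1, 4, 3), 41: (1, 4, 1, 4), 43: (4, 3, 1, 4),
    },
    "sensitive": {
        15: (5, 5, 1, 4), 16: (4, 5, 1, 5), 19: (3, 1, 1, 2),
        22: (1, 3, 1, 3), 28: (3, 1, 4, 3), 29: (3, 1, 1, 1),
        36: (3, 3, 1, 2), 41: (1, 4, 1, 4), 43: (4, 3, 1, 4),
    },
}


def positives_per_patient(reading):
    """Number of centres (0-4) calling each of the 50 response scans 'positive'."""
    all_pos, all_neg = CONCORDANT[reading]
    counts = [4] * all_pos + [0] * all_neg
    cut = CUTOFF[reading]
    counts += [sum(s >= cut for s in scores) for scores in DISCORDANT[reading].values()]
    return counts


def fleiss_kappa_binary(pos_counts, n_raters=4):
    """Fleiss-type kappa for two categories and a fixed number of raters per subject."""
    n_subjects = len(pos_counts)
    pairs = n_raters * (n_raters - 1)
    # Observed agreement: proportion of agreeing rater pairs, averaged over subjects.
    p_bar = sum(
        (k * k + (n_raters - k) ** 2 - n_raters) / pairs for k in pos_counts
    ) / n_subjects
    # Chance agreement from the overall proportion of 'positive' calls.
    p_pos = sum(pos_counts) / (n_subjects * n_raters)
    p_e = p_pos ** 2 + (1 - p_pos) ** 2
    return (p_bar - p_e) / (1 - p_e)


kappa_conservative = fleiss_kappa_binary(positives_per_patient("conservative"))
kappa_sensitive = fleiss_kappa_binary(positives_per_patient("sensitive"))
print(round(kappa_conservative, 2), round(kappa_sensitive, 2))
```

Under this assumption the sketch gives values that round to the reported 0.85 and 0.79, which suggests that the published tables and counts are internally consistent with the kappa estimates.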

Discussion

The need for standardization for reporting PET scans is well recognized [1, 8–10]. Consistency in reporting is especially important in multicentre trials to ensure reproducibility between different centres so that a meaningful analysis of results can be obtained and to validate PET reporting methods so that they can be translated into clinical practice.

FDG uptake on a patient scan is not a ‘black or white’ phenomenon indicating the presence or absence of malignancy. FDG is a nonspecific tracer which is taken up in any process with increased glucose utilization. Images are displayed according to a continuum of uptake with the likelihood of malignancy increasing with the level of FDG uptake. The pretest likelihood of a ‘positive’ or ‘negative’ test also influences the way in which FDG uptake can be interpreted. ‘Minimal residual uptake’ (MRU) is a term which has been used to describe low-grade uptake that may be seen after treatment for lymphoma [3, 4]. MRU has been attributed to inflammatory change related to treatment and/or to low volume residual disease. The significance of MRU in interim PET scans may differ according to the lymphoma type (e.g. HL vs. NHL), stage (early vs. advanced) and possibly the treatment administered [2–4, 11–13]. In patients with early HL receiving ABVD chemotherapy and involved field radiotherapy with a low pretest likelihood of disease, MRU is associated with a very good prognosis [11]. Conversely MRU appears to be associated with a poor prognosis in patients with advanced NHL receiving systemic chemotherapy and a higher pretest likelihood of disease [4]. There is no consensus as to what level of FDG uptake on a scan constitutes MRU. It has been variously defined as uptake just above background [14–16], as uptake sometimes related to physiological uptake in the region of the body in which it is present [15], as uptake smaller than, equal to, or just higher than mediastinal blood pool structures [11] and as uptake less than or equal to normal liver uptake [5], and can be combined with an assessment of lesion size on CT [17].

Table 2 Scoring of scans where centres disagreed whether the scan was ‘positive’ or ‘negative’ with conservative reading

Patient  Score (independent read)                Score (consensus read)
         Centre A  Centre B  Centre C  Centre D  Centre A  Centre B  Centre C  Centre D
10a      4         3         5         5         4         4         5         5
15       5         5         1         4         5         5         1         4
16       4         5         1         5         4         5         1         5
28a      3         1         4         3         3         3         3         3
41       1         4         1         4         4         4         1         4
43       4         3         1         4         4         3         1         4

a In patients 10 and 28 consensus was reached after discussion.

Table 3 Scoring of scans where centres disagreed whether the scan was ‘positive’ or ‘negative’ with sensitive reading

Patient  Score (independent read)                Score (consensus read)
         Centre A  Centre B  Centre C  Centre D  Centre A  Centre B  Centre C  Centre D
15       5         5         1         4         5         5         1         4
16       4         5         1         5         4         5         1         5
19a      3         1         1         2         1         1         1         2
22       1         3         1         3         1         3         1         3
28a      3         1         4         3         3         3         3         3
29a      3         1         1         1         1         1         1         1
36       3         3         1         2         3         3         1         2
41       1         4         1         4         4         4         1         4
43       4         3         1         4         4         3         1         4

a In patients 19, 28 and 29 consensus was reached after discussion.

Fig. 2 Residual uptake in the neck (arrows) in patient 41 was initially overlooked (one centre) but finally interpreted as disease (three centres) or inflammation (one centre). a, b Baseline slices: a coronal, b axial. c, d Response slices: c coronal, d axial

Fig. 3 Right hilar uptake (arrows) in patient 43 was variably interpreted as residual disease or reactive change. a, b Baseline slices: a coronal, b axial. c, d Response slices: c coronal, d axial

The reporting criteria that were developed for use in the RATHL trial were intended to deal with some of the issues surrounding the ‘grey-zone’ that is referred to as MRU. The London criteria were intended to:

1. Be an objective reporting method for lymphoma that would be reproducible when used by reporters in different countries.

2. Allow the outcome in patients with different levels of residual FDG uptake to be analysed to determine if the separation of patients according to the degree of FDG uptake is meaningful.

The same criteria could be applied in early and advanced stages of HL and NHL by altering the threshold used to define a ‘positive’ scan rather than relying on several different criteria for the reporting of scans in patients with lymphoma. A high negative predictive value is desirable for a test where de-escalation of therapy is proposed in a group of patients with a good prognosis. Here the aim is to reduce toxicity without adversely affecting outcome. A high positive predictive value is desirable instead for a test where intensification of therapy is proposed in a group of patients with a poor prognosis in whom the aim is to improve cure rates. The London criteria were adopted recently at a consensus meeting in Deauville for use in an international validation study for the reporting of interim scans with lymphoma [18].

Semiquantitative analysis has been used to try to improve on the results of visual analysis. Current data suggest that semiquantitative analysis does not have additional value in HL [11, 14, 15], but using the relative reduction in standardized uptake value (SUV) might be beneficial for the assessment of response in NHL [19, 20]. The problem with using a semiquantitative approach is that the SUV depends on many factors including the scanner used and the image reconstruction parameters. Standardization is therefore required if using SUVs from multiple centres and, while phantom measurements show good agreement when standardization is implemented, errors that may occur such as inaccurate weight or tissued injection can have a large influence on SUV but a small effect on relative visual assessment. The optimal threshold chosen also depends on the cut-off points generated by the distribution of the data in the studies [11, 19].

Fig. 4 Residual uptake in the axilla (arrow) in patient 36 caused problems in interpretation between centres, possibly because the arms were positioned differently for the two scans. Baseline (a) and response (b) coronal slices are shown
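The sensitivity of SUV to such errors, and the relative robustness of a lesion-to-reference comparison, can be illustrated with the usual body-weight-normalized definition of SUV. This is a simplified sketch: decay correction is omitted, all numerical values are hypothetical, and a tissued injection is modelled crudely as a uniform 20% loss of circulating activity.

```python
def suv_body_weight(tissue_bq_per_ml: float, injected_bq: float, weight_g: float) -> float:
    """SUV = tissue activity concentration / (injected activity / body weight)."""
    return tissue_bq_per_ml / (injected_bq / weight_g)


# Hypothetical values chosen only for illustration.
lesion, liver = 12_000.0, 6_000.0   # activity concentrations, Bq/mL
injected = 370e6                    # 370 MBq recorded as injected
weight = 70_000.0                   # 70 kg expressed in grams

suv_true = suv_body_weight(lesion, injected, weight)

# A 10% overestimate of body weight inflates the SUV by the same 10%.
suv_wrong_weight = suv_body_weight(lesion, injected, weight * 1.1)

# Crude model of a tissued injection: only 80% of the activity circulates,
# so every measured concentration is scaled by 0.8 while the recorded
# injected activity is unchanged.
suv_tissued = suv_body_weight(0.8 * lesion, injected, weight)

# The lesion-to-liver ratio, the basis of the visual scale, is unaffected.
ratio_true = lesion / liver
ratio_tissued = (0.8 * lesion) / (0.8 * liver)

print(round(suv_wrong_weight / suv_true, 3), round(suv_tissued / suv_true, 3))
print(ratio_true, round(ratio_tissued, 3))
```

Both errors shift the absolute SUV in proportion to the error, whereas the ratio of lesion to liver uptake is unchanged in this simplified model, which is the intuition behind comparing lesions with reference organs on the same scan.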

The aim of this study was to determine if the five-point scale was practical and sufficiently robust to enable standardization of reporting across four different European centres for the RATHL trial. There was very good agreement for conservative reading considering scores 1, 2 and 3 as ‘negative’ for disease and 4 and 5 as ‘positive’ for disease, which is how scans are read in the RATHL study. There was good agreement for sensitive reading considering scores 1 and 2 as ‘negative’ and 3, 4 and 5 as ‘positive’, which may be relevant for future trials where a higher negative predictive value from the PET scan is required than in the RATHL trial.

The cases where centres disagreed are typical of cases that are challenging in everyday practice. There was difficulty separating physiological from pathological uptake in the neck with prominent muscle and brown fat uptake, even with PET/CT. Intravenous contrast agent might help with the distinction between nodal and physiological uptake [21], but this is not universally accepted [11, 19, 22, 23]. In one case different positioning of the arms for the baseline and response scans may have caused problems with interpretation of uptake in the axilla. Scanning in different positions is not considered best practice and this pair of scans could have been excluded from the analysis, but we wanted the dataset to be representative of the real clinical situation. It is not uncommon for a patient to be too unwell to tolerate a staging scan with the arms positioned above the head but to be able to tolerate this position for a subsequent scan. It may be preferable to perform the response scan with arms down and accept the abdominal artefacts that ensue if the baseline scan has demonstrated disease in the neck or upper chest.

The interpretation of residual hilar uptake is a difficult area. Reactive hilar uptake is often seen in smokers and patients with airways disease, and sarcoid-like reactions may occur with ‘false-positive’ hilar uptake in lymphoma [24]. In a recent study the incidence of bilateral hilar uptake in patients with cancer, excluding lung cancer, was 14% [25]. In the 37 patients included with lymphoma, bilateral hilar uptake turned out to be benign in 49% and malignant in 51%. Important variables in discriminating benign from malignant uptake included a history of granulomatous disease and exposure to asbestos. Reporters in our study were not allowed access to clinical information, which might have helped in interpretation. In two studies of observer variation in PET/CT reporting for staging lymphoma and lung cancer, agreement was near perfect between three experienced reporters from the same institution for superior and inferior mediastinal nodal stations but much lower for hilar nodes [10, 26]. Poor agreement has been reported for the presence or absence of disease in hilar regions even with contrast-enhanced CT amongst experts reporting paediatric HL [27].

Zijlstra et al. reported observer variation between 11 nuclear medicine physicians reading PET-only scans from 37 patients with HL and NHL [8]. Levels of agreement were much lower than in our study but predefined reporting criteria were not used. Reporters were asked ‘to assess the likelihood of malignancy of any potential lymphoma lesion’. Scans were PET, not PET/CT, and not all physicians were experienced reporters.

More recently Horning et al. reported agreement levels between three experienced reporters scoring paired baseline and interim response scans from 38 patients with diffuse large B-cell lymphoma within a trial setting [5]. Patients with PET-positive scans after three cycles of R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, prednisone) had treatment escalated to RICE (rituximab, ifosfamide, carboplatin, etoposide). Agreement was reached between reviewers as to whether a response scan was ‘positive’ or ‘negative’ for disease in 26 of 38 patients using predefined Eastern Co-operative Oncology Group (ECOG) reporting criteria and in 27 of 38 using the London criteria, which were used in our study. It is not surprising that agreement levels were similar as the two sets of reporting criteria are similar in that they both use liver uptake as the cut-off for a ‘positive’ scan. Kappa values were reported as 0.445 and 0.502, respectively, indicating only moderate agreement.

Possible explanations for the better agreement in our study may relate to the way in which the reporting exercise was conducted rather than superiority of the reporting criteria:

1. PET/CT was used in all cases in our study, whereas Horning et al. included PET-only scans. It is not stated how many scans were PET-only but better agreement would be expected for PET/CT than for PET alone, especially in the interpretation of paraaortic uptake which caused problems in interpretation in five patients in their study [24].

2. Double-reading was carried out in our study by two experienced readers in national core laboratories, rather than by single readers.

3. There may be less experience in reporting ‘interim’ PET scans in US than in European centres as reimbursement for PET is based on examination of response at the end of a course of treatment in the US.

There may be a higher number of ‘false positive’ scans in diffuse large B-cell lymphoma than in HL [20] and with rituximab treatment [12], which could also have affected reporter confidence.

Conclusion

There was very good agreement in the reporting of interim PET/CT scans between expert readers from four different European centres using a five-point scale. The reporting criteria were sufficiently robust to allow standardized reporting of scans at different core laboratories for the purpose of multicentre trials even when different thresholds for a ‘positive’ scan are to be applied. Conservative reading using the five-point scale has been implemented in the RATHL trial.

Continued audit will be required to ensure that consistency is maintained in the RATHL trial, and validation of the reporting criteria will be required in different groups of patients with lymphoma receiving different therapy. Reproducibility in reporting will make it possible for PET reporting methods validated in a trial setting to be translated into clinical practice for the benefit of patients.

Acknowledgments The RATHL study is supported by Cancer Research UK. The authors thank Dr. Biggi, Azienda Ospedaliera S. Croce e Carle, Cuneo, Dr. Quon, Stanford University, Dr. Juweid, University of Iowa, and Dr. Elstrom, University of Michigan, for contributions to the development of reporting criteria for the RATHL trial.

Conflicts of interest None.

References

1. Hutchings M, Barrington SF. PET/CT for therapy response assessment in lymphoma. J Nucl Med 2009;50:21S–30S.

2. Gallamini A, Hutchings M, Rigacci L, Specht L, Merli F, Hansen M, et al. Early interim 2-[18F]fluoro-2-deoxy-D-glucose positron emission tomography is prognostically superior to international prognostic score in advanced-stage Hodgkin's lymphoma: a report from a joint Italian-Danish study. J Clin Oncol 2007;25:3746–52.

3. Hutchings M, Mikhaeel NG, Fields PA, Nunan T, Timothy AR. Prognostic value of interim FDG-PET after two or three cycles of chemotherapy in Hodgkin lymphoma. Ann Oncol 2005;16:1160–8.

4. Mikhaeel NG, Hutchings M, Fields PA, O'Doherty MJ, Timothy AR. FDG-PET after two to three cycles of chemotherapy predicts progression-free and overall survival in high-grade non-Hodgkin lymphoma. Ann Oncol 2005;16:1514–23.

5. Horning SJ, Juweid ME, Schoder H, Wiseman G, McMillan A, Swinnen LJ, et al. Interim positron emission tomography (PET) scans in diffuse large B-cell lymphoma: an independent expert nuclear medicine evaluation of the Eastern Cooperative Oncology Group E3404 study. Blood 2010;115:775–7.

6. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

7. Altman DG. Some common problems in medical research. Practical statistics for medical research. London: Chapman and Hall/CRC; 1999. p. 396–439.

8. Zijlstra JM, Comans EF, van Lingen A, Hoekstra OS, Gundy CM, Willem Coebergh J, et al. FDG PET in lymphoma: the need for standardization of interpretation. An observer variation study. Nucl Med Commun 2007;28:798–803.

9. Chiti A. Evaluation of response: is 18F-FDG PET the answer? Eur J Nucl Med Mol Imaging 2009;36:733–4.

10. Hofman MS, Smeeton NC, Rankin SC, Nunan T, O'Doherty MJ. Observer variation in FDG PET-CT for staging of non-small-cell lung carcinoma. Eur J Nucl Med Mol Imaging 2009;36:194–9.

11. Hutchings M, Loft A, Hansen M, Pedersen LM, Buhl T, Jurlander J, et al. FDG-PET after two cycles of chemotherapy predicts treatment failure and progression-free survival in Hodgkin lymphoma. Blood 2006;107:52–9.

12. Han HS, Escalon MP, Hsiao B, Serafini A, Lossos IS. High incidence of false-positive PET scans in patients with aggressive non-Hodgkin's lymphoma treated with rituximab-containing regimens. Ann Oncol 2009;20:309–18.

13. Hutchings M, Loft A, Hansen M, Ralfkiaer E, Specht L. Different histopathological subtypes of Hodgkin lymphoma show significantly different levels of FDG uptake. Hematol Oncol 2006;24:146–50.

14. Gallamini A, Rigacci L, Merli F, Nassi L, Bosi A, Capodanno I, et al. The predictive value of positron emission tomography scanning performed after two courses of standard therapy on treatment outcome in advanced stage Hodgkin's disease. Haematologica 2006;91:475–81.

15. Kostakoglu L, Goldsmith SJ, Leonard JP, Christos P, Furman RR, Atasever T, et al. FDG-PET after 1 cycle of therapy predicts outcome in diffuse large cell lymphoma and classic Hodgkin disease. Cancer 2006;107:2678–87.

16. Querellou S, Valette F, Bodet-Milin C, Oudoux A, Carlier T, Harousseau JL, et al. FDG-PET/CT predicts outcome in patients with aggressive non-Hodgkin's lymphoma and Hodgkin's disease. Ann Hematol 2006;85:759–67.

17. Juweid ME, Stroobants S, Hoekstra OS, Mottaghy FM, Dietlein M, Guermazi A, et al.; Imaging Subcommittee of International Harmonization Project in Lymphoma. Use of positron emission tomography for response assessment of lymphoma: consensus of the Imaging Subcommittee of International Harmonization Project in Lymphoma. J Clin Oncol 2007;25:571–8.

18. Meignan M, Gallamini A, Haioun C. Report on the First International Workshop on Interim-PET-Scan in Lymphoma. Leuk Lymphoma 2009;50:1257–60.

19. Lin C, Itti E, Haioun C, Petegnief Y, Luciani A, Dupuis J, et al. Early 18F-FDG PET for prediction of prognosis in patients with diffuse large B-cell lymphoma: SUV-based assessment versus visual analysis. J Nucl Med 2007;48:1626–32.

20. Itti E, Lin C, Dupuis J, Paone G, Capacchione D, Rahmouni A, et al. Prognostic value of interim 18F-FDG PET in patients with diffuse large B-cell lymphoma: SUV-based assessment at 4 cycles of chemotherapy. J Nucl Med 2009;50:527–33.

21. Berthelsen AK, Holm S, Loft A, Klausen TL, Andersen F, Hojgaard L. PET/CT with intravenous contrast can be used for PET attenuation correction in cancer patients. Eur J Nucl Med Mol Imaging 2005;32:1167–75.

22. Rodriguez-Vigil B, Gomez-Leon N, Pinilla I, Hernandez-Maraver D, Coya J, Martin-Curto L, et al. PET/CT in lymphoma: prospective study of enhanced full-dose PET/CT versus unenhanced low-dose PET/CT. J Nucl Med 2006;47:1643–8.

23. Elstrom RL, Leonard JP, Coleman M, Brown RKJ. Combined PET and low-dose, noncontrast CT scanning obviates the need for additional diagnostic contrast-enhanced CT scans in patients undergoing staging or restaging for lymphoma. Ann Oncol 2008;19:1770–3.

24. Barrington SF, O'Doherty MJ. Limitations of PET for imaging lymphoma. Eur J Nucl Med Mol Imaging 2003;30:S117–27.

25. Karam M, Roberts-Klein S, Shet N, Chang J, Feustel P. Bilateral hilar foci on 18F-FDG PET scan in patients without lung cancer: variables associated with benign and malignant etiology. J Nucl Med 2008;49:1429–36.

26. Hofman MS, Smeeton NC, Rankin SC, Nunan T, O'Doherty MJ. Observer variation in interpreting 18F-FDG PET/CT findings for lymphoma staging. J Nucl Med 2009;50:1594–7.

27. Fletcher BD, Glicksman AS, Gieser P. Interobserver variability in the detection of cervical-thoracic Hodgkin's disease by computed tomography. J Clin Oncol 1999;17:2153.