Improving the Reliability of Stroke Subgroup Classification Using the Trial of ORG 10172 in Acute...

Stroke. 2001;32:1091-1097
doi: 10.1161/01.STR.32.5.1091
Stroke is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 2001 American Heart Association, Inc. All rights reserved.
Print ISSN: 0039-2499. Online ISSN: 1524-4628

The online version of this article, along with updated information and services, is located on the World Wide Web at: http://stroke.ahajournals.org/content/32/5/1091


Improving the Reliability of Stroke Subgroup Classification Using the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) Criteria

Larry B. Goldstein, MD; Michael R. Jones, MD; David B. Matchar, MD; Lloyd J. Edwards, PhD; Jennifer Hoff, MS; Vani Chilukuri, MD; S. Beth Armstrong, BA; Ronnie D. Horner, PhD

Background and Purpose—We sought to improve the reliability of the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification of stroke subtype for retrospective use in clinical, health services, and quality of care outcome studies. The TOAST investigators devised a series of 11 definitions to classify patients with ischemic stroke into 5 major etiologic/pathophysiological groupings. Interrater agreement was reported to be substantial in a series of patients who were independently assessed by pairs of physicians. However, the investigators cautioned that disagreements in subtype assignment remain despite the use of these explicit criteria and that trials should include measures to ensure the most uniform diagnosis possible.

Methods—In preparation for a study of outcomes and management practices for patients with ischemic stroke within Department of Veterans Affairs hospitals, 2 neurologists and 2 internists first retrospectively classified a series of 14 randomly selected stroke patients on the basis of the TOAST definitions to provide a baseline assessment of interrater agreement. A 2-phase process was then used to improve the reliability of subtype assignment. In the first phase, a computerized algorithm was developed to assign the TOAST diagnostic category. The reliability of the computerized algorithm was tested with a series of synthetic cases designed to provide data fitting each of the 11 definitions. In the second phase, critical disagreements in the data abstraction process were identified and remaining variability was reduced by the development of standardized procedures for retrieving relevant information from the medical record.

Results—The 4 physicians agreed in subtype diagnosis for only 2 of the 14 baseline cases (14%) using all 11 TOAST definitions and for 4 of the 14 cases (29%) when the classifications were collapsed into the 5 major etiologic/pathophysiological groupings (κ=0.42; 95% CI, 0.32 to 0.53). There was 100% agreement between classifications generated by the computerized algorithm and the intended diagnostic groups for the 11 synthetic cases. The algorithm was then applied to the original 14 cases, and the diagnostic categorization was compared with each of the 4 physicians’ baseline assignments. For the 5 collapsed subtypes, the algorithm-based and physician-assigned diagnoses disagreed for 29% to 50% of the cases, reflecting variation in the abstracted data and/or its interpretation. The use of an operations manual designed to guide data abstraction improved the reliability of subtype assignment (κ=0.54; 95% CI, 0.26 to 0.82). Critical disagreements in the abstracted data were identified, and the manual was revised accordingly. Reliability with the use of the 5 collapsed groupings then improved for both interrater (κ=0.68; 95% CI, 0.44 to 0.91) and intrarater (κ=0.74; 95% CI, 0.61 to 0.87) agreement. Examining each remaining disagreement revealed that half were due to ambiguities in the medical record and half were related to otherwise unexplained errors in data abstraction.

Conclusions—Ischemic stroke subtype based on published TOAST classification criteria can be reliably assigned with the use of a computerized algorithm with data obtained through standardized medical record abstraction procedures. Some variability in stroke subtype classification will remain because of inconsistencies in the medical record and errors in data abstraction. This residual variability can be addressed by having 2 raters classify each case and then identifying and resolving the reason(s) for the disagreement. (Stroke. 2001;32:1091-1097.)

Key Words: diagnosis ■ stroke classification

See Editorial Comment, page 1096

Received August 21, 2000; final revision received November 1, 2000; accepted February 6, 2001.
From the Durham Veterans Affairs Medical Center (L.B.G., M.R.J., J.H., S.B.A., R.D.H.); Department of Medicine (Neurology [L.B.G., V.C.] and General Internal Medicine [D.B.M.]), Stroke Policy Program, Center for Clinical Health Policy Research (L.B.G., D.B.M.), Center for Cerebrovascular Disease (L.B.G., D.B.M., V.C., R.D.H.), Duke University, Durham, NC; and Department of Biostatistics, University of North Carolina, Chapel Hill (L.J.E.).
Correspondence to Larry B. Goldstein, MD, Duke Center for Cerebrovascular Disease, Stroke Policy Program, Center for Clinical Health Policy Research, Box 3651, Duke University Medical Center, Durham, NC 27710. E-mail [email protected]
© 2001 American Heart Association, Inc.
Stroke is available at http://www.strokeaha.org

The Trial of ORG 10172 in Acute Stroke Treatment (TOAST) investigators noted that stroke prognosis, risk of recurrence, and choices for management are influenced by ischemic stroke subtype. Because of the potential importance of stroke subtype in interpreting the results of this and other acute intervention trials, they devised a series of 11 definitions to classify patients with ischemic stroke into 5 major etiologic/pathophysiological groupings (Table 1).¹ Interrater reliability was moderate in a series of 18 patients who were independently assessed by 24 physician-investigators (overall κ=0.54).² Since the description of the TOAST scheme, it has been used to classify patients according to ischemic stroke subtype in several studies. However, the TOAST investigators cautioned that disagreements in subtype assignment remain despite the use of these explicit criteria and that trials should include measures to ensure the most uniform diagnosis possible.² In the final report of the TOAST study, all of the stroke subtype diagnoses were assigned by a central-blinded evaluator to minimize interrater variability.³

In preparation for a study of outcomes and management practices for patients with ischemic stroke within Department of Veterans Affairs (VA) hospitals, we used the published TOAST criteria¹,² to retrospectively categorize a series of cases. Because only fair to moderate levels of interrater agreement were achieved, we engaged in a systematic effort to improve the reliability of assigning subtype diagnoses using the TOAST definitions.

Methods

All patient records used in this study were randomly selected from those of patients enrolled in the VA Acute Stroke Study. Patients had been admitted to any of 9 geographically dispersed VA medical centers and identified by onsite research assistants. The diagnosis of ischemic stroke was confirmed by medical record review and, when required, by consultation with the attending physician.

Two experienced neurologists and 2 internists first independently reviewed the medical records of each of 14 randomly selected patients and assigned stroke subtype on the basis of the TOAST definitions (Table 1). Each rater was provided with reference materials listing the published criteria used by the TOAST investigators in assigning patients to a given diagnostic category.¹ Only fair to moderate levels of interrater agreement (see Results) led to a 2-phased approach aimed at improving reliability.

In the first phase, a standardized form was designed to record the abstracted data necessary to assign a TOAST subtype classification (Figure). A computerized SAS algorithm (SAS Institute Inc) was then developed and refined to classify cases according to the 11 described TOAST definitions. The reliability of the computerized algorithm was tested with synthetic cases that provided data fitting each of the 11 definitions (Table 2).

The computerized algorithm was then applied to the original 14 cases with the use of data from 1 set of abstractions. The resulting diagnostic categorizations were compared with each physician’s baseline assignment. With variability due to differences in the subjective interpretation of the data removed by use of the algorithm, remaining variability had to be due to differences in the physicians’ application of the TOAST criteria or differences in the abstracted data (ie, critical differences in data entered into the computerized algorithm could lead to differences in subtype diagnosis).

The data abstraction process was refined in the second phase. First, an operations manual designed to guide the abstraction process was developed. Using the operations manual, 3 experienced abstractors (a nurse with extensive stroke-related experience, a stroke neurologist, and a medical student with extensive training) independently recorded data on the standardized forms for a series of 17 patients. Systematically comparing the data abstracted by each rater identified areas of disagreement critical to the assignment of stroke subtype, and the abstraction manual was then revised through a series of iterations. Using the final revised operations manual (see the Appendix, which may be found online at http://stroke.ahajournals.org), 2 raters then independently abstracted a final set of 20 cases with the extracted data entered into the computerized algorithm. Intrarater reliability was assessed by having 1 observer abstract a set of 61 cases on 2 separate occasions 6 months apart.

The degrees of intrarater and interrater reliability were measured by simple percentage of agreement and with the unweighted κ statistic.⁴ The values of the κ statistic may be interpreted in a manner similar to the interpretation of correlation coefficients (κ=0 to 0.20, slight; κ=0.21 to 0.40, fair; κ=0.41 to 0.60, moderate; κ=0.61 to 0.80, substantial; and κ=0.81 to 1.00, almost perfect agreement).⁵ Probabilities reflect the chances that the calculated κ values were statistically different from zero.
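As an illustration of the statistic described above, the following is a minimal Python sketch of an unweighted multi-rater κ in the form given by Fleiss (reference 4), together with the interpretation bands quoted in the text. It is not the study's SAS code. The function names are hypothetical, and the handling of values outside the listed ranges (negative values, or values between two bands such as 0.205) is an assumption.

```python
from collections import Counter

# Upper bounds of the interpretation bands quoted in the Methods.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def fleiss_kappa(ratings):
    """Unweighted kappa for several raters (Fleiss, 1971).

    `ratings` holds one entry per case; each entry lists the category
    assigned by every rater. All cases need the same number of raters (>= 2).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of raters")

    category_totals = Counter()
    per_case_agreement = []
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Share of rater pairs that agree on this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(per_case_agreement) / n_cases
    total = n_cases * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map a kappa value to the verbal label used in the text."""
    if kappa < 0:
        # Assumption: the text lists no label for negative values.
        return "below the listed range"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

For example, `interpret_kappa(0.42)` returns "moderate" and `interpret_kappa(0.68)` returns "substantial", matching the labels the text attaches to those values.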

Results

Baseline Assessment

Table 3 presents the baseline stroke subtype classifications for 14 patients with acute ischemic stroke derived by 2 experienced neurologists and 2 internists using the TOAST criteria. The internists were less likely to classify patients in the undetermined category than the neurologists. One of the neurologists was unsure whether one of the patients actually had a stroke. The 4 physicians agreed in subtype diagnosis for only 2 of the 14 cases (14%) using all 11 categories (κ=0.29; 95% CI, 0.21 to 0.37; P<0.0001). The 4 raters were concordant in diagnostic assignment for 4 of the 14 cases (29%) when the classifications were collapsed into the 5 major etiologic/pathophysiological groupings (overall κ statistic for the 4 evaluators was 0.42; 95% CI, 0.32 to 0.53; P<0.0001). The 2 neurologists’ classifications were concordant for 6 (43%) of the 14 cases using the full 11 TOAST categories and for 8 cases (57%) using the collapsed 5 categories. One of the internists arrived at the same diagnoses for 6 of the 8 patients (75%) for whom the 2 neurologists agreed. The second internist concurred with only 4 of these 8 classifications (50%).

Development and Reliability of Computerized Diagnostic Algorithm

Because of the relatively poor reliability found in this initial assessment, a standardized abstraction form was devised (Figure), and a computerized algorithm was developed to categorize patients according to the published TOAST criteria.¹ This was accomplished through a series of iterations in which both the abstraction form and computer programming were tested and refined (data not shown). A group of 11 synthetic cases was then created to fit each of the TOAST categories (Table 2). There was 100% agreement between classifications determined by the computerized algorithm and the intended diagnostic groups for the synthetic cases.

TABLE 1. TOAST Diagnostic Classification

| No. | Diagnostic Group | Collapsed Group |
|---|---|---|
| 1 | Atherosclerosis, probable | Atherosclerosis |
| 2 | Atherosclerosis, possible | Atherosclerosis |
| 3 | Cardioembolic, probable | Cardioembolic |
| 4 | Cardioembolic, possible | Cardioembolic |
| 5 | Lacunar, probable | Lacunar |
| 6 | Lacunar, possible | Lacunar |
| 7 | Other determined etiology, probable | Other determined etiology |
| 8 | Other determined etiology, possible | Other determined etiology |
| 9 | Undetermined etiology, complete evaluation | Undetermined etiology |
| 10 | Undetermined etiology, incomplete evaluation | Undetermined etiology |
| 11 | Multiple possible etiologies | Undetermined etiology |

Details of the definitions of each category are provided in the original publications.¹,²
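The collapsing of the 11 diagnostic groups into 5 groupings, and the check of algorithm output against the intended groups for the synthetic cases, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `COLLAPSED_GROUP`, `collapse`, and `agreement_rate` are hypothetical, the mapping follows the row spans of Table 1, and the TOAST decision rules themselves are not reproduced because they are not given here.

```python
# Mapping of the 11 TOAST diagnostic groups (Table 1 numbering) to the
# 5 collapsed groupings. Names are illustrative, not from the study's SAS code.
COLLAPSED_GROUP = {
    1: "Atherosclerosis",
    2: "Atherosclerosis",
    3: "Cardioembolic",
    4: "Cardioembolic",
    5: "Lacunar",
    6: "Lacunar",
    7: "Other determined etiology",
    8: "Other determined etiology",
    9: "Undetermined etiology",
    10: "Undetermined etiology",
    11: "Undetermined etiology",
}


def collapse(diagnostic_group):
    """Return the collapsed grouping for a diagnostic group number (1 to 11)."""
    return COLLAPSED_GROUP[diagnostic_group]


def agreement_rate(intended, assigned):
    """Fraction of cases where the assigned group equals the intended group."""
    if not intended or len(intended) != len(assigned):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(i == a for i, a in zip(intended, assigned))
    return matches / len(intended)
```

With one synthetic case per definition, an algorithm that returns the intended group for every case gives `agreement_rate(...) == 1.0`, which corresponds to the 100% agreement reported above.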

The computerized algorithm was then applied to the 14 cases used in the baseline assessment with data from one of the neurologist’s abstractions. The algorithm-based diagnosis agreed with the 2 neurologists for 8 (57%) and 10 (71%) of the 14 cases and with the internists for 7 (50%) and 8 (57%) of the cases, respectively. Because the computerized algorithm yields consistent diagnostic categorizations in accord with the TOAST definitions, these discrepancies could only have been related to differences in the data as abstracted by the different raters, or differences in their interpretations of these data.

Reliability of Data Abstraction

A manual to guide data abstraction was then developed by comparing disagreements among 3 experienced reviewers (data not shown). To test the revised abstraction methodology and to systematically explore remaining sources of variability, an additional set of 17 cases was independently reviewed by 2 raters with the extracted data entered into the computerized algorithm. Using the 5 collapsed categories, the 2 raters agreed in subtype diagnosis for 11 of the 17 cases (65%; κ=0.54; 95% CI, 0.26 to 0.82; P<0.05).

TABLE 2. Synthetic Cases

| No. | Diagnostic Group | Case Description |
|---|---|---|
| 1 | Atherosclerosis, probable | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Doppler shows >95% stenosis in right ICA. Angiogram shows 80% stenosis in right ICA with branch occlusion in right MCA. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. No coagulopathy. |
| 2 | Atherosclerosis, possible | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Doppler shows 60% stenosis in right ICA. Angiogram shows <50% stenosis in right ICA with branch occlusion in right MCA. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. No coagulopathy. |
| 3 | Cardioembolic, probable | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Doppler shows <50% stenosis in ICAs. Patient in atrial fibrillation. Echocardiogram dilated left atrium without clot. No coagulopathy. |
| 4 | Cardioembolic, possible | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Patient in atrial fibrillation. Echocardiogram dilated left atrium without clot. No coagulopathy. |
| 5 | Lacunar, probable | History of hypertension. New onset left hemiparesis affecting face, arm, and leg to same extent. No cognitive, visual, or sensory deficits. CT shows area of ill-defined decreased attenuation in right internal capsule. Doppler shows <50% stenosis in ICAs. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. No coagulopathy. |
| 6 | Lacunar, possible | History of hypertension. New onset left hemiparesis affecting face, arm, and leg to same extent. No cognitive, visual, or sensory deficits. CT shows area of ill-defined decreased attenuation in right internal capsule. Doppler shows <50% stenosis in ICAs. Patient in normal sinus rhythm. ECG normal. Echocardiogram patent foramen ovale. No coagulopathy. |
| 7 | Other determined etiology, possible | History of DVT and spontaneous abortion. New onset left hemiparesis affecting face, arm, and leg to same extent. No cognitive, visual, or sensory deficits. CT shows area of ill-defined decreased attenuation in right internal capsule. Doppler shows <50% stenosis in ICAs. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. PTT prolonged without anticoagulants. |
| 8 | Other determined etiology, probable | History of DVT and spontaneous abortion. Prior workup showed protein C deficiency. New onset left hemiparesis affecting face, arm, and leg to same extent. No cognitive, visual, or sensory deficits. CT shows area of ill-defined decreased attenuation in right internal capsule. Doppler shows <50% stenosis in ICAs. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. |
| 9 | Undetermined etiology, complete evaluation | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Doppler shows <50% stenosis in ICAs. Angiogram normal. Patient in normal sinus rhythm. ECG normal. Echocardiogram normal. No coagulopathy. |
| 10 | Undetermined etiology, incomplete evaluation | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Patient in normal sinus rhythm. No coagulopathy. |
| 11 | Multiple possible etiologies | New onset left hemiparesis with sensory deficit affecting face and arm more than leg. Left homonymous hemianopsia. Left hemispatial neglect. CT shows area of ill-defined loss of gray-white junction in right parietotemporal region. Doppler shows >70% stenosis in right ICA. Angiogram shows 80% stenosis in right ICA with branch occlusion in right MCA. Patient in atrial fibrillation. Echocardiogram dilated left atrium without clot. No coagulopathy. |

ICA indicates internal carotid artery; MCA, middle cerebral artery; DVT, deep vein thrombosis; and PTT, partial thromboplastin time.

Examination of the raters’ data abstraction forms revealed that disagreements in subtype diagnoses were primarily due to differences in the interpretations of CT and MRI scans, cardiovascular evaluations, and carotid ultrasound results. Many of these disagreements occurred because raters relied on different reports in the medical record. For example, one rater used official radiology reports, whereas the other used interpretations of the studies as reflected in physicians’ notes. As a result, the operations manual was revised to specify a hierarchy of test reports to be used for abstraction of the results of the radiological tests (Appendix).
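The idea of a report hierarchy can be sketched in Python as below. The actual hierarchy is specified in the Appendix, which is not reproduced here, so the priority order, the source labels, and the function name `select_result` are all hypothetical and serve only to show how a fixed ordering removes the choice between conflicting reports.

```python
# Hypothetical priority order, highest priority first. The real hierarchy is
# defined in the study's Appendix; this ordering is an assumption.
SOURCE_PRIORITY = ["official_report", "discharge_summary", "physician_note"]


def select_result(findings):
    """Pick the result from the highest-priority source.

    `findings` is a list of (source, result) tuples for one test, for
    example the carotid ultrasound. Sources not in the hierarchy are ignored.
    Returns None when no usable report is present.
    """
    ranked = [f for f in findings if f[0] in SOURCE_PRIORITY]
    if not ranked:
        return None
    best = min(ranked, key=lambda f: SOURCE_PRIORITY.index(f[0]))
    return best[1]
```

Under this assumed ordering, two abstractors who see both an official report and a physician's note for the same test would record the same value, because the choice of source is no longer left to the individual rater.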

Final Interrater and Intrarater Reliability

The abstraction process was repeated for an additional set of 20 cases with the use of the final operations manual (Appendix) with patients categorized into the 5 collapsed major etiologic/pathophysiological groupings (Table 4). Reliability further improved, with the 2 raters agreeing in subtype diagnosis for 75% of the cases (κ=0.68; 95% CI, 0.44 to 0.91; P<0.05). All of the differences in diagnostic assignment were due to differences in abstracted data. Examining each disagreement revealed that half were due to ambiguities in the medical record (eg, in one case medical notes indicated results of an MRI without other evidence in the record that the test was actually performed; in another case a carotid duplex evaluation indicated mild to moderate stenosis), and half were due to otherwise unexplained errors in data abstraction.

Intrarater reliability was assessed by having one observer abstract a set of 61 cases on 2 separate occasions 6 months apart, with the data entered into the computerized algorithm for diagnostic categorization. Diagnoses agreed for 50 cases (82%; κ=0.74; 95% CI, 0.61 to 0.87). Again, discrepancies were largely due to differences in classification of CT and MRI scans, cardiovascular evaluations, and carotid ultrasound results.

Discussion

Presumed stroke subtype diagnosis guides both clinical evaluations and treatment decisions and may be important for understanding differences in the impact of a given intervention in the setting of clinical trials. The Oxfordshire criteria categorize subtypes of ischemic stroke primarily on the basis of vascular territory.⁶ Although this classification is both reliable and valid and provides information relevant to prognosis, it does not classify stroke patients with regard to pathophysiology or etiology. The TOAST classification scheme for ischemic stroke subtype is being used in both prospective clinical trials and retrospective studies of patterns of care and stroke-related outcomes for this purpose. Even when applied prospectively in a clinical setting, the TOAST investigators found that initial stroke subtype diagnosis should be made cautiously. Initial clinical impression of stroke subtype with the use of the TOAST criteria agreed with final diagnosis in only 62% of patients.⁷

We found that the reliability of the TOAST classification was only fair to moderate when the published definitions were retrospectively applied to a randomly selected series of cases. Unified central assessment of stroke subtype was used in the TOAST trial itself to minimize interrater variability.³ The presence of this variability confirms that the published TOAST criteria should be used with caution unless the investigators can demonstrate acceptable levels of agreement within the context of an individual study.²

TABLE 3. Baseline Interrater Reliability

| Case | Neurologist 1 | Neurologist 2 | Internist 1 | Internist 2 |
|---|---|---|---|---|
| 1 | O1 | O1 | CE2 | O1 |
| 2 | U2 | AT2 | L2 | AT2 |
| 3 | U2 | U2 | AT1 | AT2 |
| 4 | AT1 | AT1 | AT1 | AT2 |
| 5 | AT1 | U2 | AT1 | AT1 |
| 6 | U2 | CE2 | CE1 | CE2 |
| 7 | L2 | L1 | L1 | L1 |
| 8 | U2 | L2 | L1 | L2 |
| 9 | M | M | L1 | L2 |
| 10 | L1 | L1 | L1 | L1 |
| 11 | ? Stroke | U1 | U1 | L1 |
| 12 | AT1 | AT1 | AT1 | AT1 |
| 13 | O1 | O2 | U1 | O1 |
| 14 | U2 | AT1 | AT1 | M |

O indicates other determined etiology; CE, cardioembolic; U, undetermined etiology; AT, atherosclerosis; L, lacunar; M, multiple possible etiologies; 1, probable; and 2, possible.
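The agreement counts reported in the Baseline Assessment can be rechecked directly from Table 3. The short Python sketch below does this tally; the function names are hypothetical, and it assumes that "M" (multiple possible etiologies) collapses into the undetermined grouping, as the layout of Table 1 indicates, and that the "? Stroke" entry never counts as agreement.

```python
# Table 3, one tuple per case: (Neurologist 1, Neurologist 2, Internist 1, Internist 2).
TABLE_3 = [
    ("O1", "O1", "CE2", "O1"),
    ("U2", "AT2", "L2", "AT2"),
    ("U2", "U2", "AT1", "AT2"),
    ("AT1", "AT1", "AT1", "AT2"),
    ("AT1", "U2", "AT1", "AT1"),
    ("U2", "CE2", "CE1", "CE2"),
    ("L2", "L1", "L1", "L1"),
    ("U2", "L2", "L1", "L2"),
    ("M", "M", "L1", "L2"),
    ("L1", "L1", "L1", "L1"),
    ("?", "U1", "U1", "L1"),  # "?" stands for the "? Stroke" entry
    ("AT1", "AT1", "AT1", "AT1"),
    ("O1", "O2", "U1", "O1"),
    ("U2", "AT1", "AT1", "M"),
]


def collapse(code):
    """Drop the probable/possible suffix; fold M into U (assumption from Table 1)."""
    if code == "M":
        return "U"
    return code.rstrip("12")


def count_agreement(rows, raters=(0, 1, 2, 3), collapsed=False):
    """Count cases on which all selected raters gave the same classification."""
    total = 0
    for row in rows:
        codes = [row[i] for i in raters]
        if collapsed:
            codes = [collapse(c) for c in codes]
        if "?" not in codes and len(set(codes)) == 1:
            total += 1
    return total


all_full = count_agreement(TABLE_3)                                   # 2 of 14
all_collapsed = count_agreement(TABLE_3, collapsed=True)              # 4 of 14
neuro_full = count_agreement(TABLE_3, raters=(0, 1))                  # 6 of 14
neuro_collapsed = count_agreement(TABLE_3, raters=(0, 1), collapsed=True)  # 8 of 14
```

These four counts match the figures given in the text (2 and 4 of 14 for all 4 physicians; 6 and 8 of 14 for the 2 neurologists).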

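TABLE 4. Final Interrater Reliability

| Case | Rater 1 | Rater 2 |
|---|---|---|
| 1 | AT | AT |
| 2 | AT | AT |
| 3 | CE | CE |
| 4 | CE | CE |
| 5 | L | L |
| 6 | L | L |
| 7 | L | L |
| 8 | L | L |
| 9 | L | L |
| 10 | L | L |
| 11 | O | O |
| 12 | U | U |
| 13 | U | U |
| 14 | U | U |
| 15 | U | U |
| 16 | AT | L |
| 17 | AT | U |
| 18 | CE | U |
| 19 | CE | U |
| 20 | O | U |

Abbreviations are as defined in Table 3.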
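As a worked check of the final interrater result, the Python sketch below computes percentage agreement and an unweighted two-rater κ from the Table 4 data. It uses Cohen's two-rater form, which is an assumption: the article cites Fleiss for the κ statistic and does not state which estimator was applied to pairs of raters. The function name is hypothetical.

```python
from collections import Counter

# Table 4, cases 1 to 20 in order.
RATER_1 = ["AT", "AT", "CE", "CE", "L", "L", "L", "L", "L", "L",
           "O", "U", "U", "U", "U", "AT", "AT", "CE", "CE", "O"]
RATER_2 = ["AT", "AT", "CE", "CE", "L", "L", "L", "L", "L", "L",
           "O", "U", "U", "U", "U", "L", "U", "U", "U", "U"]


def cohen_kappa(a, b):
    """Return (observed agreement, unweighted Cohen's kappa) for two raters."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own marginal frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return observed, (observed - expected) / (1 - expected)


observed, kappa = cohen_kappa(RATER_1, RATER_2)
```

Here the raters agree on 15 of 20 cases (observed agreement 0.75), chance agreement is 92/400 = 0.23, and κ = 0.52/0.77, which rounds to 0.68 and so agrees with the reported value to two decimal places.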

Figure. Standardized abstraction form for TOAST subtype classification (image not reproduced).

We used a 2-phase process aimed at improving the reliability of the TOAST classification scheme. Creation of a computerized algorithm (available at http://hsrd.durham.med.va.gov/) eliminated variability due to differences in the interpretation of stroke-related characteristics for a given patient. Remaining discrepancies were related to differences in the abstracted data, prompting the development of a standardized manual and procedures for extracting relevant information from the medical record. This improved reliability in the classification of stroke subtype to the substantial to almost-perfect level for both intraobserver and interobserver agreement. Residual differences in diagnostic categorization were related to simple errors in abstraction or ambiguities in the medical record, occurring in 25% of cases.

In practice, having each medical record abstracted by 2 raters with the data entered into the computerized algorithm could identify this remaining variability. Abstraction forms for cases in which there is a difference in subtype diagnosis could then be reviewed (focused on CT and MRI scan, cardiovascular, and carotid ultrasound results) and the reason(s) for the discrepancies identified and resolved. Our data show that variability in stroke subtype diagnosis can be reduced to a minimum through the use of this rigorous methodology. The generalizability of these results will need to be confirmed in other settings.
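The double-abstraction step described above amounts to comparing two sets of algorithm outputs and listing the cases that need review. A minimal Python sketch, with a hypothetical function name and the assumption that a case abstracted by only one rater should also be flagged:

```python
def cases_needing_review(rater_a, rater_b):
    """List case IDs whose subtype differs between two abstractions.

    `rater_a` and `rater_b` map case ID to the subtype produced by the
    algorithm from each rater's abstraction form. A case present for only
    one rater is flagged as well (an assumption, not stated in the text).
    """
    flagged = []
    for case_id in sorted(set(rater_a) | set(rater_b)):
        if rater_a.get(case_id) != rater_b.get(case_id):
            flagged.append(case_id)
    return flagged


# Usage with a few illustrative cases.
first = {1: "AT", 2: "L", 3: "CE"}
second = {1: "AT", 2: "U", 3: "CE", 4: "L"}
to_review = cases_needing_review(first, second)  # [2, 4]
```

The flagged forms would then be reviewed by hand, with attention to the CT and MRI, cardiovascular, and carotid ultrasound entries named in the text.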

Acknowledgments

This work was supported by grants from the Department of Veterans Affairs (Health Services Research and Development Service [SDR 93-003] and Cooperative Studies/Epidemiologic Research and Information Center Programs [CSP/ERIC 602]). The authors also wish to acknowledge Dr David Good (Wake Forest University) and Dr John R. Feussner (Chief Research Development Officer, Department of Veterans Affairs) for efforts on initial phases of the study and Maren Olsen, PhD (Duke University), for statistical advice. Kathleen Hoffmann, RN, Denice Wood, RN, and Diedra Coney, RN (Duke University), performed some of the data abstractions used in refining the procedures developed as part of this study.

References

1. Adams HP Jr, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, Marsh EEI, for the TOAST Investigators. Classification of subtype of acute ischemic stroke: definitions for use in a multicenter clinical trial. Stroke. 1993;24:35–41.

2. Gordon DL, Bendixen BH, Adams HP Jr, Clarke W, Kappelle LJ, Woolson RF, for the TOAST Investigators. Interphysician agreement in the diagnosis of subtypes of acute ischemic stroke: implications for clinical trials. Neurology. 1993;43:1021–1027.

3. Adams HP Jr, Woolson RF, Helgason C, Karanjia PN, Gordon DL. Low molecular weight heparinoid, ORG 10172 (Danaparoid), and outcome after acute ischemic stroke: a randomized controlled trial. JAMA. 1998;279:1265–1272.

4. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–382.

5. Kramer MS, Feinstein AR. Clinical biostatistics, LIV: the biostatistics of concordance. Clin Pharmacol Ther. 1983;29:111–123.

6. Bamford J, Sandercock P, Dennis M, Burn J, Warlow C. Classification and natural history of clinically identifiable subtypes of cerebral infarction. Lancet. 1991;337:1521–1526.

7. Madden KP, Karanjia PN, Adams HP Jr, Clarke WR. Accuracy of initial stroke subtype diagnosis in the TOAST study. Neurology. 1995;45:1975–1979.

Editorial Comment

Classifying the Mechanisms of Ischemic Stroke

Schemes of classification, in one form or another, continue to be used widely in many areas of stroke care. Their origins lie in the classic descriptions of parenchymal and arterial pathology, but by the 1950s they were beginning to incorporate clinically based anatomic and mechanistic subdivisions that could be used in vivo. Initially, the diagnoses of the underlying mechanisms of stroke were based mainly on clinical patterns derived retrospectively from autopsy studies, but over the years the definitions have been refined to incorporate the results of frequently performed, and increasingly complex, investigations. Nevertheless, the newer classifications have continued to use a number of core mechanistic groupings (eg, large-vessel atherosclerosis, cardioembolism, small-vessel disease) that were present in earlier classifications.

In the introduction to one of the earliest attempts to synthesize the various strands of classification, Millikan¹ wrote, “Our ultimate objectives are to obtain greater clarity of thinking in regard to cerebrovascular diseases, to compose a generally acceptable classification, to establish reliable criteria for diagnosis, and to promote further research in this field.”

One suspects that, outside the centers of stroke research, such aspirations were considerably in advance of their time, and that for the majority of stroke patients worldwide, meaningful (if fairly basic) subclassification became a reality only with the advent of CT and ultrasound scanning. Most of the early research that used mechanistic classifications was observational epidemiology, most notably that from the Mayo Clinic² and later from the Stroke Data Bank collaborators³ and the Lausanne group.⁴ When the original classification was reviewed some 17 years later, at a time when the growth in stroke research in general, and clinical trials in particular, was beginning to expand dramatically, Millikan⁵ wrote: “It continues to be evident that in such a complex set of clinical-pathophysiological phenomena some standard reference language or set of definitions should be used or the literature of investigation will be uninterpretable.”

The point about the need for a common language of communication continues to be of paramount importance in an era when the uses of a classification have broadened from observational epidemiology to clinical trials and, more recently, to the purchasing of healthcare. Perhaps most importantly, there are the individual clinicians caring for stroke patients who use the classifications to put the results from the research centers into the context of their daily practice.

Clearly, any scheme of classification that is used needs to be as reliable as possible, and the article by Goldstein et al describes their experience using a computer algorithm to improve the reliability of the widely used TOAST (Trial of Org 10172 in Acute Stroke Treatment) classification,6 a scheme that had its origins in the Stroke Data Bank classification of the 1980s and whose originators recognized that “Interobserver agreement is essential to the reliability of clinical data from cooperative studies and provides the foundation for applying research results to clinical practice.”7

However, as Goldstein et al stress, their objective was only to try to standardize retrospective data collection. That is quite a different proposition from using the classification prospectively either in clinical research or to manage individual patients. Here, it is important to remember that diagnostic reliability (ie, interrater or intrarater agreement) should not be equated with diagnostic accuracy, something that requires a gold standard against which it can be judged, and which is lacking in vivo for many stroke mechanisms. Indeed, Johnson et al8 noted that in the absence of such a gold standard, “the merit of a classification system depends on its clarity, utility and reproducibility.”
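The distinction between reliability and accuracy can be made concrete with a small numerical sketch. Everything in it is an illustrative assumption: the patient labels are invented, the reference list stands in for a gold standard that is usually unavailable in vivo, and Cohen's kappa is used simply as one common chance-corrected agreement statistic, not as the measure reported in any of the studies cited here.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Chance-corrected agreement (Cohen's kappa) between two raters."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Proportion of patients on whom the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def accuracy(labels, reference):
    """Proportion of labels that match the reference (gold standard)."""
    return sum(x == y for x, y in zip(labels, reference)) / len(reference)


# Hypothetical data for eight patients.
# LAA = large-vessel atherosclerosis, CE = cardioembolism, SVD = small-vessel disease.
reference = ["LAA", "LAA", "CE", "CE", "SVD", "SVD", "CE", "LAA"]  # assumed gold standard
rater_1 = ["LAA", "CE", "CE", "LAA", "SVD", "SVD", "LAA", "LAA"]
rater_2 = list(rater_1)  # a second rater who applies the same rule identically

print("interrater kappa:", cohen_kappa(rater_1, rater_2))
print("accuracy of either rater:", accuracy(rater_1, reference))
```

In this invented example the two raters agree on every patient, so their interrater kappa is perfect, yet each is correct for only five of the eight patients. A rule or algorithm that makes raters agree more often therefore says nothing, by itself, about whether the agreed diagnosis is the right one.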

It seems likely that the relatively modest interrater and intrarater agreement of the TOAST classification, when used prospectively in clinical practice,9,10 is in part a consequence of that rather nebulous, but extremely important, entity of clinical acumen, a complex interaction of pattern recognition and experience-influenced, repeated testing of a hypothesis against available evidence. Of course, such behavior does not sit easily alongside an administrative “bean-counting mentality,” in which it is more important to have everything in a category, regardless of the accuracy of the categorization!

So have the various classifications of stroke mechanism served us well over the last 40 years? It seems to me that they have been rather blunt tools. Even at the population level, we know relatively little about the natural history of the groupings. Furthermore, they have failed to identify subgroups of patients who would benefit from acute interventions (the original raison d’être of the TOAST classification), and where secondary prevention treatments have been more successful, they have been targeted at much more specific groups, eg, patients with atrial fibrillation or carotid stenosis. One suspects that the multiple failures of acute intervention trials will prompt a thorough review of this whole area, and although it has been shown that advances such as multimodal MR can improve the reliability of the TOAST classification,11 perhaps we should also consider other schemes that may have fewer links with the established clinicopathological paradigm. On the other hand, I think the current classifications do contribute to individual patient management, and harking back to Millikan’s original aspirations, I am sure that many of us will continue to use the basic skeleton of the classification to bring greater “clarity of thinking” to our clinical practice. Indeed, Gross et al7 observed that clinicians were able to use the classification to synthesize a number of basic clinical and investigative findings with relatively poor interrater and intrarater reliability to form a much more reliable overall diagnosis. However, I do not envisage sitting in the outpatient clinic using the algorithm of Goldstein et al on my Palm or Psion for diagnostic purposes!

Dr John Bamford, MD, FRCP, Guest Editor
St James’s University Hospital
Beckett Street
Leeds, UK

[email protected]

References

1. Advisory Council for the National Institute of Neurological Diseases and Blindness. A classification and outline of cerebrovascular diseases. Neurology. 1958;8:395–434.

2. Matsumoto N, Whisnant JP, Kurland LT, Okazaki H. Natural history of stroke in Rochester, Minnesota, 1955 through 1969: an extension of a previous study, 1945 through 1954. Stroke. 1973;4:20–29.

3. Foulkes MA, Wolf PA, Price TR, Mohr JP, Hier DB. The Stroke Data Bank: design, methods, and baseline characteristics. Stroke. 1988;19:547–554.

4. Bogousslavsky J, van Melle G, Regli F. The Lausanne Stroke Registry: analysis of 1000 consecutive patients with first stroke. Stroke. 1988;19:1083–1092.

5. Advisory Council for the National Institute of Neurological Diseases and Blindness. A classification and outline of cerebrovascular diseases. Stroke. 1975;6:564–616.

6. Adams HP Jr, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, Marsh EE III, and the TOAST Investigators. Classification of subtype of acute ischemic stroke: definitions for use in a multicenter clinical trial. Stroke. 1993;24:35–41.

7. Gross CR, Shinar D, Mohr JP, Hier DB, Caplan LR, Price TR, Wolf PA, Kase CS, Fishman IG, Calingo S, Kunitz SC. Interobserver agreement in the diagnosis of stroke type. Arch Neurol. 1986;43:893–898.

8. Johnson CJ, Kittner SJ, McCarter RJ, Sloan MA, Stern BJ, Buchholz D, Price TR. Interrater reliability of an etiologic classification of ischemic stroke. Stroke. 1995;26:46–51.

9. Madden KP, Karanjia PN, Adams HP Jr, Clarke WR, and the TOAST Investigators. Accuracy of initial stroke subtype diagnosis in the TOAST study. Neurology. 1995;45:1975–1979.

10. Gordon DL, Bendixen BH, Adams HP Jr, Clarke W, Kappelle LJ, Woolson RF, and the TOAST Investigators. Interphysician agreement in the diagnosis of subtypes of acute ischemic stroke: implications for clinical trials. Neurology. 1993;43:1021–1027.

11. Lee LJ, Kidwell CS, Alger J, Starkman S, Saver JL. Impact on stroke subtype diagnosis of early diffusion-weighted magnetic resonance imaging and magnetic resonance angiography. Stroke. 2000;31:1081–1089.
