Quality Assessment Tools Project Report - CADTH


Canadian Agency for Drugs and Technologies in Health

Agence canadienne des médicaments et des technologies de la santé

Supporting Informed Decisions

HTA Quality Assessment Tools Project Report

July 2012

Until April 2006, the Canadian Agency for Drugs and Technologies in Health (CADTH) was known as the Canadian Coordinating Office for Health Technology Assessment (CCOHTA).

Cite as: Bai A, Shukla VK, Bak G, Wells G. Quality Assessment Tools Project Report. Ottawa: Canadian Agency for Drugs and Technologies in Health; 2012.

Production of this report is made possible by financial contributions from Health Canada and the governments of Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Saskatchewan, and Yukon. The Canadian Agency for Drugs and Technologies in Health takes sole responsibility for the final form and content of this report. The views expressed herein do not necessarily represent the views of Health Canada or any provincial or territorial government.

Reproduction of this document for non-commercial purposes is permitted provided appropriate credit is given to CADTH. CADTH is funded by Canadian federal, provincial, and territorial governments.

Legal Deposit – 2012 National Library of Canada
ISBN: 978-1-897465-88-2 (print)
ISBN: 978-1-897465-89-9 (online)

July 2012

PUBLICATIONS MAIL AGREEMENT NO. 40026386
RETURN UNDELIVERABLE CANADIAN ADDRESSES TO
CANADIAN AGENCY FOR DRUGS AND TECHNOLOGIES IN HEALTH
600-865 CARLING AVENUE, OTTAWA ON K1S 5S8

Publications can be requested from:

CADTH 600-865 Carling Avenue

Ottawa ON Canada K1S 5S8 Tel.: 613-226-2553 Fax: 613-226-5392

Email: [email protected]

or downloaded from CADTH’s website: http://www.cadth.ca

Reviewers
CADTH staff would like to thank the following people for their time, assistance, and expert input throughout the project, including guidance on the approach and methods, and constructive feedback on drafts of this report.

Gordon H. Guyatt, BSc, MD, MSc, FRCPC
Distinguished Professor (Clinical Epidemiology & Biostatistics, and Medicine)
McMaster University
Co-founder and co-chair of the GRADE working group

Andy Oxman
Research Director, Global Health Unit
Norwegian Knowledge Centre for the Health Services, Norway

Holger Schünemann, MD, PhD (Epi), MSc, FRCP(C)
Chair and Professor, Department of Clinical Epidemiology and Biostatistics
McMaster University, Hamilton, Ontario, Canada

Kari A.O. Tikkinen
Visiting Research Fellow (clinical epidemiologist and urologist)
McMaster University and University of Helsinki

1 Canadian Agency for Drugs and Technologies in Health (CADTH), Ottawa, Ontario, Canada
2 Corresponding Author
3 University of Ottawa Heart Institute, Ottawa, Ontario

Canadian Agency for Drugs and Technologies in Health

Quality Assessment Tools Project Report

Annie Bai, MSc, MD1 Vijay K. Shukla, RPh, PhD1

Greg Bak, MLIS, PhD1 George Wells, PhD2,3

July 2012

Acknowledgments
The authors acknowledge the continuing effort and support of the whole QAT working group. We appreciate the considerable support, commitment, and contributions of CADTH's internal staff at the time of this work: Hayley Fitzsimmons (information specialist), who validated the original search strategy and then updated the literature search; and Samantha Verbrugghe (research assistant), who helped manage the literature selection results and all references involved in the project. In addition, we extend our appreciation to external researchers: Sarah Milne, who participated in two key steps of the QAT project, identifying and evaluating potential tools; Kathleen Duclos and Renee Lafleur, who checked the data of the original QAT project and selected the literature for the project update; and Kasey Parker, who provided input on the discussion section of this report. We owe our thanks as well to external experts for their valuable input: Dr. David Atkins, Dr. Brian Haynes, Dr. David Moher, Dr. Cynthia Mulrow, Dr. Andy Oxman, Dr. Barnaby Reeves, Dr. Beverley Shea, and Dr. Paul Shekelle, as well as Ms. Pam McLean-Veysey. We would also like to thank Denis Bélanger, Barb Shea, and the late Barbara Wells for their support of this project.

Abbreviations
AHRQ Agency for Healthcare Research and Quality

CADTH Canadian Agency for Drugs and Technologies in Health

CCOHTA Canadian Coordinating Office for Health Technology Assessment

COMPUS Canadian Optimal Medication Prescribing and Utilization Service

EPCs Evidence-Based Practice Centers

EGSs evidence grading systems

OBSs observational studies

QAIs quality assessment instruments

QAT quality assessment tools

RCTs randomized controlled trials

SRs systematic reviews

Quality Assessment Tools Project Report i

TABLE OF CONTENTS

EXECUTIVE SUMMARY ........................................................... iii
1 INTRODUCTION ................................................................ 1
  1.1 Rationale for the QAT Project ........................................... 1
  1.2 Goals of this QAT Report ................................................ 1
  1.3 Quality Assessment in Systematic Reviews of Scientific Evidence ......... 2
    1.3.1 Assessing the quality of individual studies ......................... 2
    1.3.2 Grading the strength of a body of evidence .......................... 2
  1.4 Existing QAIs and EGSs .................................................. 3
  1.5 AHRQ Evidence Report1 ................................................... 3
  1.6 Objective of the QAT Project ............................................ 4
2 METHODS ..................................................................... 5
  2.1 Overview of QAT Project Methods ......................................... 5
  2.2 Details of QAT Project Methods .......................................... 8
3 RESULTS .................................................................... 15
  3.1 Summary of QAT Project Results ......................................... 15
  3.2 Collection of existing QAIs and EGSs ................................... 17
    3.2.1 Collection of existing QAIs and EGSs from review articles .......... 17
    3.2.2 Conducted initial expert consultation (First round expert consultation — Step 4) ... 19
    3.2.3 Searched and selected individual QAIs and EGSs (QAT 2: 2000 to August 2005) (Step — 5) ... 19
  3.3 Identification of potential QAIs and EGSs for evaluation (Second round combined analysis — Step 6) ... 20
  3.4 Evaluation of the potential QAIs and EGSs identified ................... 23
    3.4.1 Evaluation of the potential QAIs for SRs ........................... 23
    3.4.2 Evaluation of the potential QAIs for RCTs .......................... 23
    3.4.3 Evaluation of the potential QAIs for OBSs .......................... 24
    3.4.4 Evaluation of the potential EGSs ................................... 24
  3.5 Consultation on the QAIs and EGSs selected ............................. 25
    3.5.1 Conducted second expert consultation (Step — 8) .................... 25
    3.5.2 Chose QAIs and EGSs for CADTH (Step — 9) .......................... 25
    3.5.3 Conducted stakeholder consultation (Step — 10) ..................... 26
  3.6 Updating QAIs and EGSs ................................................. 26
    3.6.1 Updating QAT 1 (Step — 11) ......................................... 26
    3.6.2 Updating QAT 2 (Step — 12) ......................................... 26
4 DISCUSSION ................................................................. 27
  4.1 Tools selected through QAT project ..................................... 27
    4.1.1 AMSTAR for SRs ..................................................... 27
    4.1.2 SIGN 50 checklist for RCTs ......................................... 27
    4.1.3 SIGN 50 checklist for OBSs ......................................... 28
    4.1.4 GRADE 2004 for EGS ................................................. 28
  4.2 Application of the evaluation tools selected ........................... 29
  4.3 Methodological issues .................................................. 29
    4.3.1 Literature search .................................................. 29
    4.3.2 Study funding ...................................................... 30
  4.4 Strengths and limitations of the QAT project ........................... 31
5 CONCLUSION ................................................................. 31
6 REFERENCES ................................................................. 32
APPENDIX A: Products of QAT Project ......................................... A-1
APPENDIX B: Search strategy ................................................ A-10
APPENDIX C: Selection criteria ............................................. A-22
APPENDIX D: Literature selection ........................................... A-26
APPENDIX E: Reference lists of QAIs and EGSs ............................... A-32
APPENDIX F: Evaluation results ............................................. A-70
APPENDIX G: QAIs and EGSs selected ......................................... A-79

EXECUTIVE SUMMARY

Introduction
In March 2004, the Canadian Optimal Medication Prescribing and Utilization Service (COMPUS) was launched by the Canadian Coordinating Office for Health Technology Assessment (CCOHTA) — now the Canadian Agency for Drugs and Technologies in Health (CADTH) — as a service to federal, provincial, and territorial jurisdictions and other stakeholders. COMPUS was a nationally coordinated program, funded by Health Canada. To meet the goals of the COMPUS program, relevant and rigorously derived evidence-based information was required for making recommendations on optimal drug prescribing and use. The quality of scientific evidence, however, varies with study design, conduct, and analysis, and existing quality assessment tools vary as well. CADTH staff therefore embarked on the Quality Assessment Tools (QAT) project to identify the most appropriate tools for evaluating and grading evidence.

The term "quality" means different things in different contexts. For this review, quality is considered in the context of "risk of bias." CADTH used a systematic approach to identify the most appropriate quality assessment instruments (QAIs) for assessing the quality of systematic reviews (SRs), randomized controlled trials (RCTs), and observational studies (OBSs) (mainly cohort and case-control studies), as well as evidence grading systems (EGSs) for rating the strength of a body of evidence. The tools selected by CADTH should have the highest scientific credibility, be user-friendly, and be supported by most experts and stakeholders.

The QAT project was originally conducted from January 2005 to October 2005 and updated from September 2007 to December 2007. The work of this project was used to support the CADTH optimal use projects of that time, has been referenced in publications, and has been presented at conferences. Many requests for details about this project have since been received from outside organizations. Given this interest and the important role of quality assessment tools in technology assessment, the authors share in this report the approach they took to conduct the project, as well as their findings and conclusions.

Objective
The objective of the QAT project was to identify appropriate QAIs, by study design, and EGSs for CADTH optimal use work. The objective of this report is to document the work undertaken between 2005 and 2007 through the QAT project.

Methods
Accepting the findings from Agency for Healthcare Research and Quality (AHRQ) evidence report No. 47,1 published in 2002, and building upon that report, CADTH applied a systematic

process to select QAIs for assessing the quality of SRs, RCTs, and OBSs (cohort and case-control studies), as well as to select an EGS, mainly by updating the review of QAIs and EGSs and by identifying the most appropriate QAIs and EGSs, that is, the ones that were most feasible and efficient for practical use. The whole process was divided into 10 steps for the original project, plus two steps for updating. The QAT working group classified these steps into five main sections as follows:

Step — 1: Assemble the QAT working group

Collection of existing QAIs and EGSs
Step — 2: Search and select review articles (January 2000 to February 2005)
Step — 3: Identify existing QAIs and EGSs from review articles
Step — 4: Consult experts with the above collection
Step — 5: Search and select individual QAIs and EGSs (January 2000 to August 2005)

Identification of potential QAIs and EGSs
Step — 6: Identify potential QAIs and EGSs from existing ones for evaluation

Evaluation of potential QAIs and EGSs
Step — 7: Evaluate the potential QAIs and EGSs identified

Consultation on the appropriate QAIs and EGSs selected
Step — 8: Consult experts with evaluation results
Step — 9: Choose appropriate QAIs and EGSs for COMPUS
Step — 10: Collect stakeholders' input on CADTH's choices

Update of the original QAT project (2005 to September 2007)
Step — 11: Update the search and selection of review articles
Step — 12: Update the search and selection of individual QAIs and EGSs

Building upon the original search strategy used in the AHRQ report for literature dating from 1995 to June 2000, and taking into account the recommendations of the AHRQ team, a highly sensitive search for systematic review articles of QAIs and EGSs was designed and carried out. Published literature was identified by searching the following bibliographic databases: PubMed, MEDLINE, Embase, BIOSIS Previews, and The Cochrane Library. CADTH's methodological filter was applied to limit retrieval to systematic reviews, and the search was limited to documents published between January 2000 and February 2005. Later, a highly specific supplemental literature search was conducted for articles applying or reporting individual QAIs or EGSs, published between January 2000 and August 2005. This supplemental search was run on the same databases. For the update, the two original search strategies were modified and validated to search the relevant literature, for both reviews and articles containing individual tools, between 2005 and

September 2007. All searches were run without language restriction. See Appendix B for the detailed search strategies.

Two reviewers independently conducted the literature selection, with agreement reached on the final articles included and excluded. Review articles were included if they systematically collected and evaluated existing QAIs for SRs, RCTs, or OBSs and EGSs, or contained a comparison or evaluation of at least two existing QAIs or EGSs. QAIs and EGSs were collected by type of study from the included review articles, and reference lists of these existing QAIs and EGSs were created after removing duplicates. Additional QAIs and EGSs were collected from expert input and the supplemental literature search. Potential tools were identified among existing generic checklists or scales of QAIs for SRs, RCTs, and OBSs (cohort and case-control), and generic EGSs, if they were recommended in the review articles or had not been assessed before.

The potential QAIs and EGSs identified were evaluated using the AHRQ evaluation grids, which consist of domain criteria and elements created within each domain based on generally accepted standard epidemiologic methods. The strengths and weaknesses of different instruments or systems were displayed in the grids, and the highest-scoring QAIs and EGSs from each grid represented the proposed selections. For instruments or systems with the same highest evaluation score within a study type, the rigour of the development process, inter-rater reliability, and instructions provided, as well as the length of the tool, ease of use, and time required, were also considered. Any disagreement was resolved by group discussion. The entire process was repeated for the 2007 update, to identify QAIs and EGSs published between 2005 and September 2007.
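The grid-based selection logic just described (score each candidate tool by the number of evaluation domains it fully addresses, shortlist the top scorers, and settle ties using the practical criteria) can be sketched in code. This is a hypothetical illustration only; the domain names and tools below are invented and are not the actual AHRQ grid contents.

```python
# Illustrative sketch of grid-based tool selection.
# Domain names and tool entries are hypothetical examples,
# not the actual AHRQ evaluation grid contents.

DOMAINS = ["search", "selection", "quality_assessment", "synthesis"]

# Each tool maps a domain to True if the tool fully addresses it.
tools = {
    "Tool A": {"search": True, "selection": True,
               "quality_assessment": True, "synthesis": True},
    "Tool B": {"search": True, "selection": True,
               "quality_assessment": True, "synthesis": False},
    "Tool C": {"search": True, "selection": True,
               "quality_assessment": True, "synthesis": True},
}

def grid_score(coverage):
    """Score = number of evaluation domains fully addressed."""
    return sum(coverage.get(d, False) for d in DOMAINS)

scores = {name: grid_score(cov) for name, cov in tools.items()}
best = max(scores.values())
# Tools tied at the top score proceed to the tie-break criteria
# (rigour of development, reliability, ease of use, time required).
shortlist = sorted(name for name, s in scores.items() if s == best)
print(shortlist)  # → ['Tool A', 'Tool C']
```

In this toy example, two tools fully address all four domains and would therefore both go forward to the qualitative tie-break stage, mirroring how seven QAIs and 10 EGSs tied at the top of the actual grids.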
Results
Overall, from the 4,126 citations retrieved in the original literature searches (3,006 in the search for review articles and 1,120 in the supplemental search), plus 122 references selected from expert consultations, 267 QAIs (57 for SRs, 94 for RCTs, 99 for OBSs, and 17 for multiple designs) and 60 EGSs were identified. Among these existing tools, 75 generic checklists and scales of QAIs (20 for SRs, 32 for RCTs, and 23 for OBSs) and 23 generic EGSs met the pre-specified selection criteria as potential instruments and were assessed using the AHRQ evaluation grids. Seven QAIs (two for SRs, three for RCTs, and two for OBSs) and 10 EGSs fully addressed all evaluation domains and obtained the same highest scores within each type of study. Among them, four QAIs (two for SRs, one for RCTs, and one for OBSs) and six EGSs were selected for further expert consultation, after removing three less efficient instruments and four non-guideline systems respectively. With input from experts and stakeholders, plus consideration of CADTH's needs, four checklists of QAIs and one EGS were chosen:
• AMSTAR 2005 for SRs (unpublished), then AMSTAR 20072
• SIGN 50 2004 for RCTs3
• SIGN 50 2004 for cohort studies4 and case-control studies5
• GRADE 2004 EGS6

The above results are summarized in the following table:

Number of tools                   Total QAIs   SRs   RCTs   OBSs   Multiple design   EGSs
Existing tools collected
  Review articles                    233        51    77     88         17             49
  Expert input                        21         4     9      8          0              7
  Supplemental search                  8         2     6      0          0              2
  Check from QAT group                 5         0     2      3          0              2
Total No. of collection              267        57    94     99         17             60
Potential tools evaluated             75        20    32     23          0             23
Tools with highest scores              7         2     3      2          0             10
Tools for expert consultations         4         2     1      1          0              6
Appropriate tools selected             3         1     1      1*         0              1

EGS = evidence grading system; OBS = observational study; QAI = quality assessment instrument; QAT = quality assessment tool; RCT = randomized controlled trial; SR = systematic review.
*Two methodological checklists are provided, for cohort and case-control studies respectively.

No new QAIs or EGSs were identified through reviewing the 1,601 citations retrieved from the updated literature searches (825 in the search for review articles and 776 in the supplemental search).

Conclusions
In the QAT project, a total of 267 existing QAIs and 60 existing EGSs were collected according to type of study. Through the analyses and evaluation conducted by the QAT working group, four QAIs (one for each type of study) and one EGS were selected from those existing tools as the most appropriate for CADTH to use in making evidence-based recommendations on optimal drug prescribing and use. The project involved four separate literature searches, two rounds of combined analyses, two expert consultations, and one stakeholder consultation. The selected QAIs and EGS have been applied systematically and consistently in CADTH optimal use projects to make evidence evaluation more transparent, thus helping reviewers and expert panels translate evidence more effectively into comprehensive, reliable, and practical recommendations. We are confident that the work and selections of the QAT project are an important piece of CADTH's evaluation methodology and will solidify our evaluation foundation.

1 INTRODUCTION
This report documents the work undertaken between 2005 and 2007 by the quality assessment tools (QAT) project. The goal of this project was to identify the most appropriate quality assessment tools for evaluating and grading evidence. The term "quality" means different things in different contexts. For example, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group defines quality as confidence in the estimate of effect. For this review we used the Agency for Healthcare Research and Quality (AHRQ) perspective of quality as "risk of bias" or "internal validity."

1.1 Rationale for the QAT Project
In March 2004, the Canadian Optimal Medication Prescribing and Utilization Service (COMPUS) was launched by the Canadian Coordinating Office for Health Technology Assessment (CCOHTA) — now the Canadian Agency for Drugs and Technologies in Health (CADTH) — as a service to federal, provincial, and territorial jurisdictions and other stakeholders. COMPUS was a nationally coordinated program, funded by Health Canada. The goal of COMPUS was to optimize drug-related health outcomes and the cost-effective use of drugs by identifying and promoting optimal drug prescribing and use. To meet this goal, CADTH required relevant and rigorously derived evidence-based information; however, the quality of scientific evidence varies, depending on the study design and how the study was conducted and analyzed. Thus, CADTH staff embarked on this project to identify the most helpful quality assessment tools.

Quality assessment instruments (QAIs) are normally applied to transparently evaluate information from various types of studies, while evidence grading systems (EGSs) are applied to systematically rate the strength of a body of evidence. Choosing appropriate tools from among the various existing QAIs and EGSs is not only an academic exercise; the choice must also meet practical needs, such as the efficiency of tool use.

CADTH's mission is to provide timely, relevant, and rigorously derived evidence-based information to decision-makers, and to support the decision-making process. Building on this foundation, and considering the study designs most commonly encountered in clinical research on drugs, CADTH used a systematic approach to identify the most appropriate QAIs for assessing the quality of systematic reviews (SRs), randomized controlled trials (RCTs), and observational studies (OBSs) (mainly cohort and case-control studies), as well as EGSs for rating the strength of a body of evidence. The tools selected by CADTH should have the highest scientific credibility, be user-friendly, and be supported by most experts and stakeholders.

1.2 Goals of this QAT Report
The QAT project was originally conducted in 2005 and updated in 2007. A summary of the original QAT project was posted on the CADTH website (formerly known as CCOHTA) in December 2005 (Appendix A-1). Since that time, the work has been referenced in publications,7,8 presented at conferences9-11 (Appendix A-2 and Appendix A-3), and used to

support the work of CADTH. Many requests for details about this project have since been received from outside organizations. Given this interest and the important role of quality assessment tools in technology assessment, the authors are sharing the approach they took to conduct this work, as well as their findings and conclusions.

1.3 Quality Assessment in Systematic Reviews of Scientific Evidence
Systematic reviews (SRs) represent a rigorous approach for compiling scientific evidence to answer specific research questions, and are increasingly used to support evidence-based health care decisions. Compared with narrative reviews, one of the strengths of SRs is that they provide a measure of quality for each study included in the review by assessing how well the study was designed, conducted, and analyzed; researchers place more emphasis on results from studies of higher quality.1 Evidence shows that poor-quality studies may bias pooled estimates in SRs and that synthesizing studies of varying quality may bias the combined effect measures of interest.12 To account for differences in study quality and the impact of those differences on interpretations of the scientific evidence, a direct and explicit approach to assessing study quality and rating the strength of evidence in reviews is needed.

1.3.1 Assessing the quality of individual studies

Study quality usually refers to the internal validity of a study. It is defined in some research as the extent to which a study's design, conduct, and analysis have minimized selection, measurement, and confounding biases.1 Different instruments used for assessing the quality of the same study can lead to different quality rankings.13 The type(s) of study design(s) being considered play an important role in the conduct of SRs, and the features that are important to assess differ by study type (e.g., SRs, RCTs, and OBSs). Therefore, "one size fits all" QAIs may be less efficient in measuring the quality of different types of study designs. QAIs are usually designed as components, checklists, or scales; checklists and scales are more commonly used for rating study quality.1

1.3.2 Grading the strength of a body of evidence

Grading the strength of a body of evidence, which incorporates judgments of study quality as one of several factors, is crucial for developing evidence-based clinical recommendations. Strength refers to the size of the estimated risk and its accompanying confidence intervals.1 Frameworks for grading the strength of a body of evidence are much less uniform than QAIs. Confidence in a recommendation is affected by three well-established attributes—quality, quantity, and consistency, which are defined as follows:

• Quality is concerned with the quality of all relevant studies for a given topic.
• Quantity encompasses several aspects, such as the number of studies that have evaluated the question, the overall sample size across all of the studies, and the magnitude of the treatment effect, which is along the lines of "strength" from causality assessment.
• Consistency is whether

investigations with both similar and different study designs report similar findings and can be assessed only if numerous studies are done.1

EGSs should incorporate all three of these attributes.

1.4 Existing QAIs and EGSs
Overall, the existing instruments varied widely.14 Many QAIs are available as checklists and scales for RCTs and observational studies;15-17 however, fewer instruments have been developed specifically for assessing systematic reviews.1 Many instruments were modified from generic counterparts for topic-specific applications; many were developed based on expert opinion rather than empirical research; and few used rigorous development techniques.1

Approaches for grading the strength of a body of evidence are rapidly evolving. The earliest approach, grading the level of evidence on the basis of study design alone, was published by the Canadian Task Force on the Periodic Health Examination in 1979.18 Since then, a number of derivative systems have been proposed, mainly for use in clinical practice guidelines.1 Although EGSs relying on a study design hierarchy are simple and easy to understand and use, they are increasingly unacceptable because they do not consider the quality, quantity, and consistency of primary studies. The framework of EGSs is much less uniform than that of QAIs, which complicates the selection of one or more EGSs to apply.1

At the time the QAT project was initiated, comprehensive evaluations of QAIs and EGSs were limited. Moher et al. 1995 identified 25 scales and nine checklists available for RCTs.16 Deeks et al.
2003 conducted a health technology assessment and identified a total of 194 QAIs available for non-randomized interventions.17 An international expert group critically appraised six prominent EGSs and concluded that all of them had important shortcomings and that a new system should be developed to address the major limitations.19 A systematic review of evaluation tools, published by the AHRQ in 2002, identified and assessed 121 QAIs across different types of study designs, as well as 40 EGSs for grading the strength of a body of evidence.1

1.5 AHRQ Evidence Report1
AHRQ, a national agency in the United States, pursues its mission "to improve the quality, effectiveness, and appropriateness of clinical care by facilitating the translation of evidence-based research findings into clinical practice" through its Evidence-Based Practice Centers (EPCs).1 Building on an earlier report14 and contributions from collaborating experts in the public and private sectors, AHRQ systematically identified and examined tools to rate the strength of scientific evidence for use in making evidence-based health care decisions. The evaluation results were presented in Evidence Report / Technology Assessment Number 47.

Such tools included:

• quality scales, quality checklists, and study design characteristics (components) for rating the quality of individual articles

• methodologies for grading the strength of a body of scientific evidence; that is, an accumulation of many individual articles that address a common scientific issue.

Following is a brief summary of the AHRQ project. A MEDLINE search (1995 to June 2000) was conducted for relevant articles published in English on either rating the quality of individual studies or grading a body of scientific evidence. Information from existing bibliographies, members of a technical expert panel, EPCs, and review groups was also sought to supplement these sources.

Four study quality grids were developed with methodological domains, accounting for differences in study designs — systematic reviews and meta-analyses, randomized controlled trials, observational studies, and diagnostic studies — to compare and characterize existing instruments for assessing the quality of individual studies. One evidence strength grid was developed to assess the systems for rating the strength of bodies of evidence.

Overall, 121 existing tools (20 for systematic reviews, 49 for RCTs, 19 for observational studies, 18 for diagnostic test studies, and 40 for grading a body of evidence) were identified, compared, and evaluated in the report. Among them, 19 generic QAIs and seven EGSs that fully addressed their key quality domains were identified and recommended as starting-point tools for conducting systematic reviews. The advice and assistance of international experts were solicited in the preliminary stages of the project, and the entire report was subjected to extensive peer review by experts in the field and AHRQ staff.

However, this report did not provide guidance on the specific QAIs and EGSs to use. Potential users were encouraged to consider the feasibility, ease of use, and likely applicability of QAIs and EGSs to their own particular projects.

1.6 Objective of the QAT Project

The objective of this QAT project was to identify specific QAIs, by various study designs, and EGSs for CADTH. To accomplish this, we built on the AHRQ evidence report No. 47,1 published in 2002, by updating the review of QAIs and EGSs, and by identifying the most appropriate QAIs and EGSs; that is, the ones that were most feasible and efficient for CADTH work.

Quality Assessment Tools Project Report 5

2 METHODS

2.1 Overview of QAT Project Methods

Accepting the findings from AHRQ evidence report No. 47,1 and building upon this report, CADTH applied a systematic process to select QAIs for assessing the quality of SRs, RCTs, and OBSs (cohort and case-control studies), as well as to select an EGS. This project was originally conducted from January 2005 to October 2005 and subsequently updated from September to December 2007. The whole process was divided into 10 steps for the original project (Steps 1 to 10) plus two steps for updating (Steps 11 and 12). The overall road map of the project methods is presented in Figure 1, and the key concepts referred to in the AHRQ report and applied to this project are presented in Table 1. Generally, the whole project was broken down into five main sections, as follows. Within the first section, two comprehensive literature searches and reviews (QAT 1 and QAT 2) were conducted to collect existing QAIs and EGSs, respectively.

• Collection of existing QAIs and EGSs evaluated in reviews, applied in study reports, and raised through expert consultation (Steps 2 to 5).
  o QAT 1 focused on groups of QAIs and EGSs (at least two for each group) evaluated in existing review articles (Step 2).
  o QAT 2 focused on individual QAIs and EGSs used in specific study reports (Step 5).
• Identification of potential QAIs and EGSs from existing ones for further evaluation (Step 6).
• Evaluation of the potential QAIs and EGSs identified, using AHRQ evaluation grids (Step 7).
• Consultation with panel experts and stakeholders on the QAIs and EGSs selected (Steps 8 to 10).
• Formal updating of the project following the same process (Steps 11 and 12).

Quality Assessment Tools Project Report 6

Figure 1: Road Map of QAT Project Methods

• Step 1. Assemble QAT working group

Collect existing QAIs and EGSs:
• Step 2. QAT 1: Search and select review articles (2000 to Feb. 2005)
• Step 3. First round combined analysis: Identify QAIs and EGSs from review articles
• Step 4. First round expert consultation: Input from experts on the collection of QAT 1
• Step 5. QAT 2: Search and select individual QAIs and EGSs (2000 to Aug. 2005)

Identify potential QAIs and EGSs:
• Step 6. Second round combined analysis: Identify QAIs and EGSs for evaluation

Evaluate potential QAIs and EGSs:
• Step 7. Evaluation: Evaluate potential QAIs and EGSs identified

Consult on QAIs and EGSs selected:
• Step 8. Second round expert consultation: Collect experts' comments on evaluation results
• Step 9. Choose appropriate QAIs and EGSs for COMPUS
• Step 10. Stakeholder consultation: Collect stakeholders' comments on the choices

Update QAIs and EGSs:
• Step 11. Updating QAT 1: Select review articles (2005 to Sept. 2007)
• Step 12. Updating QAT 2: Select individual QAIs and EGSs (2005 to Sept. 2007)


Table 1: Key Concepts Referred to in the AHRQ Report1 and Applied to the QAT Project

Generic instrument: An instrument that could be used to assess the quality of any study of the type considered on that grid (page 33).

Specific instrument: An instrument designed to assess study quality for a particular type of outcome, intervention, exposure, test, etc. (page 33).

Type of instrument:
• Scale: An instrument containing several quality items that are scored numerically to provide a quantitative estimate of overall study quality (page 33).
• Checklist: An instrument containing a number of quality items, none of which is scored numerically (page 33).
• Component: An individual aspect of study methodology (e.g., randomization, blinding, follow-up) that has a potential relation to bias in the estimation of effect (page 33).

Guidance document: A publication in which study quality is defined or described, but which does not provide an instrument that could be used for evaluative applications (page 33).

Domain of study methodology: A domain of study methodology or execution reflects factors to be considered in assessing the extent to which a study's results are reliable or valid (i.e., study quality). Each domain has specific "elements" that one might use in determining whether a particular instrument assessed that domain; in some cases, only one element defines a domain (page 34).

Domains for rating the overall strength of a body of evidence:
• Quality: The quality of all relevant studies for a given topic, where "quality" is defined as the extent to which a study's design, conduct, and analysis have minimized selection, measurement, and confounding biases (page 42).
• Quantity: The construct "quantity" refers to the extent to which there is a relationship between the technology (or exposure) being evaluated and the outcome, as well as to the amount of information supporting that relationship. Three main factors contribute to quantity: the magnitude of treatment effect; the number of studies that have evaluated the given topic; and the overall sample size across all included studies (page 42).
• Consistency: The degree to which a body of scientific evidence is in agreement with itself and with outside information. More specifically, a body of evidence is said to be consistent when numerous studies done in different populations using different study designs to measure the same relationship produce essentially similar or compatible results (page 43). For any given topic, it refers to the extent to which similar findings are reported from work using similar and different study designs (page 42).

AHRQ = Agency for Healthcare Research and Quality; QAT = Quality Assessment Tool.


2.2 Details of QAT Project Methods

Step 1: Assemble QAT working group

The working group comprised internal researchers (AB, VS, and SV), information specialists (GB and HF), a methodology expert (GW), and external researchers (SM, KD, and RL). The group was mandated to direct and supervise the process of selecting and evaluating QAIs and EGSs. Group members took on different roles, mainly the literature search, the collection and evaluation of QAIs and EGSs, and consultation.

Step 2: Search and select review articles (QAT 1: 2000 to February 2005)

To achieve the initial identification of QAIs and EGSs, review articles were targeted if they formally evaluated existing QAIs or EGSs. Formal evaluation was defined as a review of existing QAIs or EGSs with respect to their domains, validity, and reliability.

Search strategy

Building upon the original search strategy used in the AHRQ report for literature dating from 1995 to June 2000, and taking into account the recommendations of the AHRQ team, a sensitive search for systematic review articles of QAIs and EGSs was designed and carried out by an information specialist (GB). Published literature was identified by searching the following bibliographic databases: PubMed, MEDLINE, Embase, BIOSIS Previews, and The Cochrane Library. CADTH's methodological filter was applied to limit retrieval to systematic reviews. The search was also limited to documents published between January 2000 and February 2005, without language restrictions. Targeted hand and grey literature searches were also conducted. The QAT 1 search strategy is presented in Appendix B-1.

Selection criteria

Review articles were included for further analysis if they systematically collected and evaluated existing QAIs for SRs/meta-analyses, RCTs, or OBSs, and EGSs, or if they contained a comparison or evaluation of at least two existing QAIs or EGSs. To avoid missing any relevant review articles, the first-level selection included articles on the principles, methods, and tools of methodological quality assessment and critical appraisal, as well as articles generally introducing evidence-based medicine, methodological issues, levels of evidence, research designs, biases, and the development of evidence-based practice guidelines. At the second-level selection, articles were excluded if they provided only general methodological knowledge of evidence-based medicine, critical appraisal, SRs, and guidelines, or if they introduced only one individual QAI or EGS. The detailed selection criteria of QAT 1 are presented in Appendix C-1.

Selection methods

Two reviewers (AB and VS) independently screened the titles and abstracts at the first-level selection.
The full texts of potentially relevant articles identified by either reviewer were retrieved and reviewed at the second level. The two reviewers discussed and resolved any disagreements in selection at the second level. If consensus could not be reached between them, the conflict was resolved by a third party (GW).


Potentially relevant non-English full articles were screened by one reviewer using three questions (i.e., Did the article include more than one QAI or EGS? Did the article compare QA tools? Was the article a systematic review?), which were consistent with the inclusion criteria. The full articles that passed the screening were translated into English, and the English-version full articles were independently reviewed by the two reviewers. The screening questions are presented in Appendix C-2.

Step 3: Identify QAIs and EGSs from review articles (first round combined analysis)

The key components of all included review articles were extracted and tabulated by the number and type of QAIs evaluated and the number of EGSs, search date and databases, evaluation domains, recommendations, etc. Individual QAIs and EGSs were identified and collected by one reviewer (AB) from the included review articles. This extraction was checked by two other reviewers (SM and KD). All extracted QAIs and EGSs were listed in one table by study design; that is, QAIs for SRs, RCTs, OBSs/non-randomized studies, and multiple-design studies, and EGSs. Duplicates of QAIs or EGSs were identified within each type of study. Reference lists of QAIs, by type of study, and of EGSs were created after removing duplicates.

Step 4: Conduct initial expert consultation (first round)

Based on the search experience of the AHRQ report and QAT 1, it was determined that any search for individual QAIs and EGSs was unlikely to be comprehensive. Therefore, in addition to a sensitive search, external experts were consulted to ensure comprehensiveness. The consulted experts included all lead authors of the review articles included in QAT 1, the expert panel of AHRQ evidence report No. 47,1 and experts who worked in the evidence evaluation methodology field and who were recommended by the QAT working group and the COMPUS advisory committee.
A package was sent to the Canadian and international experts identified, to ensure that important review articles on QAIs and/or EGSs were not missed. The package included a covering letter; an introduction to the CADTH optimal use program; and details about the QAT project, the search strategy, and the selection of review articles of interest. Experts were asked to help identify relevant review articles other than those already included, as well as individual QAIs and EGSs. The input from experts was tabulated by study design and then compared with the reference lists of existing QAIs and EGSs derived from the review articles included in QAT 1 (SV). The table was checked (RL) to ensure that all experts' input had been fully considered. The additional full articles identified from experts' input were reviewed by two reviewers (AB and VS) and added to the existing reference lists if they met the selection criteria of QAT 1 or provided individual generic QAIs or EGSs.
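Comparing the experts' suggestions against the existing reference lists amounts to a set difference computed per study design. The sketch below illustrates that comparison step; the tool names, the sample data, and the `new_suggestions` function are illustrative assumptions, not data taken from the project files:

```python
# Hypothetical sketch: flag expert-suggested tools not already on the
# reference lists derived from QAT 1, grouped by study design.

existing = {
    "SR":  {"AMSTAR 2005", "Oxman-Guyatt"},
    "RCT": {"Jadad scale", "SIGN 50 RCT checklist"},
}

expert_input = {
    "SR":  {"AMSTAR 2005"},
    "RCT": {"Jadad scale", "Chalmers scale"},
}

def new_suggestions(existing, expert_input):
    """Return, per study design, expert-suggested tools that are
    not yet on the existing reference lists (a plain set difference)."""
    return {
        design: sorted(tools - existing.get(design, set()))
        for design, tools in expert_input.items()
    }

print(new_suggestions(existing, expert_input))
# {'SR': [], 'RCT': ['Chalmers scale']}
```

Only the tools in the resulting lists would need full-text retrieval and checking against the QAT 1 selection criteria.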


Step 5: Search and select individual QAIs and EGSs (QAT 2: 2000 to August 2005)

A second literature search was conducted to capture individual QAIs and EGSs published after 2000, to supplement the collection from the review articles and experts' input.

Search strategy

A highly specific supplemental literature search was conducted (GB) for articles applying or reporting individual QAIs or EGSs. Published literature was identified by searching the following bibliographic databases: PubMed and The Cochrane Library, including the methodology register. The search was limited to documents published between 2000 and August 2005, without language restriction. See Appendix B-2 for the detailed QAT 2 search strategy.

Selection criteria

Generic checklists or scales of QAIs for SRs, RCTs, non-RCTs, and OBSs (cohort and case-control), and generic EGSs, were collected from various types of publications. Considering further application and the completeness of quality assessment, instruments for specific uses and individual quality assessment components were excluded. Guidance documents, such as QUOROM for SRs and CONSORT for RCTs, were also excluded because they were not developed as tools for assessing the quality of individual studies per se. Selection criteria are presented in Appendix C-3.

Selection methods

First-stage selection: The titles and abstracts were screened by two reviewers (AB and VS) independently at the first-level selection, and potentially relevant records containing individual QAIs or EGSs were identified by either reviewer. The full texts of potentially relevant articles were retrieved and independently reviewed by the same two reviewers at the second-level screening to identify QAIs and EGSs of interest, presented in references, tables, or appendices. Any disagreement was resolved by consensus between the two reviewers or by the judgment of a third party (GW). Non-English full articles were reviewed by reviewers fluent in those languages.

Second-stage selection: An additional selection stage was set up for further checking of the QAIs and EGSs identified at the second-level selection. The individual QAIs and EGSs of interest shown directly in the included articles were reviewed by two reviewers (AB and VS) to identify generic QAIs and EGSs; these were then compared with the existing reference lists derived from QAT 1 and the expert consultation to remove duplicates (SV). If included articles provided only the references of QAIs and EGSs of interest, those references were extracted into one Excel table, compared with the existing reference lists by one person (SV), and checked by another (KD).
The full articles of the references not already listed were retrieved and reviewed by the two reviewers (AB and VS). All generic checklists or scales of QAIs and all generic EGSs identified were added to the existing reference lists for further analysis.

Step 6: Identify potential QAIs and EGSs for evaluation (second round combined analysis)

Potential QAIs and EGSs, namely those recommended in review articles plus generic checklists or scales of QAIs and generic EGSs not assessed before, were identified for further evaluation by


types of study. The identification was conducted by one reviewer (AB) and checked by another (KD), as follows:

• QAIs and EGSs recommended by any of the review articles were included in the further evaluation.

If any tools overlapped between reviews, the AHRQ evaluation results were consulted first, before the recommendations from the other reviews were considered. If there was a substantial difference in the criteria for evaluation and recommendation between the AHRQ report and other review articles, a further selection of the tools contained in the other reviews was conducted, based on criteria modified by the working group.

• If review articles evaluated QAIs and EGSs but did not provide any recommendations, tools from those reviews were included for further evaluation after confirming that they were not covered by the reviews with recommendations, regardless of which tools were recommended.

• Generic checklists or scales of QAIs and generic EGSs identified through expert consultation and QAT 2 were included for further evaluation, because they had not been assessed by any review article.

Full articles of the potential QAIs and EGSs collected from reviews other than the AHRQ report were retrieved. Two reviewers (AB and VS) independently checked them using the selection criteria of QAT 2 to ensure that generic checklists or scales of QAIs and generic EGSs were included in the further evaluation.

Step 7: Evaluate potential QAIs and EGSs identified

Briefly, the potential QAIs and EGSs identified above were evaluated, by type of study, using the AHRQ evaluation grids, including the domain criteria and the elements created within each domain. Strengths and weaknesses of the different instruments or systems were shown in grids, with columns denoting the evaluation domains of interest and rows denoting the individual tools. Each of the QAIs and EGSs was awarded a "yes" (fully addressed a domain), "partial" (addressed a domain to some extent), or "no" (did not deal with a domain) for each domain.1 The domains applied in the QAT project are presented in Table 2.

Table 2: AHRQ Evaluation Grids1 Applied in the QAT Project by Types of Study

QAI for SR (Domains 1 to 7): research questions; search strategy*; inclusion/exclusion criteria; data extraction; study quality/validity*; data synthesis*; funding*

QAI for RCT (Domains 1 to 7): study population; randomization*; blinding*; interventions; outcomes; statistical analysis*; funding*

QAI for OBS (Domains 1 to 5): comparability of subjects*; exposure/intervention; outcome measure; statistical analysis; funding*

EGS (Domains 1 to 3): quality; quantity; consistency

AHRQ = Agency for Healthcare Research and Quality; EGS = evidence grading system; OBS = observational study; QAI = quality assessment instrument; QAT = quality assessment tool; RCT = randomized controlled trial; SR = systematic review.
*Domains containing empirical elements.


The details of the evaluation grids (domains and elements) and the evaluation rules were presented in the methods section of the AHRQ report; a brief summary follows. The majority of domains and their elements were based on standard, generally accepted "good practice" epidemiologic methods. Methodological research has shown that elements with a demonstrable basis in empirical research can affect the conduct and analysis of a study.1 Domains with empirical elements were generally assigned more weight (see the domains marked with an asterisk in Table 2). For example, the domain of randomization contains three empirical elements for RCTs: adequate sequence generation, adequate concealment, and baseline similarity. Most of the domains in Table 2 had at least one element identified as essential for judging whether an instrument fully covered that domain; for example, the element adequate concealment method for the domain randomization. For domains with multiple elements, a "yes" rating required the instrument to address certain specified elements or a majority of the elements. The remaining domains in Table 2 (i.e., those not marked by an asterisk) contain elements that were derived from best practices and were considered critical for study design, but that had not been tested empirically. For the three EGS domains specifically: quality included only one element, based on methodological rigour, the extent to which bias was minimized; quantity combined three elements (the number of studies, sample size or power, and the magnitude of effect), with a full "yes" requiring two of the three elements to be covered; and consistency had only one element, a summary finding from more than one study reviewed, with a dichotomous "yes" or "no" indicating whether a system took consistency into account in its view of the strength of evidence.
All of these domains, with either empirical or best-practice elements, were applied in the AHRQ report as the criteria to identify QAIs and EGSs that could be accepted with confidence, and without major modification, for current use.1 The evaluation results of the QAIs and EGSs recommended in the AHRQ report were applied directly in this project. To ensure consistency with the AHRQ evaluation, two reviewers (SM and AB) randomly selected one QAI for each type of study and one EGS from the AHRQ recommendations and repeated the evaluation. After resolving any discrepancies between the AHRQ evaluation and the two reviewers, one potential instrument identified from sources other than the AHRQ report was randomly selected for each type of study and evaluated by the two reviewers independently. After they reached consensus, one reviewer (SM) evaluated the remaining potential QAIs and EGSs identified, and the other reviewer (AB) checked the evaluation results. Any disagreement between the two reviewers was resolved by group discussion (SM, AB, VS, and GW). The highest-scoring QAIs and EGSs from each grid represented the proposed selections. To identify the most appropriate tools for use, the descriptive information on tools provided in the AHRQ report (including the rigour of the development process, inter-rater reliability, and instructions provided) was also consulted when further considering instruments or systems with the same highest evaluation scores within each type of study. Moreover, the length of the tools, their ease of use, and the time needed to assess one article were considered. After removing relatively impractical tools, the choices were proposed.
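The grid-and-scoring logic described above can be sketched in a few lines. Note that the numeric point values (yes = 2, partial = 1, no = 0) and the double weighting of empirical domains are illustrative assumptions for this sketch only; the AHRQ grids themselves record the yes/partial/no ratings rather than a published point scheme:

```python
# Illustrative sketch of the evaluation-grid logic: each tool receives a
# "yes" / "partial" / "no" per domain, empirical domains (marked * in
# Table 2) carry more weight, and the highest-scoring tool is proposed.
# Point values and weights here are assumptions, not the report's tally.

RCT_DOMAINS = {            # domain -> weight (2 if empirical, else 1)
    "study population": 1,
    "randomization": 2,
    "blinding": 2,
    "interventions": 1,
    "outcomes": 1,
    "statistical analysis": 2,
    "funding": 2,
}
POINTS = {"yes": 2, "partial": 1, "no": 0}

def grid_score(ratings):
    """Weighted sum of yes/partial/no ratings over the RCT grid;
    a domain missing from `ratings` counts as 'no'."""
    return sum(POINTS[ratings.get(d, "no")] * w
               for d, w in RCT_DOMAINS.items())

tool_a = {d: "yes" for d in RCT_DOMAINS}                  # full marks
tool_b = dict(tool_a, blinding="partial", funding="no")   # weaker tool

scores = {"tool_a": grid_score(tool_a), "tool_b": grid_score(tool_b)}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumed weights, tool_a scores 22, tool_b scores 16, and tool_a would be carried forward; ties would then be broken on the descriptive criteria (development rigour, reliability, ease of use) noted above.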


Step 8: Conduct the second expert consultation (second round)

The QAIs for each type of study and the EGSs that had the highest scores and were also judged more practical for use were sent, for review and comment, to the experts who had responded to the initial expert consultation. The package for this consultation included a covering letter; a brief description of the quality assessment review work and a request for expert advice on the selection of instruments and an EGS; and full details of the shortlisted QAIs and EGS. The feedback from experts was collated (SV), summarized (VS), and discussed in the working group.

Step 9: Choose QAIs and EGSs for CADTH (final choice)

Based on the experts' input, the QAT working group finalized the proposed QAIs and EGSs, including some modifications.

Step 10: Conduct a stakeholder consultation

The proposed QAIs and EGSs were posted on the CADTH website, along with an online feedback form, to encourage stakeholder input from all interested parties. Feedback was reviewed and considered by the QAT working group.

Step 11: Updating QAT 1

Because the original QAT project was completed in October 2005, an update was conducted in 2007 to identify QAIs and EGSs published between 2005 and September 2007.

Search strategy

To update the report, the original broad QAT 1 search strategy was tightened by removing extraneous terminology (HF). The modified, focused search was validated by re-running it for the original search period (2000 to February 2005) and ensuring that all reviews included from the original search were retrieved. The focused strategy used to update the QAT 1 search for 2005 to September 2007 is presented in Appendix B-3, and its validation is presented in Appendix B-4.
Selection criteria and methods

Two reviewers (KD and RL) independently conducted a two-level literature selection using the same selection criteria and following the same selection methods as in the original QAT 1 selection, including the retrieval of non-English articles. Any disagreement was resolved by the judgment of a third party (AB or VS).

Identification of new QAIs and EGSs for further evaluation

One reviewer (KD) compared all QAIs and EGSs contained in the included review articles published between 2005 and September 2007 with the existing reference lists of QAIs and EGSs obtained in the original QAT project. All new references, that is, those not on the existing lists, were added by study design.

Selection and evaluation of new QAIs and EGSs

The full texts of the new references published in 2005 or later were retrieved and reviewed by two reviewers (AB and VS) independently to identify generic checklists or scales of QAIs and


generic EGSs for further evaluation, following the same evaluation methods. Any disagreement was resolved by discussion and consensus between the two reviewers. The assessment results were added to the original evaluation grids, and the final selection results of the QAT project were updated where necessary.

Step 12: Updating QAT 2

Search strategy

To update the report, the original QAT 2 search was re-run in PubMed and The Cochrane Library, and the original search syntax was adjusted for the Embase and BIOSIS Previews databases, which were searched on a different platform (Ovid instead of Dialog). The updated QAT 2 search was validated by re-running it for the original search period (2000 to August 2005); the results were similar in PubMed, Embase, and BIOSIS Previews, but there was a large discrepancy in the Cochrane results (HF). After Cochrane was contacted, the discrepancy was determined to be due to the retroactive population of The Cochrane database. Considering that many published articles could be retrieved from both Cochrane and the other databases, and that the Cochrane database includes many grey literature items, it was decided to exclude the Cochrane database from the updated search (2005 to September 2007). The revised search strategy is presented in Appendix B-5, and the detailed validation is shown in Appendix B-6.

Selection criteria and methods

Two reviewers (KD and RL) independently conducted a two-level literature selection using the same selection criteria as in the original QAT 2. The selection methods differed slightly from those in the original QAT searches, because of time limits and for comprehensiveness: articles included by either reviewer at the second-level selection were carried forward to the second stage; that is, no consensus was sought on discrepancies in the second-level selection results.
Identification of new QAIs and EGSs for further evaluation

Because of the tight timeline, the tables, appendices, or references of individual tools identified by either reviewer in the full articles included at the second-level selection were compared with each other to remove duplicates. The unique new references were then compared with the existing references, and the tables or appendices were checked using the selection criteria of the original QAT 2 to identify new individual QAIs and EGSs (RL or KD). Any questions or uncertainty were resolved by the judgment of a third party (VS or AB). New references were grouped by study design.

Selection and evaluation of new QAIs and EGSs

As with updating QAT 1, the full articles of the new references published in 2005 or later were retrieved and reviewed. The new generic QAIs and EGSs were evaluated, and their references were added to the existing reference lists. The evaluation results of the QAT project were updated.


3 RESULTS

This part of the report documents the results of the QAT project in six sections: a summary of the results, followed by detailed findings corresponding to the five main sections described in the methods; that is, existing QAIs and EGSs collected, potential QAIs and EGSs identified, potential QAIs and EGSs evaluated, and the final choices of the original QAT project, plus new QAIs and EGSs from the updated searches.

3.1 Summary of QAT Project Results

Overall, from the 4,126 citations retrieved in the original literature searches (3,006 in QAT 1 and 1,120 in QAT 2), plus 122 references selected from the expert panel consultations, 267 QAIs (57 for SRs, 94 for RCTs, 99 for OBSs, and 17 for multiple designs) and 60 EGSs were identified. After the inclusion/exclusion criteria for further review were applied, 192 QAIs and 37 EGSs were excluded for various reasons. The remaining 75 QAIs (20 for SRs, 32 for RCTs, and 23 for OBSs) and 23 EGSs were assessed using the AHRQ evaluation grids (Appendix F-1 to Appendix F-4). After the second round of expert consultation and the stakeholder input, four QAIs and one EGS were chosen as appropriate tools: AMSTAR 2005 for SRs,2 SIGN 50 2004 for RCTs,3 cohort studies,4 and case-control studies,5 and GRADE 2004 as the EGS.6 No new QAIs or EGSs were identified from the 1,601 citations retrieved in the updating literature searches (825 in updating QAT 1 and 776 in updating QAT 2). Figure 2 shows the summary of results mapped onto the road map of the QAT project methods.


Figure 2: Summary of QAT Project Results

• Step 1. QAT working group: internal reviewers, information specialists, and external methodologists
• Step 2. QAT 1 (2000 to Feb. 2005): 3,006 citations; 9 review articles included
• Step 3. First round combined analysis: 233 QAIs (51 for SRs, 77 for RCTs, 88 for OBSs, and 17 for multiple designs) and 49 EGSs
• Step 4. First round expert consultation: 17 of 122 references included, for 21 QAIs (4 for SRs, 9 for RCTs, and 8 for OBSs) and 7 EGSs
• Step 5. QAT 2 (2000 to Aug. 2005): 1,120 citations; 8 QAIs (2 for SRs and 6 for RCTs) and 2 EGSs
• Step 6. Second round combined analysis: 75 QAIs identified for evaluation (20 for SRs, 32 for RCTs, and 23 for OBSs) and 23 EGSs; 5 QAIs (2 for RCTs and 3 for OBSs) and 2 EGSs identified as additional tools
• Step 7. Evaluation. QAIs and EGS with highest scoring: AMSTAR 2005 for SRs; SIGN 50 checklists for RCTs, cohort studies, and case-control studies; GRADE 2004 as the EGS
• Step 8. Second round expert consultation
• Step 9. Final choice
• Step 10. Stakeholder consultation
• Step 11. Updating QAT 1 (2005 to Sept. 2007): 825 citations; 7 review articles; 0 new QAIs and EGSs
• Step 12. Updating QAT 2 (2005 to Sept. 2007): 776 citations; 32 references; 0 new QAIs and EGSs


3.2 Collection of existing QAIs and EGSs

Existing QAIs and EGSs were collected through multiple sources to ensure comprehensiveness.

3.2.1 Collection of existing QAIs and EGSs from review articles

Review articles were the most important source for collecting existing QAIs and EGSs in the QAT project.

a) Searched and selected the review articles containing evaluations of QAIs and EGSs (QAT 1: 2000 to February 2005 — Step 2)

Overall, 3,002 citations were identified by a sensitive search for systematic reviews of QAIs and EGSs between 2000 and 2005 (2,988 from electronic database searching and 14 from grey literature searching). Among them, 141 potentially relevant citations were identified for further selection. Eleven reviews were included based on the selection criteria: seven identified from the 141 full articles and four identified from reference lists. After two partial duplicates were removed, nine of the 11 review articles were selected by the working group for further analysis.1,17,19-25 A QUOROM flow chart and an inclusion list of the review articles are presented in Appendix D-1 and Appendix D-2, respectively. A total of 134 of the 141 full texts were excluded, mainly for the following reasons: general knowledge of evidence-based medicine, critical appraisal, systematic reviews, and meta-analysis; introduction, development, or application of one individual QAI or EGS; methodological issues; and development of guideline approaches or assessment of guidelines. The main exclusion reasons are presented in the QUOROM flow chart (Appendix D-1). In addition, 18 non-English articles among the 141 were excluded at screening, mainly because they addressed fewer than two QAIs or EGSs or included no comparison of tools. These articles were in eight languages: French, German, Portuguese, Italian, Spanish, Dutch, Polish, and Japanese.

b) Identified QAIs and EGSs from the included review articles (first round combined analysis — Step 3)

The nine included review articles consisted of five systematic reviews,1,17,20,21,24 including the AHRQ report,1 and four narrative reviews.19,22,23,25 They were all published between 2001 and 2005; however, the search end dates of three of the SRs17,21,24 were in 1999, before the June 2000 search end date of the AHRQ report.1 One systematic review,20 published in 2004, reported that the search of the AHRQ report was used as its starting point; however, it did not provide its own search end date. Three of the other four reviews22,23,25 contained QAIs or EGSs published up to 2000, while one of the four19 evaluated EGSs published in 2001. The numbers of QAIs and EGSs collected or evaluated in these nine reviews ranged from three to 182, and the study types covered by the QAIs included SRs, RCTs, non-randomized studies, OBSs, and diagnostic test studies. An important difference among these reviews was that their selection criteria and evaluation methods varied.


All QAIs and EGSs presented and evaluated in these nine review articles were extracted into five Excel tables and grouped by study design, except for those in Deeks’ report.17 Deeks’ report17 evaluated 182 instruments for non-randomized intervention studies and selected 60 of them as the top instruments because they covered at least five of the six internal validity domains (creation of treatment group, blinding, soundness of information, follow-up, analysis-comparability, and analysis-outcome). To focus on “appropriate” instruments, only these top 60 were extracted from Deeks’ report. Extracted QAIs and EGSs are referred to by the principal author, the organization, or the instrument name. The original reference numbers assigned in the review articles were kept in the tables. The detailed extraction tables are not presented in this report. Table 3 shows the collection summary, including all tools collected from the AHRQ report and the additional tools collected from the other reviews after removing duplicates.

Table 3: Summary of Existing QAIs and EGSs Collected from the Nine Review Articles

| Information source | Review article (first author and published date) | QAIs contained | EGSs contained | QAIs collected: SRs | QAIs collected: RCTs | QAIs collected: OBSs | QAIs collected: Multiple design | EGSs collected |
|---|---|---|---|---|---|---|---|---|
| Main source | West 2002 (AHRQ report)1 | 88* | 40 | 20 | 49 | 19 | NA | 40 |
| Additional source | Brouwers et al. 200525 | 3 | NA | NA | 0 | NA | NA | NA |
| | Atkins et al. 200419 | NA | 6 | NA | NA | NA | NA | 3 |
| | Katrak et al. 200420 | 103 | NA | 13 | 16 | 7 | 17 | NA |
| | Deeks et al. 200317† | 182 | NA | NA | 5 | 49 | NA | NA |
| | Saunders et al. 200321 | 18 | NA | NA | NA | 17 | NA | NA |
| | Colle et al. 200223 | 16 | NA | NA | 8 | NA | NA | NA |
| | Liberati et al. 200122 | NA | 9 | NA | NA | NA | NA | 7 |
| | Shea et al. 200124 | 24 | NA | 21 | NA | NA | NA | NA |
| Duplicates between the eight reviews | | | | 3 | 1 | 4 | NA | 1 |
| Total no. of collection | | | | 51 | 77 | 88 | 17 | 49 |

AHRQ = Agency for Healthcare Research and Quality; EGS = evidence grading system; NA = not available, information not provided; OBS = observational study; QAI = quality assessment instrument; RCT = randomized controlled trial; SR = systematic review.
*QAIs for multiple design study types were counted more than once in the AHRQ report.1
†Only the top 60 QAIs were collected from Deeks’ report.17

In summary, 233 QAIs (51 for SRs, 77 for RCTs, 88 for OBSs, and 17 for multiple types of study designs) and 49 EGSs were collected from the nine review articles after eliminating duplicates within each type of study. Other than the AHRQ report, the eight reviews provided a total of 145 additional QAIs (31 for SRs, 28 for RCTs, 69 for OBSs, and 17 for multiple types of studies) and nine additional EGSs. Among these 145 instruments and nine systems, those published in or after 2000 numbered four for SRs (13% of the 31 additional QAIs), 12 for RCTs (43%), five for OBSs (7%), six for multiple types of studies (35%), and two EGSs (22%).
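The collection step just described, pooling tool references from several reviews and removing duplicates within each study type, is essentially a set union. A minimal illustrative sketch (the review labels and tool identifiers are hypothetical, not the actual instruments collected):

```python
from collections import defaultdict

def pool_tools(reviews):
    """Union the tool references reported per study type, dropping duplicates.

    `reviews` maps a review label to {study_type: set of tool identifiers}.
    """
    pooled = defaultdict(set)
    for tools_by_type in reviews.values():
        for study_type, tools in tools_by_type.items():
            pooled[study_type] |= tools  # set union removes duplicates
    return pooled

# Hypothetical data: two reviews that both report "Tool-B" for SRs.
reviews = {
    "Review 1": {"SR": {"Tool-A", "Tool-B"}, "RCT": {"Tool-C"}},
    "Review 2": {"SR": {"Tool-B"}, "RCT": {"Tool-C", "Tool-D"}},
}
pooled = pool_tools(reviews)
print(sorted(pooled["SR"]))  # ['Tool-A', 'Tool-B']
print(len(pooled["RCT"]))    # 2
```

Because duplicates are removed within each study type independently, a tool covering multiple designs can legitimately be counted once per design, as the AHRQ footnote to Table 3 notes.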


Correspondingly, reference lists of all QAIs collected, by type of study, and of all EGSs were created and are presented in Appendix E. Links between the references of individual tools and the nine review articles are shown in the “sources” column using the format: order number of the included review article in the inclusion list of QAT 1 (Appendix D-2), followed by the original reference number of the tool in that review article. For example, the first QAI for SRs is labelled 9-4, meaning reference No. 4 of the ninth included review article (the AHRQ report).

3.2.2 Conducted initial expert consultation (First round expert consultation; Step 4)

A total of 34 experts in eight countries were consulted, including nine primary authors of the review articles included in QAT 1,1,17,19-25 10 panel members of the AHRQ report,1 and others recommended by the QAT working group, the COMPUS advisory committee, and researchers. Nine of the 34 experts provided input, supplying 140 references to relevant reviews and individual QAIs and EGSs. After 18 duplicates were removed, the remaining 122 unique references were compared with the reference lists of existing QAIs and EGSs collected in QAT 1, and 97 references not already on the lists were identified. The abstracts or full articles of these 97 references were reviewed, and 80 failed to meet the selection criteria of QAT 2. Finally, 17 references2,6,26-40 were included, providing 21 instruments (four for SRs, nine for RCTs, and eight for OBSs) as well as seven EGSs, which were added to the reference lists of existing QAIs and EGSs as potential QAIs and EGSs for further evaluation (Appendix E).

3.2.3 Searched and selected individual QAIs and EGSs (QAT 2: 2000 to August 2005; Step 5)

There were 1,120 citations screened at the title and abstract level, and of those, 445 full articles were reviewed. Overall, 238 of the 445 full articles were found to contain individual QAIs or EGSs; 416 references and 32 tables or appendices of QAIs and EGSs were identified in these 238 articles. After comparison with the updated reference lists of existing QAIs and EGSs from QAT 1 and the expert consultation, and after removal of duplicates, 81 new references were identified; their full texts were retrieved and reviewed in the second-stage selection of QAT 2, producing three additional QAIs for RCTs and one additional EGS. The 32 tables or appendices were also reviewed, producing five QAIs (three for RCTs and two for SRs). One Canadian EGS was also added for further consideration by the QAT working group.40 The main exclusion reasons were that no QAI or EGS was provided, that tools were for specific use, or that tools were quality assessment component instruments. In the end, eight new QAIs (two for SRs and six for RCTs) and two new EGSs were collected and added to the reference lists as potential tools for further evaluation (Appendix E).40-49 A QUOROM flow chart for QAT 2, with a summary of exclusion reasons, is presented in Appendix D-3.

Overall, a total of 262 existing QAIs (57 for SRs, 92 for RCTs, 96 for OBSs, and 17 for multiple designs) and 58 existing EGSs were collected through the review articles, the expert consultation, and the additional literature search for individual tools. A summary of the overall collection is shown in Table 4 below, and the corresponding references are presented in Appendix E.


Table 4: Summary of Overall Collection of Existing QAIs and EGSs

| Information source | Existing QAIs: Total | For SRs | For RCTs | For OBSs | For multiple design | Existing EGSs |
|---|---|---|---|---|---|---|
| Review articles (QAT 1) | 233 | 51 | 77 | 88 | 17 | 49 |
| Expert input | 21 | 4 | 9 | 8 | 0 | 7 |
| Additional search (QAT 2) | 8 | 2 | 6 | 0 | 0 | 2 |
| Total | 262 | 57 | 92 | 96 | 17 | 58 |

EGS = evidence grading system; OBS = observational study; QAI = quality assessment instrument; QAT = quality assessment tool; RCT = randomized controlled trial; SR = systematic review.
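Because Table 4 is additive, each source's per-design QAI counts should sum to its QAI total, and the source rows should sum column-wise to the Total row. A small sketch, using the published counts, checks this mechanically:

```python
# Each row of Table 4: (total QAIs, SRs, RCTs, OBSs, multiple design, EGSs).
rows = {
    "Review articles (QAT 1)":   (233, 51, 77, 88, 17, 49),
    "Expert input":              (21,  4,  9,  8,  0,  7),
    "Additional search (QAT 2)": (8,   2,  6,  0,  0,  2),
}
total_row = (262, 57, 92, 96, 17, 58)

# Per-design QAI counts must sum to each source's QAI total.
for source, (total, srs, rcts, obss, multi, _egss) in rows.items():
    assert srs + rcts + obss + multi == total, source

# Column-wise sums must reproduce the Total row.
column_sums = tuple(sum(column) for column in zip(*rows.values()))
assert column_sums == total_row
print("Table 4 totals are internally consistent")
```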

3.3 Identification of potential QAIs and EGSs for evaluation (Second round combined analysis; Step 6)

The varied selection criteria and evaluation methods of the nine review articles led to the collection of a wide range of existing QAIs and EGSs, including generic and specific ones, checklists, scales, components, guidance, and even some tools not used for assessing the internal validity of studies. Focusing on generic QAIs (checklists or scales) and generic EGSs, and following the rules described in the methods, the existing tools collected were reviewed and the potential ones among them (i.e., those recommended in review articles and generic ones not previously assessed) were identified for further evaluation as follows. The identification was conducted among the existing tools collected from the nine review articles. In particular, additional tailored selection was conducted for some of the review reports, as follows.

The AHRQ report1: The AHRQ report recommended 19 instruments (five checklists or scales for SRs,50-54 eight for RCTs,55-62 and six for OBSs56,58,59,63-65) and seven EGSs.66-72 All of these recommended tools were identified as potential instruments.

Katrak et al. report20: This report did not recommend any instruments or systems. After removal of duplicates with the other eight reviews, 31 unique references were identified, and their full texts were retrieved and reviewed. Nine of the 31 references were included, while 22 were excluded with reasons (for example, specific instruments, guidance, or misclassification of study designs). Of the nine included references, one provided two full articles containing multiple design type instruments73 and EGSs,74 respectively. Another provided four full articles containing instruments for SRs,75 RCTs,76 cohort studies,77 and case-control studies,78 respectively. The remaining seven references provided seven full articles.66,79-84 Together, these full articles provided 17 QAIs (four for SRs,73,75,83,84 eight for RCTs,66,73,76,79,81-84 and five for OBSs;66,73,77,78,82,83 one instrument for cohort studies77 and one for case-control studies78 from the same organization were counted as one QAI for OBSs) and two EGSs.74,80 The detailed selection results are not presented in this report.


Table 5: Summary of Potential QAIs and EGSs Identified for Further Evaluation

| Information source | Existing QAIs: Total | SRs | RCTs | OBSs | Multiple design | Potential QAIs: Total | SRs | RCTs | OBSs | Existing EGSs | Potential EGSs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| West 2002 (AHRQ report)1 | 88 | 20 | 49 | 19 | NA | 19 | 5 | 8 | 6 | 40 | 7 |
| Brouwers et al. 200525 | 0 | NA | 0 | NA | NA | 0 | NA | 0 | NA | NA | NA |
| Atkins et al. 200419 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 | 2 |
| Katrak et al. 200420 | 53 | 13 | 16 | 7 | 17 | 17 | 4 | 8 | 5 | NA | 2 |
| Deeks et al. 200317# | 54 | NA | 5 | 49 | NA | 4 | NA | 0 | 4 | NA | NA |
| Saunders et al. 200321 | 17 | NA | NA | 17 | NA | 0 | NA | NA | 0 | NA | NA |
| Colle et al. 200223 | 8 | NA | 8 | NA | NA | 1 | NA | 1 | NA | NA | NA |
| Liberati et al. 200122 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 7 | 3 |
| Shea et al. 200124 | 21 | 21 | NA | NA | NA | 5 | 5 | NA | NA | NA | NA |
| Duplicates between eight reviews | 8 | 3 | 1 | 4 | 0 | NA | NA | NA | NA | 1 | NA |
| Subtotal (review articles, QAT 1) | 233 | 51 | 77 | 88 | 17 | 46 | 14 | 17 | 15 | 49 | 14 |
| Expert input | 21 | 4 | 9 | 8 | 0 | 21 | 4 | 9 | 8 | 7 | 7 |
| Additional search for individual tools (QAT 2) | 8 | 2 | 6 | 0 | 0 | 8 | 2 | 6 | 0 | 2 | 2 |
| Total | 262 | 57 | 92 | 96 | 17 | 75 | 20 | 32 | 23 | 58 | 23 |

AHRQ = Agency for Healthcare Research and Quality; EGS = evidence grading system; NA = not available, not provided; OBS = observational study; QAI = quality assessment instrument; QAT = quality assessment tool; RCT = randomized controlled trial; SR = systematic review.
#Only the top 60 QAIs were collected from Deeks’ report.17


Shea’s report24: This report evaluated 24 instruments for SRs without making any recommendation. Comparing the evaluation domains used in this report with those of the AHRQ report, six common domains were identified: objective, description of the search method, selection, validity assessment, data abstraction, and quantitative data analysis. Seven instruments that satisfied at least five of these six domains were considered recommended. After checking whether they were generic checklists or scales, five instruments were identified for further evaluation.85-89 The detailed selection results are not presented in this report.

Deeks’ report17: Among the 60 top instruments assessed in this report, 14 were considered the best, meeting at least three of the four core internal validity items (how allocation occurred, any attempt to balance groups by design, identification of prognostic factors, and case-mix adjustment). These best 14 were included as recommended tools. After removal of duplicates with the AHRQ report, four instruments for OBSs were identified for further evaluation.90-93

Through the selection above, two QAIs for RCTs,66,83 three QAIs for OBSs,66,77,78,83 and two EGSs74,80 were identified for further evaluation and added to the existing reference lists in Appendix E. Four of these seven tools were found when checking the related multiple design tools for other types of study, while the other three had been reported under the wrong study design types in the original review. Overall, 75 of 267 existing QAIs (20 for SRs, 32 for RCTs, and 23 for OBSs) and 23 of 60 existing EGSs were identified as potential tools for further evaluation. Correspondingly, the references of the potential QAIs and EGSs were highlighted in the updated reference lists of existing QAIs and EGSs (Appendix E).

There were 192 QAIs and 37 EGSs excluded for one of the following reasons: not recommended by review articles; a specific (not generic) QAI or EGS; determined not to be a QAI or EGS; a duplicate; or unavailable. There were another three reasons for the exclusion of QAIs: not included by further selection using the tailored criteria in the Shea et al. and Deeks et al. reviews,17,24 a guidance document, or an instrument for a study type not of interest. A summary of the exclusions is shown in Table 6, and individual reasons for exclusion are presented in Appendix E.


Table 6: Summary of Exclusion from Existing Tools to Potential Tools

| Number of tools | QAIs for SRs | QAIs for RCTs | QAIs for OBSs | QAIs for multiple design | EGSs |
|---|---|---|---|---|---|
| Number of existing tools | 57 | 94 | 99 | 17 | 60 |
| Excluded: not recommended by review articles | 15 | 47 | 66 | 0 | 37 |
| Excluded: specific QAI or EGS | 0 | 4 | 1 | 2 | 0 |
| Excluded: not a QAI or EGS | 4 | 5 | 1 | 0 | 0 |
| Excluded: duplicate | 0 | 4 | 0 | 13 | 0 |
| Excluded: unavailable | 0 | 1 | 0 | 0 | 0 |
| Excluded: not included using tailored criteria | 14 | 0 | 5 | 0 | 0 |
| Excluded: guidance document | 3 | 1 | 2 | 0 | 0 |
| Excluded: for study type not of interest | 1 | 0 | 1 | 2 | 0 |
| Number of excluded tools | 37 | 62 | 76 | 17 | 37 |
| Number of potential tools for evaluation | 20 | 32 | 23 | 0 | 23 |

EGS = evidence grading system; OBS = observational study; QAI = quality assessment instrument; RCT = randomized controlled trial; SR = systematic review.
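The staged exclusion summarized in Table 6 amounts to testing each tool against an ordered list of exclusion reasons and recording the first one that applies. A hedged sketch of that logic follows; the field names and rule ordering are illustrative assumptions, not CADTH's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    is_qai_or_egs: bool   # actually a QAI or EGS?
    available: bool       # full text obtainable?
    generic: bool         # generic, not disease- or topic-specific?
    recommended: bool     # recommended by a review article?

# Ordered exclusion reasons; the first match is recorded, mirroring the
# one-reason-per-tool tallies in Table 6.
EXCLUSION_RULES = [
    ("Not a QAI or EGS",                   lambda t: not t.is_qai_or_egs),
    ("Unavailable",                        lambda t: not t.available),
    ("Specific QAI or EGS",                lambda t: not t.generic),
    ("Not recommended by review articles", lambda t: not t.recommended),
]

def triage(tool):
    """Return the first applicable exclusion reason, or None if the tool
    remains a potential tool for further evaluation."""
    for reason, applies in EXCLUSION_RULES:
        if applies(tool):
            return reason
    return None

tool = Tool("Checklist X", is_qai_or_egs=True, available=True,
            generic=True, recommended=False)
print(triage(tool))  # Not recommended by review articles
```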

3.4 Evaluation of the potential QAIs and EGSs identified

To clearly understand the following evaluation based on domains and elements, the assessment scheme is repeated here:

• “Yes” (●, the system fully addressed the domain).
• “Partial” (◐, the system addressed the domain to some extent).
• “No” (○, the system did not address the domain at all).

3.4.1 Evaluation of the potential QAIs for SRs

There were seven domains considered in the evaluation, four of which had empirical elements (search strategy, study quality/validity, data synthesis, and funding). Of the 20 QAIs evaluated for SRs,2,34,36,42,43,50-54,73,75,83-89,94 only four reported the type and source of funding.2,51-53 The best scores were seen in the study question and search strategy domains, which received 18 and 17 “yes” evaluations respectively. Both study quality/validity and data synthesis received 16 “yes” evaluations. Sacks et al. 199651 and AMSTAR 2005 (unpublished) fully addressed all seven domains. Note that the published version of AMSTAR2 was provided by the author in 2007 with some minor changes; consequently, the AMSTAR 2005 instrument is labelled as unpublished in this report, with the reference published in 2007.2 The evaluation table is presented in Appendix F-1.

3.4.2 Evaluation of the potential QAIs for RCTs

There were seven domains considered in the evaluation, four of which had empirical elements (randomization, blinding, statistical analysis, and funding). Of the 32 QAIs evaluated for RCTs,3,27,30,31,33,34,36,38,41,44,45,47-49,55-62,66,73,76,79,81-84,95,96 only four reported the type and source of funding.3,36,55,56 The best scores were seen in the randomization and blinding domains, which received 29 and 26 “yes” evaluations respectively. Statistical analysis received 23 “yes” assessments. Chalmers et al. 1981,55 Reisch et al. 1989,56 and SIGN 50 20043 fully addressed all seven domains. According to the AHRQ report,1 the former two did not follow a rigorous development process. In addition, Chalmers’ instrument had 27 items to be assessed plus the calculation of total points earned as an overall quality score,55 and Reisch et al.’s checklist contained 58 main evaluation components under 12 categories plus calculation of the ratio of starred items marked by the reviewer to the maximum total possible.56 Compared with the SIGN 50 checklist, which has 10 internal validity components as well as considerations of overall score and study funding,3 the former two55,56 were longer and less efficient to use in practice. Consequently, SIGN 50 20043 was selected for further consultation. The evaluation table is presented in Appendix F-2.

3.4.3 Evaluation of the potential QAIs for OBSs

There were five domains considered in the evaluation, two of which had empirical elements (comparability of subjects and funding). One unpublished QAI, Reeves BC and Deeks JJ 2003 (Dr. Barnaby Reeves, University of Bristol, Bristol, UK: personal communication, 2005 Jul 22), and 22 published QAIs were evaluated for observational studies.4,5,28,30,33,34,36,38,56,58,59,63-66,73,77,78,82,83,90-93 Among them, two instruments had two separate checklists for cohort and case-control studies.4,5,77,78 Of the 23 instruments, only three reported the type and source of funding,4,5,36,56 and 11 fully addressed4,5,28,30,34,56,58,59,63-65,77,78 the elements associated with the comparability domain. The best scores were seen in the exposure/intervention and statistical analysis domains, both receiving 15 of 23 “yes” evaluations. Reisch et al. 198956 and SIGN 50 20044,5 fully addressed all five domains. The Reisch et al. 198956 instrument was excluded from further consultation (see Section 3.4.2). The evaluation table is presented in Appendix F-3.

3.4.4 Evaluation of the potential EGSs

The EGS evaluation considered three domains: quality, quantity, and consistency. Of the 23 EGSs evaluated,6,26,27,29,35,39,40,46,66-72,74,80,82,97-101 eleven addressed the consistency of results across similar and different study designs.6,35,66-72,100,101 Elements associated with the quantity domain were partially unmet by four EGSs29,40,82,99 and fully unmet by nine.26,27,39,46,74,80,97,98,100 The best score was seen in the quality domain, for which 19 of the 23 EGSs received “yes” evaluations. Ten EGSs fully addressed all three domains;6,35,66-72,101 however, four of them were not considered further because three70-72 were developed as non-guideline systems and one35 was still in progress. Consequently, the remaining six EGSs, by Gyorkos et al. 1994,66 Briss et al. 2000,67 Greer et al. 2000,68 Harris et al. 2001,69 GRADE 2004,6 and SIGN 50 2004,101 were considered further. The evaluation table is presented in Appendix F-4.

Overall, seven QAIs2-5,51,55,56 and six EGSs6,66-69,101 fully addressed all evaluation domains and obtained the highest scores. Among them, four QAIs2-5,51 and six EGSs6,66-69,101 were selected for further consideration. Table 7 shows the evaluation summary.
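The three-level assessment scheme applied throughout this evaluation (yes, partial, or no per domain) can be represented as a small data structure, with the highest-scoring tools being those rated “yes” on every domain. An illustrative sketch with hypothetical tools and a subset of the domain names used above:

```python
# "yes" = fully addressed, "partial" = addressed to some extent,
# "no" = not addressed at all.
SYMBOLS = {"yes": "\u25cf", "partial": "\u25d0", "no": "\u25cb"}  # ●, ◐, ○

def fully_addresses_all(assessment):
    """True when a tool scores "yes" on every evaluated domain, the
    criterion used to identify the highest-scoring tools."""
    return all(rating == "yes" for rating in assessment.values())

# Hypothetical assessments over three of the SR domains named above.
tool_a = {"study question": "yes", "search strategy": "partial", "funding": "no"}
tool_b = {"study question": "yes", "search strategy": "yes", "funding": "yes"}

print(" ".join(SYMBOLS[r] for r in tool_a.values()))  # ● ◐ ○
print(fully_addresses_all(tool_a))  # False
print(fully_addresses_all(tool_b))  # True
```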


Table 7: Summary of the Evaluation of Potential QAIs and EGSs

| Number of tools | Total no. of QAIs | SRs | RCTs | OBSs | EGSs |
|---|---|---|---|---|---|
| Tools evaluated | 75 | 20 | 32 | 23 | 23 |
| Tools with highest scores | 7 | 2 | 3 | 2 | 10 |
| Tools for further consideration | 4 | 2 | 1 | 1 | 6 |

EGS = evidence grading system; OBS = observational study; QAI = quality assessment instrument; RCT = randomized controlled trial; SR = systematic review.

3.5 Consultation on the QAIs and EGSs selected

The QAIs and EGSs earning the highest scores and considered more practical for each study type were selected for further consultation:

• QAIs for SRs: AMSTAR 20052 and Sacks et al. 1996.51
• QAIs for RCTs: SIGN 50 2004 methodological checklist for RCTs.3
• QAIs for OBSs: SIGN 50 2004 methodological checklists for cohort4 and case-control5 studies.
• EGSs: Gyorkos et al. 1994,66 Briss et al. 2000,67 Greer et al. 2000,68 Harris et al. 2001,69 GRADE 2004,6 and SIGN 50 2004.101

3.5.1 Conducted second expert consultation (Step 8)

Eight of the nine experts who responded to the initial consultation provided input on the proposed QAIs and EGSs. Positive feedback was received from five experts for AMSTAR, five for SIGN 50 for RCTs, four for SIGN 50 for cohort studies, and three for SIGN 50 for case-control studies. Four experts ranked GRADE as superior, while another four ranked the SIGN 50 EGS as superior.

3.5.2 Chose QAIs and EGSs for CADTH (Step 9)

After incorporating the input from the second round of expert consultation, a list of four QAIs and two EGSs was chosen as potential tools for CADTH:

• AMSTAR 2005 for SRs (unpublished).2
• SIGN 50 2004 for RCTs.3
• SIGN 50 2004 for cohort studies.4
• SIGN 50 2004 for case-control studies.5
• GRADE 2004 EGS6 / SIGN 50 2004 EGS.101

For practical use, the SIGN 50 checklists4,5,101 were modified by keeping all 10 internal validity items of Section 1, one item on the overall assessment of the study from Section 2, and one item on study funding from Section 3, and removing the others. Permission was obtained from the author of AMSTAR and from the SIGN organization. A further decision was made to apply the GRADE 2004 system6 as the EGS for CADTH projects. The details of each tool selected are presented in Appendix G-1 to Appendix G-5.
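The checklist modification just described (keep all 10 Section 1 internal validity items, one Section 2 item on overall assessment, and one Section 3 item on study funding) can be sketched as a simple filter. The item labels below are placeholders, not the actual SIGN 50 wording:

```python
def modify_sign50(items):
    """Keep every Section 1 (internal validity) item, plus the overall
    assessment item from Section 2 and the study funding item from Section 3."""
    keep_named = {2: "overall assessment", 3: "study funding"}
    return [(section, name) for section, name in items
            if section == 1 or keep_named.get(section) == name]

# Placeholder checklist: 10 Section 1 items plus extra Section 2/3 items.
checklist = [(1, f"internal validity item {n}") for n in range(1, 11)]
checklist += [(2, "overall assessment"), (2, "other section 2 item"),
              (3, "study funding"), (3, "other section 3 item")]

modified = modify_sign50(checklist)
print(len(modified))  # 12
```

The result is a 12-item working checklist, matching the 10 + 1 + 1 items retained in the modification described above.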


3.5.3 Conducted stakeholder consultation (Step 10)

A broad stakeholder consultation was conducted. Eight responses were received, mainly focusing on specific methodological issues; for example, validation of AMSTAR, literature searching before 2000, and incorporation of economic evidence. Stakeholders’ input did not lead to any changes in the project results.

3.6 Updating QAIs and EGSs

3.6.1 Updating QAT 1 (Step 11)

From 825 citations, 40 potentially relevant ones were identified and reviewed in full text. Two articles in German and two in Dutch were not selected because translation resources were unavailable. Of the remaining 36 articles, seven reviews were included for further analysis. One review later identified in QAT 2 was also included. Comparing the references of the QAIs and EGSs contained in the eight reviews with the existing references from the original QAT, 38 additional references were identified; however, all 38 were published before 2005. Consequently, no new QAI or EGS published in 2005 or later was identified by updating QAT 1. A QUOROM flow chart is presented in Appendix D-4.

3.6.2 Updating QAT 2 (Step 12)

From 776 citations, 294 potentially relevant ones were identified and reviewed in full text. Apart from three unavailable articles, two duplicates, and one article in German for which translation resources were unavailable, the remaining 288 articles were selected. One reviewer (RL) included 133 of the 288 full articles and identified the references, tables, and appendices of QAIs and EGSs in them. After comparing the references identified above with the existing references of the original QAT, and checking the tables and appendices using the selection criteria of the original QAT 2, 67 unique additional references were identified, including 17 published in 2005 or later. Full articles for 11 of the 17 references were retrieved and checked, and no new QAI or EGS was identified from them. The remaining six references were excluded at the abstract level because they were guidance documents, component tools, or an inaccessible web link. A QUOROM flow chart is presented in Appendix D-5.

The second reviewer (KD) selected 121 of the 288 full articles. Comparing the references, tables, and appendices of QAIs and EGSs contained in the 121 included articles with those from the first reviewer and then with the existing references of the original QAT, 56 unique additional references were identified, including 15 published in 2005 or later. The abstracts or full texts of these 15 articles were checked. Three additional generic EGSs were identified in three articles published in 2006; however, they were excluded from further evaluation because one was too complex for use102 and the other two were considered too simple, considering only study designs.103,104 Overall, no new QAIs or EGSs were identified from the 15 articles, and the main exclusion reason was the specific use of tools. A QUOROM flow chart is presented in Appendix D-6.


4 Discussion

This section discusses the tools selected by the QAT project and their application, the conduct of the QAT project, and some related methodological issues.

4.1 Tools selected through the QAT project

As final choices, AMSTAR 20052 was selected for the quality assessment of SRs, SIGN 50 2004 was selected for RCTs and observational studies (cohort and case-control studies),3-5 and GRADE 20046 was selected as the most appropriate EGS. These tools were chosen mainly because they received the highest evaluation scores.

4.1.1 AMSTAR for SRs

AMSTAR 20052 was selected as the most appropriate QAI for SRs, with a perfect score on the AHRQ domain criteria. To date, a QAI developed by Oxman (unpublished)87 has been widely accepted and used. However, on the AHRQ domain criteria, it did not meet the criteria for the data extraction, funding, and inclusion/exclusion criteria domains. Although the original Oxman index was deemed a valid tool and received reasonable to excellent agreement and reliability (intraclass correlation coefficient [ICC] > 0.5) among reviewers in 1991 (research assistants, clinicians with research training, and experts in research methodology), nothing has since been published on other important validity and reliability measures.105,106 AMSTAR 2005 was formed using the Oxman index87 and a checklist reported by Sacks et al. in 1987,88 as well as three additional items that emerged as important methodological features (language restriction, publication bias, and publication status). Several publications have since reported on both the validity and reliability of AMSTAR 2005. Shea et al. in 2007 published two separate studies indicating good face and content validity; moderate to almost perfect inter-observer agreement (kappa > 0.75, 95% confidence interval [CI] 0.55 to 0.96); excellent reliability (kappa 0.84, 95% CI 0.67 to 1.00; Pearson’s r 0.96, CI 0.92 to 0.98); and good construct validity versus a global assessment (Pearson’s r 0.72, 95% CI 0.53 to 0.84).2,8 In 2009, Shea et al. conducted another reliability and validity study, and the results support the preceding findings.107 In addition to meeting all the AHRQ domains, AMSTAR 2005 also addresses the status of publication, whether the scientific quality of included studies is used appropriately in formulating conclusions, publication bias, and conflict of interest. Furthermore, AMSTAR 2005 provides more descriptive explanations of whether or not the item in question is met, which makes it much easier for reviewers to interpret items and answer them consistently. For these reasons, AMSTAR 2005 is a more appropriate QAI for assessing SRs. It is important to highlight that AMSTAR is a checklist; there is a tendency among researchers to use it as a scale, and using AMSTAR as a scale may be misleading.

4.1.2 SIGN 50 checklist for RCTs

SIGN 50 2004 was selected as the most appropriate QAI for RCTs,3 with a perfect score on the AHRQ domain criteria. For a time, the Jadad scale 1998 was widely accepted and used as the “gold standard” by reviewers, but its use has become controversial because the tool focuses on quality of reporting rather than the methodological quality of the trial.108 For some reviewers, the Jadad scale had to be modified to obtain a higher inter-rater reliability score.79 For the withdrawal question in particular, Clark et al. indicated that there was high variability in inter-rater agreement.109 This tool did not pass through to the evaluation stage of this project. SIGN 50 2004 addresses 23 items (12 of which were applicable to this project), compared with the Jadad scale from 1998, which addresses only three items (plus two bonus questions; a maximum of five points can be awarded).3,108 It is clear that SIGN 50 2004 covers considerably more detail related to the design and conduct of trials. After the end of this project in 2007, SIGN 50 was updated in 2008; however, the methodological checklist for RCTs was unchanged. Also in 2008, another QAI emerged from the Cochrane Collaboration Methods Group: the risk of bias assessment tool.110 This tool has six items addressing sequence generation, allocation concealment, blinding of participants, incomplete outcome data, selective outcome reporting, and other potential threats to internal validity. Answer types are “yes” (low risk of bias), “unclear” (uncertain risk), and “no” (high risk of bias) for all six items. Clearly described criteria are provided for each part of the tool. A benefit of this tool is that it has been incorporated into the Review Manager software (version 5.0) and connected to the GRADE approach. Since 2008, the risk of bias assessment tool has gained international recognition and is currently being used as a QAI.
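An assessment with the Cochrane risk of bias tool, as described above, might be recorded as follows. The six items and the yes/unclear/no semantics come from the description above, while the example judgements are hypothetical:

```python
RISK_OF_BIAS_ITEMS = [
    "sequence generation",
    "allocation concealment",
    "blinding of participants",
    "incomplete outcome data",
    "selective outcome reporting",
    "other potential threats to internal validity",
]
RISK = {"yes": "low risk", "unclear": "uncertain risk", "no": "high risk"}

def summarize(judgements):
    """Map per-item yes/unclear/no judgements to risk-of-bias labels."""
    assert set(judgements) == set(RISK_OF_BIAS_ITEMS), "all six items required"
    return {item: RISK[judgements[item]] for item in RISK_OF_BIAS_ITEMS}

# Hypothetical study: low risk everywhere except blinding.
judgements = dict.fromkeys(RISK_OF_BIAS_ITEMS, "yes")
judgements["blinding of participants"] = "no"
summary = summarize(judgements)
print(summary["blinding of participants"])  # high risk
print(summary["sequence generation"])       # low risk
```

Note that, like the checklist tools discussed above, the per-item judgements are reported individually rather than summed into a single score.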

The AHRQ report found that older QAIs tend to have more inclusive quality domains and to be long and potentially cumbersome to complete, while more recently developed instruments tend to be shorter and to focus mainly on empirical criteria that provide sufficient information on study quality.1 Although the movement from longer, more inclusive instruments to shorter ones has been a pattern over the past two decades, the AHRQ report emphasized that the shorter instruments should be equivalently reliable and valid.1 The same situation was found in the evaluation of the QAT project: Chalmers et al. 198155 and Reisch et al. 198956 versus SIGN 50 2004 for RCTs,3 and Reisch et al. 198956 versus SIGN 50 2004 for observational studies.4,5

4.1.3 SIGN 50 checklist for OBSs

SIGN 50 2004 was selected as the most appropriate tool for observational studies (cohort4 and case-control5 studies), with a perfect score on the AHRQ domain criteria. Downs and Black 1998 is a widely accepted QAI.58 However, this tool did not meet the AHRQ criterion for type and source of funding. Aside from the funding domain, the two QAIs are comparable, but Downs and Black 1998 can also be used to score studies with a randomized design. Consequently, many of its 27 items do not apply to cohort or case-control design features (e.g., blinding to intervention, compliance with intervention, randomization, allocation concealment, calculation of power, and two items related to loss to follow-up). If an item does not apply to a particular design feature, the study loses a point, but the final score denominator is not altered; this automatically results in a lower final score. Some reviewers have opted to omit the items that do not apply, producing a more representative score. Compared with Downs and Black, SIGN 50 2004 is adopted worldwide and is updated on a regular basis to account for emerging design features. SIGN 50 was updated in 2008, but the methodological checklists for cohort and case-control studies were unchanged.

4.1.4 GRADE 2004 for EGS

The AHRQ report found that 30% of the EGSs published before 2000, versus 82% of those published in 2000 or later, fully or partially dealt with all three domains (quality, quantity, and consistency) to some degree, and attributed this wide disparity to the consistency domain, which began to be addressed more frequently from 2000 onward.1 The framework of GRADE 20046 was selected as the most appropriate EGS, with a perfect score on the AHRQ domain criteria. Although the EGS of SIGN 50 2004101 also received this score, GRADE was selected because it has gained more international recognition and better met the needs of the COMPUS expert committee for making optimal drug use recommendations. The two systems are comparable, but GRADE 20046 focuses more on relevant/critical outcomes (while also considering other outcomes), takes financial costs into account, and factors the lowest-quality studies into the decision process, whereas SIGN 50 2004101 covers a broader spectrum and does not incorporate these factors. Since GRADE 20046 was evaluated in the QAT project, the GRADE approach has been substantially refined and broadly applied worldwide in developing clinical practice guidelines and evidence-based recommendations. Articles on the updated GRADE methodology are available on the GRADE website.111

On the other hand, the SIGN 50 evidence grading system remains unchanged, although whether and to what extent SIGN should adopt the GRADE approach is under discussion.37

4.2 Application of the evaluation tools selected

The AMSTAR instrument was applied to evaluate the quality of existing systematic reviews and meta-analyses in all CADTH projects. The SIGN 50 methodological checklists were used to assess the quality of RCTs in the CADTH projects on proton pump inhibitors, blood glucose test strips, and second-line therapy for patients with type 2 diabetes, and to evaluate the cohort studies in the blood glucose test strips project. The GRADE approach was followed in creating the CADTH process for making recommendations.112-116 The CADTH evaluation results from using AMSTAR in the proton pump inhibitors project were used to analyze the external validity of AMSTAR.2,8 In addition, the set of QAIs and the EGS selected by the QAT project was also applied in the Common Drug Review program of CADTH as guidance for critical appraisal.

4.3 Methodological issues

4.3.1 Literature search

The AHRQ report mentioned that their formal literature searches were the least productive source of evaluation tools, particularly for evidence grading systems. AHRQ’s searches turned up 30 of the 121 tools analyzed in the report. They found that the majority of the relevant publications were identified through hand searches and contacts with experts in the field. The key weakness of electronic searching for individual QAIs and EGSs was the relatively undeveloped state of the National Library of Medicine’s Medical Subject Headings (MeSH) in expressing methodological concepts inherent to quality assessment and evidence-based medicine more generally.1


The search strategy used by CADTH took into account the recommendations of the AHRQ team, and the preliminary and scoping searches confirmed the assessment of the AHRQ report authors.1 Since the publication of the AHRQ report, there had been no major additions to MeSH that would help to identify individual QAIs and EGSs; at that time, MeSH remained inadequate as a tool for expressing the subject content of these tools. Any MeSH-based search therefore remained largely ineffective for identifying individual tools, while extensive keyword searching produced large numbers of false hits and an unknown level of comprehensiveness, even with a very sensitive search. Based on this experience, it was determined that a search for individual QAIs and EGSs was neither feasible nor desirable at the beginning of the QAT project. Instead, a very sensitive search for systematic reviews of these tools was run first.

4.3.2 Study funding

Interestingly, 19 of 75 QAIs (four for SRs, eight for RCTs, and seven for observational studies) came close to receiving the highest score for their study type but consistently missed one item: identifying the type and source of funding. Funding accounted for only a single empirical element in the AHRQ evaluation grids.1 Of the QAI evaluations assessed, four of 20 instruments for SRs scored six out of seven,36,50,54,86 eight of 32 instruments for RCTs scored six out of seven,30,34,57-62 and seven of 23 instruments for OBSs scored four out of five.30,58,59,63-65,77,78 Funding to support a study or project "may result in biases in design, outcome, and reporting."117 Jorgensen et al. 2008 conducted a meta-analysis of head-to-head drug trials and found, using the 10-item Oxman-Guyatt index,87 that meta-analyses with non-profit or no funding scored higher on quality (six, for 18 meta-analyses) than those that were industry sponsored (2.5, for 10 meta-analyses) or had undeclared support (three, for 11). It was determined that industry-sponsored studies or projects are less transparent in their methodological procedures (e.g., selection of studies, comprehensiveness of the search, and use of appropriate criteria). The same study also found that 40% (4/10) of industry-sponsored meta-analyses, compared with 22% (4/18) of meta-analyses with non-profit or no support, recommended the experimental drug intervention without reservations; all four of the industry-sponsored meta-analyses favoured the experimental drug rather than the control drug.118 Other studies on various medical topics have found similar results.119-121 This highlights the importance of QAIs identifying the type and source of funding when assessing the quality of evidence, as methodological quality may be compromised depending on the source. Reviewers should also be skeptical when the funding source is not reported, as they cannot then judge whether the evidence is reliable. Funding as such is not a direct factor affecting the internal validity of a systematic review, RCT, or OBS; however, it can affect the other risk of bias factors. Therefore, if funding is considered an independent quality factor, it could lead to a degree of double counting in assessing overall quality. The SIGN 50 checklist considers funding as a factor separate from the internal validity criteria.


4.4 Strengths and limitations of the QAT project

The strengths of the CADTH QAT project are that we built on a previous systematic review report published by AHRQ (report 47),1 comprehensively searched for and collected existing tools, provided thorough and updated reference lists of existing QAIs and EGSs, obtained input from international experts, and incorporated the experience of the researchers and experts involved in the CADTH optimal use program. The major limitations of the QAT project are the challenges with the search strategy, as many concepts are not standardized in the included databases, and the absence of any update to the literature search since September 2007. Only QAIs and EGSs published before September 2007 were included in the review; newer developments, in particular the Cochrane risk of bias tool, are not included. Moreover, the online references of some existing instruments and systems had been updated by the time we retrieved them for further selection and evaluation or cited them in this report. For this reason, the references for a few instruments and systems differ between the reference lists of existing tools in Appendix E (extracted directly from the review articles, experts' input, and additional publications) and the bibliography of this report (retrieved during the QAT project), particularly in publication years. The AHRQ criteria were used to assess the quality of QATs and evidence grading systems in order to build upon the existing AHRQ review; other criteria are available for assessing the quality of such instruments, such as reviewing the range of attributes an instrument should have, as explained by Feinstein et al.122 In addition, our collection and evaluation focused only on generic QAIs and EGSs, without considering specific ones, because the specific ones are applied more narrowly.
With respect to the completeness of quality assessment, our evaluation focused only on the checklists and scales among QAIs, not on individual quality assessment components. To avoid repeating previous efforts, we did not reassess the relevant systematic reviews published before 2000; however, we found that the more recent review articles included in this report had cited the earlier publications.

5 CONCLUSION

To identify the most appropriate QAIs and EGSs for making evidence-based recommendations on optimal drug prescribing and use, we conducted four separate literature searches plus one expert consultation, identifying approximately 6,000 literature citations and references, and then reviewed more than 900 full articles to collect a total of 267 existing quality assessment instruments and 60 existing evidence grading systems. Further analysis identified 75 potential QAIs (20 for SRs, 32 for RCTs, and 23 for OBSs) and 23 potential EGSs among the existing ones. By using the AHRQ evaluation grids based on key domains to evaluate the candidate tools, considering feasibility of use, and incorporating input from experts and stakeholders, AMSTAR 2005 for SRs,2 SIGN 50 2004 for RCTs,3 cohort,4 and case-control studies,5 and GRADE 2004 for EGS6 were selected as the most appropriate tools for use.


Applying QAIs and EGSs systematically and consistently can make our evaluations more transparent, and thus can help reviewers, expert panels, or government agencies more effectively translate evidence into more comprehensive, reliable, and practical recommendations. We are therefore confident that the work and selections of the QAT project are an important piece of CADTH's evaluation methodology and will solidify our evaluation foundation.

6 REFERENCES

1. West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, et al. Systems to rate the strength of scientific evidence [Internet]. Rockville (MD): Agency for Healthcare Research and Quality; 2002 Mar. AHRQ Publication No. 02-E016. [cited 2005 Mar 10]. (Evidence report/technology assessment no. 47). Available from: http://www.thecre.com/pdf/ahrq-system-strength.pdf

2. Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol [Internet]. 2007 [cited 2007 Nov 22];7:10. Available from: http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1810543&blobtype=pdf

3. Scottish Intercollegiate Guidelines Network. SIGN 50: a guideline developers' handbook [Internet]. Edinburgh: The Network; 2008. Annex C. Methodology checklist 2: randomised controlled trials; p. 52. [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist2.html

4. Scottish Intercollegiate Guidelines Network. Methodology checklist 3: cohort studies. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter Annex C [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist3.html.

5. Scottish Intercollegiate Guidelines Network. Methodology checklist 4: case-control studies [Internet]. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter Annex C [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist4.html.

6. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. Grading quality of evidence and strength of recommendations. BMJ. 2004 Jun 19;328(7454):1490.

7. Oxman AD, Schunemann HJ, Fretheim A. Improving the use of research evidence in guideline development: 8. Synthesis and presentation of evidence. Health Res Policy Syst [Internet]. 2006 [cited 2012 Feb 13];4:20. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1702353

8. Shea BJ, Bouter LM, Peterson J, Boers M, Andersson N, Ortiz Z, et al. External validation of a measurement tool to assess systematic reviews (AMSTAR). PLoS ONE [Internet]. 2007 [cited 2012 Feb 13];2(12):e1350. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2131785


9. Shukla VK, Bai A, Milne S, Wells G. Systematic review of quality assessment instruments for randomized controlled trials: selection of SIGN50 methodological checklist [oral presentation]. In: XV Cochrane Colloquium; 2007 Oct; Sao Paulo (Brazil). London: Cochrane Collaboration; 2007 Oct.

10. Wells G, Shukla VK, Bai A, Milne S. Systematic review of quality assessment instruments for randomized controlled trials: selection of SIGN50 methodological checklist [oral presentation]. In: Canadian Cochrane Symposium; 2008 Mar; Edmonton (AB). Ottawa: Canadian Cochrane Centre; 2008 Mar.

11. Shukla VK, Bai A, Milne S, Wells GA. Systematic review of evidence grading systems for grading levels of evidence [poster]. Poster presented at: Evidence in the era of globalization. XVI Cochrane Colloquium; 2008 Oct; Freiburg (Germany).

12. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995 Feb 1;273(5):408-12.

13. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999 Sep 15;282(11):1054-60.

14. Lohr KN, Carey TS. Assessing "best evidence": issues in grading the quality of studies for systematic reviews. Jt Comm J Qual Improv. 1999 Sep;25(9):470-9.

15. Moher D, Jadad AR, Tugwell P. Assessing the quality of randomized controlled trials. Current issues and future directions. Int J Technol Assess Health Care. 1996;12(2):195-208.

16. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995 Feb;16(1):62-73.

17. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii-173.

18. The periodic health examination. Canadian Task Force on the Periodic Health Examination. CMAJ [Internet]. 1979 Nov 3 [cited 2008 Jun 6];121(9):1193-254. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1704686

19. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. The GRADE Working Group. BMC Health Serv Res [Internet]. 2004 Dec 22 [cited 2005 Sep 8];4(1):38. Available from: http://www.pubmedcentral.gov/picrender.fcgi?artid=545647&blobtype=pdf

20. Katrak P, Bialocerkowski AE, Massy-Westropp N, Kumar S, Grimmer KA. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol [Internet]. 2004 Sep 16 [cited 2005 Sep 8];4(1):22. Available from: http://www.biomedcentral.com/1471-2288/4/22

21. Saunders LD, Soomro GM, Buckingham J, Jamtvedt G, Raina P. Assessing the methodological quality of nonrandomized intervention studies. West J Nurs Res. 2003 Mar;25(2):223-37.

22. Liberati A, Buzzetti R, Grilli R, Magrini N, Minozzi S. Which guidelines can we trust?: Assessing strength of evidence behind recommendations for clinical practice. West J Med. 2001 Apr;174(4):262-5.

23. Colle F, Rannou F, Revel M, Fermanian J, Poiraudeau S. Impact of quality scales on levels of evidence inferred from a systematic review of exercise therapy and low back pain. Arch Phys Med Rehabil. 2002 Dec;83(12):1745-52.

24. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: the QUOROM statement compared to other tools. 2nd ed. In: Egger M, Smith GD, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. London: BMJ Publishing Group; 2005. p. 122-39. Chapter 7.

25. Brouwers MC, Johnston ME, Charette ML, Hanna SE, Jadad AR, Browman GP. Evaluating the role of quality assessment of primary studies in systematic reviews of cancer practice guidelines. BMC Med Res Methodol [Internet]. 2005 Feb 16 [cited 2005 Aug 29];5(1):8. Available from: http://www.biomedcentral.com/content/pdf/1471-2288-5-8.pdf

26. Guyatt G, Schünemann HJ, Cook D, Jaeschke R, Pauker S. Applying the grades of recommendation for antithrombotic and thrombolytic therapy: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest. 2004 Sep;126(3 Suppl):179S-87S.

27. Eccles M, Mason J. How to develop cost-conscious guidelines. Health Technol Assess. 2001;5(16):1-69.

28. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (minors): development and validation of a new instrument. ANZ J Surg. 2003 Sep;73(9):712-6.

29. Soldani F, Ghaemi SN, Baldessarini RJ. Research reports on treatments for bipolar disorder: preliminary assessment of methodological quality. Acta Psychiatr Scand. 2005 Jul;112(1):72-4.

30. MacLehose RR, Reeves BC, Harvey IM, Sheldon TA, Russell IT, Black AM. A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies. Health Technol Assess. 2000;4(34):1-154.

31. van Tulder M, Furlan A, Bombardier C, Bouter L. Updated method guidelines for systematic reviews in the cochrane collaboration back review group. Spine. 2003 Jun 15;28(12):1290-9.


32. Thomas BH, Ciliska D, Dobbins M, Micucci S. A process for systematically reviewing the literature: providing the research evidence for public health nursing interventions. Worldviews on Evidence-Based Nursing. 2004;1(3):176-84.

33. Cochrane Effective Practice and Organisation of Care Review Group (EPOC). The data collection checklist [Internet]. Ottawa: Institute of Population Health, University of Ottawa; 2002. 26 p. [cited 2006 Sep 6]. Available from: http://epoc.cochrane.org/sites/epoc.cochrane.org/files/uploads/datacollectionchecklist.pdf

34. Effective Practice Institute. Handbook for the preparation of explicit evidence-based clinical practice guidelines. Wellington (NZ): New Zealand Guidelines Group; 2003.

35. Coleman K, Norris S, Weston A, Grimmer K, Hillier S, Merlin T, et al. NHMRC additional levels of evidence and grades for recommendations for developers of guidelines. Pilot program 2005 - 2007 . Canberra: National Health and Medical Research Council; 2005.

36. Oregon Evidence-based Practice Center for the Drug Effectiveness Review Project. Systematic review methods [Internet]. In: Drug effectiveness review project. Portland: Oregon Health and Science University; 2008 [cited 2008 Feb 28]. Available from: http://www.ohsu.edu/drugeffectiveness/methods/index.htm.

37. Scottish Intercollegiate Guidelines Network. SIGN 50: a guideline developers' handbook [Internet]. Revised edition. Edinburgh: The Network; 2008 Jan. [cited 2012 Feb 13]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/ Revised November 2011.

38. Independent evidence-based health care. In: Bandolier Forum. Oxford: Bandolier; 2003.

39. A guide to the development, evaluation and implementation of clinical practice guidelines [Internet]. Canberra: National Health and Medical Research Council; 1999. [cited 2008 Apr 3]. Available from: http://nhmrc.gov.au/publications/synopses/cp30syn.htm

40. CTFPHC history/methodology [Internet]. [place unknown]: Canadian Task Force on Preventive Health Care; 2003. [cited 2008 Apr 7]. Available from: http://www.canadiantaskforce.ca/_archive/index.html

41. Turlik MA, Kushner D, Stock D. Assessing the validity of published randomized controlled trials in podiatric medical journals. J Am Podiatr Med Assoc. 2003 Sep;93(5):392-8.

42. Glenny AM, Esposito M, Coulthard P, Worthington HV. The assessment of systematic reviews in dentistry. Eur J Oral Sci. 2003 Apr;111(2):85-92.

43. Goodwin DM, Higginson IJ, Edwards AG, Finlay IG, Cook AM, Hood K, et al. An evaluation of systematic reviews of palliative care services. J Palliat Care. 2002;18(2):77-83.

44. Braunschweig CL, Levy P, Sheean PM, Wang X. Enteral compared with parenteral nutrition: A meta-analysis. Am J Clin Nutr [Internet]. 2001 [cited 2007 Sep 13];74(4):534-42. Available from: http://www.ajcn.org/cgi/reprint/74/4/534


45. Yang Q, Peters TJ, Donovan JL, Wilt TJ, Abrams P. Transurethral incision compared with transurethral resection of the prostate for bladder outlet obstruction: A systematic review and meta-analysis of randomized controlled trials. J Urol. 2001;165(5):1526-32.

46. Ellis J. Sharing the evidence: clinical practice benchmarking to improve continuously the quality of care. J Adv Nurs. 2000 Jul;32(1):215-25.

47. Chalmers I, Enkin M, Keirse MJNC. Effective care in pregnancy and childbirth. Oxford: Oxford University Press; 1985. Unpublished.

48. Kleijnen J, de Craen AJ, van Everdingen J, Krol L. Placebo effect in double-blind clinical trials: a review of interactions with medications. Lancet. 1994 Nov 12;344(8933):1347-9.

49. Greenhalgh T, Donald A. Papers that report drug trials (randomized controlled trials of therapy). In: Evidence based health care workbook : understanding research ; for individual and group learning. London: BMJ Books; 2000. p. 59.

50. Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994 Apr 15;120(8):667-76.

51. Sacks HS, Reitman D, Pagano D, Kupelnick B. Meta-analysis: an update. Mt Sinai J Med. 1996 May;63(3-4):216-24.

52. Auperin A, Pignon JP, Poynard T. Review article: critical review of meta-analyses of randomized clinical trials in hepatogastroenterology. Aliment Pharmacol Ther. 1997 Apr;11(2):215-25.

53. Barnes DE, Bero LA. Why review articles on the health effects of passive smoking reach different conclusions. JAMA. 1998 May 20;279(19):1566-70.

54. Center for Reviews and Dissemination. Undertaking systematic reviews of research on effectiveness: CRD's guidance for carrying out or commissioning reviews. 2nd ed. York (UK): University of York; 2001 Jan. Report No.: 4

55. Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981 May;2(1):31-49.

56. Reisch JS, Tyson JE, Mize SG. Aid to the evaluation of therapeutic studies. Pediatrics. 1989 Nov;84(5):815-27.

57. Sindhu F, Carpenter L, Seers K. Development of a tool to rate the quality assessment of randomized controlled trials using a Delphi technique. J Adv Nurs. 1997 Jun;25(6):1262-8.

58. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998 Jun;52(6):377-84.


59. Harbour R, Miller J. A new system for grading recommendations in evidence based guidelines. BMJ. 2001 Aug 11;323(7308):334-6.

60. Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol. 1986 Jun;4(6):942-51.

61. van der Heijden GJ, van der Windt DA, Kleijnen J, Koes BW, Bouter LM. Steroid injections for shoulder disorders: a systematic review of randomized clinical trials. Br J Gen Pract [Internet]. 1996 May [cited 2012 Feb 13];46(406):309-16. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1239642

62. de Vet HC, de Bie RA, van der Heijden GJ, Verhagen AP, Sijpkes P, Knipschild PG. Systematic reviews on the basis of methodological criteria. Physiotherapy. 1997 Jun;83(6):284-9.

63. Spitzer WO, Lawrence V, Dales R, Hill G, Archer MC, Clark P, et al. Links between passive smoking and disease: a best-evidence synthesis. A report of the Working Group on Passive Smoking. Clin Invest Med. 1990 Feb;13(1):17-42.

64. Goodman SN, Berlin J, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994 Jul 1;121(1):11-21.

65. Zaza S, Wright-De Aguero LK, Briss PA, Truman BI, Hopkins DP, Hennessy MH, et al. Data collection instrument and procedure for systematic reviews in the Guide to Community Preventive Services. Task Force on Community Preventive Services. Am J Prev Med. 2000 Jan;18(1 Suppl):44-74.

66. Gyorkos TW, Tannenbaum TN, Abrahamowicz M, Oxman AD, Scott EA, Millson ME, et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health. 1994 Jul;85 Suppl 1:S8-13.

67. Briss PA, Zaza S, Pappaioanou M, Fielding J, Wright-De Agüero L, Truman BI, et al. Developing an evidence-based Guide to Community Preventive Services--methods. Am J Prev Med. 2000;18(1 Suppl 1):35-43.

68. Greer N, Mosser G, Logan G, Halaas GW. A practical approach to evidence grading. Jt Comm J Qual Improv. 2000 Dec;26(12):700-12.

69. Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, et al. Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med. 2001 Apr;20(3 Suppl):21-35.

70. Clarke M, Oxman AD. Cochrane reviewer's handbook 4.0. Oxford (UK): The Cochrane Collaboration; 1999. 284 p.

71. Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L, Naylor CD, et al. Users' guides to the medical literature: XXV. Evidence-based medicine: principles for applying the Users' Guides to patient care. Evidence-Based Medicine Working Group. JAMA. 2000 Sep 13;284(10):1290-6.

72. Centre for Evidence-Based Medicine [Internet]. Oxford (UK): Centre for Evidence-Based Medicine; c2012. CEBM (Centre for Evidence-Based Medicine) levels of evidence; 2001 [cited 2012 Feb 13]. Available from: http://www.cebm.net/index.aspx?o=5653 Last edited 06 Feb 2012.

73. Pearson A. Rapid appraisal protocol internet database. RAPid User manual vers 1.1. Adelaide: The Joanna Briggs Institute; 2005.

74. Systematic reviews - the review process. Adelaide: The Joanna Briggs Institute; 2005.

75. Critical Appraisal Skills Programme (CASP). 10 questions to help you make sense of reviews [Internet]. Oxford: Public Health Resource Unit (PHRU); 2006. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20Systematic%20Reviews%20Assessment%20Tool.pdf

76. Critical Appraisal Skills Programme (CASP). 10 questions to help you make sense of randomised controlled trials [Internet]. Oxford: Public Health Resource Unit (PHRU); 2002. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20RCTs%20Assessment%20Tool.pdf Updated 2006.

77. Critical Appraisal Skills Programme (CASP). 12 questions to help you make sense of a cohort study [Internet]. Oxford: Public Health Resource Unit (PHRU); 2004. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20Cohort%20Assessment%20Tool.pdf

78. Critical Appraisal Skills Programme (CASP). 11 questions to help you make sense of a case control study [Internet]. Oxford: Public Health Resource Unit (PHRU); 2006. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20Case-Controls%20Assessment%20Tool.pdf

79. Oremus M, Wolfson C, Perrault A, Demers L, Momoli F, Moride Y. Interrater reliability of the modified Jadad quality scale for systematic reviews of Alzheimer's disease drug trials. Dement Geriatr Cogn Disord. 2001 May;12(3):232-6.

80. Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL. Report of the Canadian Hypertension Society Consensus Conference: 1. Introduction. CMAJ. 1993 Aug 1;149(3):289-93.

81. Moseley AM, Herbert RD, Sherrington C, Maher CG. Evidence for physiotherapy practice: a survey of the Physiotherapy Evidence Database (PEDro). Aust J Physiother. 2002;48(1):43-9.


82. How to use the evidence: assessment and application of scientific evidence [Internet]. Canberra: National Health and Medical Research Council; 2000. [cited 2008 Apr 3]. Available from: http://www.nhmrc.gov.au/publications/synopses/_files/cp69.pdf

83. Crombie IK. The pocket guide to critical appraisal: a handbook for health care professionals. London: BMJ Publishing Group; 1996.

84. FOCUS critical appraisal tool. London: The Royal College of Psychiatrists; 2005.

85. Goldschmidt PG. Information synthesis: a practical guide. Health Serv Res. 1986 Jun;21(2 Pt 1):215-37.

86. Nony P, Cucherat M, Haugh MC, Boissel JP. Critical reading of the meta-analysis of clinical trials. Therapie. 1995 Jul;50(4):339-51.

87. Shea B, Dube C, Moher D. Appendix 2: Quality of meta-analysis: Oxman and Guyatt's index of scientific quality of research overviews. In: Egger M, Smith GD, Altman GD, editors. Systematic reviews in health care: meta-analysis in context. London: BMJ; 2001. p. 137-9. Chapter 7.

88. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987 Feb 19;316(8):450-5.

89. Wilson A, Henry DA. Meta-analysis: part 2: assessing the quality of published meta-analyses. Med J Aust. 1992 Feb 3;156(3):173-4, 177-8, 180, 184-7.

90. Cowley DE. Prostheses for primary total hip replacement: a critical appraisal of the literature. Int J Technol Assess Health Care. 1995;11(4):770-8.

91. DuRant RH. Checklist for the evaluation of research articles. J Adolesc Health. 1994 Jan;15(1):4-8.

92. Hadorn DC, Baker D, Hodges JS, Hicks N. Rating the quality of evidence for clinical practice guidelines. J Clin Epidemiol. 1996 Jul;49(7):749-54.

93. Vickers A. Critical appraisal: how to read a clinical research paper. Complement Ther Med. 1995;3:158-66.

94. Scottish Intercollegiate Guideline Network. Methodology checklist 1: systematic reviews and meta-analyses [Internet]. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004 [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist1.html.

95. van Tulder MW, Malmivaara A, Esmail R, Koes BW. Exercise therapy for low back pain [Cochrane review]. Cochrane Database Syst Rev. 2000;(2):CD000335.

96. Thomas H, Effective Public Health Practice Project. Quality assessment tool for quantitative studies. London (ON): McMaster University; 2004. Unpublished.


97. Jovell AL, Navarro-Rubio MD. Evaluación de la evidencia científica. Med Clin (Barc). 1995;105:740-3.

98. Liddle J, Williamson M, Irwig L. Method for evaluating research guideline evidence: improving health care and outcomes [Internet]. North Sydney: New South Wales Department of Health; 1996. [cited 2005 Jul 7]. Available from: http://www.health.nsw.gov.au/pubs/1996/pdf/mergetot.pdf

99. Guyatt G, Schunëmann H, Cook D, Jaeschke R, Pauker S, Bucher H. Grades of recommendation for antithrombotic agents. Chest [Internet]. 2001 Jan [cited 2005 Mar 10];119(1 Suppl):3S-7S. Available from: http://www.chestjournal.org/cgi/reprint/119/1_suppl/3S

100. Phillips B, Ball C, Sackett D, Badenoch D, Straus S, Haynes B, et al. Oxford Centre for Evidence-based Medicine levels of evidence [Internet]. Oxford: Centre for Evidence Based Medicine; 2001. [cited 2008 Apr 7]. Available from: http://www.cebm.net/index.aspx?o=1047

101. Scottish Intercollegiate Guidelines Network. Forming guideline recommendations [Internet]. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter 6 [cited 2005 Nov 17]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/section6.html.

102. Treadwell JR, Tregear SJ, Reston JT, Turkelson CM. A system for rating the stability and strength of medical evidence. BMC Med Res Methodol [Internet]. 2006 [cited 2007 Nov 28];6:52. Available from: http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1624842&blobtype=pdf

103. Smith SC, Jr., Feldman TE, Hirshfeld JW, Jr., Jacobs AK, Kern MJ, King SB, III, et al. ACC/AHA/SCAI 2005 Guideline Update for Percutaneous Coronary Intervention-Summary Article: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (ACC/AHA/SCAI Writing Committee to Update the 2001 Guidelines for Percutaneous Coronary Intervention). J Am Coll Cardiol. 2006 Jan 3;47(1):216-35.

104. Committee for Practice Guidelines (CPG) of the European Society of Cardiology. Recommendations for guideline production [Internet]. Sophia Antipolis (France): European Society of Cardiology; 2010. [cited 2012 Feb 13]. Available from: http://www.escardio.org/guidelines-surveys/esc-guidelines/Documents/ESC%20Guidelines%20for%20Guidelines%20Update%202010.pdf

105. Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271-8.

106. Oxman AD, Guyatt GH, Singer J, Goldsmith CH, Hutchison BG, Milner RA, et al. Agreement among reviewers of review articles. J Clin Epidemiol. 1991;44(1):91-8.


107. Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, et al. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009 Feb 18;62(10):1013-20.

108. Berger VW. Is the Jadad score the proper evaluation of trials? J Rheumatol. 2006 Aug;33(8):1710-1.

109. Clark HD, Wells GA, Huët C, McAlister FA, Salmi LR, Fergusson D, et al. Assessing the quality of randomized trials: reliability of the Jadad scale. Control Clin Trials. 1999 Oct;20(5):448-52.

110. Higgins JPT, Green S, (eds.). Cochrane Handbook for Systematic Reviews of Interventions 4.2.5. Updated May 2005. Chichester (UK): John Wiley & Sons, Ltd.; 2008.

111. The GRADE working group. GRADE [Internet]. The GRADE working group; c2005. List of GRADE working group publications and grants; 2011 [cited 2012 Feb 13]. Available from: http://www.gradeworkinggroup.org/publications/index.htm

112. Canadian Agency for Drugs and Technologies in Health. Evidence for PPI use in gastroesophageal reflux disease, dyspepsia and peptic ulcer disease: scientific report [Internet]. Ottawa: The Agency; 2007 Mar. (Optimal therapy report; vol. 1 no. 2). [cited 2007 Mar 28]. Available from: http://www.cadth.ca/media/compus/reports/compus_Scientific_Report_final.pdf

113. Canadian Agency for Drugs and Technologies in Health. Long-acting insulin analogues for the treatment of diabetes mellitus: meta-analyses of clinical outcomes [Internet]. Ottawa: The Agency; 2008 Mar. (Optimal therapy report; vol. 2 no. 1). [cited 2008 Apr 9]. Available from: http://cadth.ca/media/compus/reports/compus_Long-Acting-Insulin-Analogs-Report_Clinical-Outcomes.pdf

114. Canadian Agency for Drugs and Technologies in Health. Rapid-acting insulin analogues for the treatment of diabetes mellitus: meta-analyses of clinical outcomes [Internet]. Ottawa: The Agency; 2008 Jan. (Optimal therapy report; vol. 2 no. 2). [cited 2008 Apr 9]. Available from: http://cadth.ca/media/compus/reports/compus_Rapid-Acting-Insulin-Analogues-Report_Clinical=Outcomes.pdf

115. Canadian Agency for Drugs and Technologies in Health. Systematic review of use of blood glucose test strips for the management of diabetes mellitus [Internet]. Ottawa: The Agency; 2009. (Optimal therapy report; vol. 3 no. 2). [cited 2009 Oct 9]. Available from: http://www.cadth.ca/media/pdf/BGTS_SR_Report_of_Clinical_Outcomes.pdf

116. Canadian Agency for Drugs and Technologies in Health. Second-line therapy for patients with type 2 diabetes inadequately controlled on metformin: a systematic review and cost-effectiveness analysis [DRAFT]. Ottawa: The Agency; 2010. (Optimal therapy report; vol. 4 no. 2).

117. Bero LA, Rennie D. Influences on the quality of published drug studies. Int J Technol Assess Health Care. 1996;12(2):209-37.


118. Jorgensen AW, Maric KL, Tendal B, Faurschou A, Gotzsche PC. Industry-supported meta-analyses compared with meta-analyses with non-profit or no support: differences in methodological quality and conclusions. BMC Med Res Methodol [Internet]. 2008 [cited 2012 Feb 13];8:60. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553412

119. Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ. 2003 May 31;326(7400):1167-70.

120. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA. 2003 Aug 20;290(7):921-8.

121. Bhandari M, Busse JW, Jackowski D, Montori VM, Schünemann H, Sprague S, et al. Association between industry funding and statistically significant pro-industry findings in medical and surgical randomized trials. CMAJ. 2004 Feb 17;170(4):477-80.

122. Levitt SH, Aeppli D, Nierengarten MB. Evidence-based medicine: its effect on treatment recommendations as illustrated by the changing role of postmastectomy irradiation to treat breast cancer. Int J Radiat Oncol Biol Phys. 2003 Mar 1;55(3):645-50.


APPENDIX A: PRODUCTS OF QAT PROJECT

Appendix A-1: CADTH Website Post


Appendix A-2: QAIs for RCTs — Oral Presentation at the Cochrane Colloquium in 2007 [9] and Oral Presentation at the Canadian Cochrane Conference in 2008 [10]

Systematic Review of Quality Assessment Instruments for Randomized Control Trials: Selection of SIGN 50 Methodological Checklist
Vijay K. Shukla, Annie Bai, Sarah Milne and George Wells
Canadian Agency for Drugs and Technologies in Health and Ottawa Heart Institute

Background
Many quality assessment instruments (QAIs) are available for evaluating randomized controlled trials (RCTs). Among them, the Jadad scale is the most frequently used. In recent years, limitations of the Jadad scale have been identified and its use has been questioned by many experts. This study was undertaken to explore options for replacing the Jadad scale for the quality assessment of RCTs.

Objective
To identify the existing QAIs for RCTs and select the most appropriate one using a systematic approach.

Method
A comprehensive review article on QAIs, Evidence Report Number 47 from the Agency for Healthcare Research and Quality (AHRQ), was identified as the starting point by the review team. This report evaluated all the QAIs identified by a systematic review of the literature from 1995 to mid-2000. A multi-pronged strategy was used to identify QAIs reported after mid-2000. A comprehensive search was conducted to identify and collect new review articles on QAIs published from 2000 to 2005. From this search, review articles reporting information on two or more QAIs regarding methodological domains, validity, and reliability were selected. The list of all the identified review articles was sent to national and international experts to determine:
• if any important review articles were missing
• if the experts were aware of any new QAI not covered in the listed reviews.
In addition, a second literature search was performed to identify any new QAIs developed since 2000.
All instruments recommended by the new review articles, or identified as unique QAIs in the reviews or by experts, were selected for further evaluation using the same seven key methodological domains used in the AHRQ report. The review team evaluated the highest-scoring instruments. After receiving input from national and international experts and stakeholders, the final selection of QAIs was made by the review team.

Results
There were 3,006 citations identified by a highly sensitive search strategy run on PubMed, MEDLINE, Embase, BIOSIS Previews, and The Cochrane Library, as well as by targeted hand searching and grey literature searching. Five reviews of QAIs for RCTs, including the AHRQ report (Number 47), were identified by two reviewers. Thirty-four experts (20 international and 14 Canadian) were then contacted to identify any further review articles or instruments. Nine experts (five international and four Canadian) responded. Seventy-three existing QAIs were identified from the five reviews after eliminating duplicates. A total of 30 QAIs (18 of the 73 QAIs from the reviews, 10 from expert consultation, and two from the 1,120 citations of the second search for individual QAIs) were evaluated. Three QAIs (Chalmers et al. [1996], Reisch et al. [1989], and SIGN 50 [2004]) obtained perfect scores of seven on the AHRQ criteria covering study population, randomization, blinding, interventions, outcomes, statistical analysis, and funding, while the Jadad scale scored 2.5. These three QAIs were further evaluated by the review team, and consensus was reached on the selection of the SIGN 50 (2004) methodological checklist after considering some important descriptive factors, such as a rigorous development process. The selection of SIGN 50 was sent to the nine experts who had responded to the previous request. Five experts gave positive feedback, two gave ambiguous feedback, one gave negative feedback, and one did not respond.

Conclusion
Based on this work, the authors conclude that the SIGN 50 checklist is the most appropriate QAI for RCTs because:
• it includes the most important methodological domains
• it provides detailed instructions for application to produce overall quality assessment scores.


Appendix A-3: EGSs – Cochrane Colloquium in 2008 [11]

Systematic Review of Evidence Grading Systems for Grading Levels of Evidence
Vijay K. Shukla, Annie Bai, Sarah Milne and George Wells

Background
To facilitate moving from evidence to recommendations, evidence grading systems (EGSs) have been developed to assess the strength of a body of evidence. This study was undertaken to systematically review existing EGSs.

Objective
To conduct a systematic review of EGSs and identify the most appropriate one for grading levels of evidence.

Methods
Evidence Report Number 47 from the Agency for Healthcare Research and Quality (AHRQ) evaluated the EGSs developed between 1979 and 2001. This comprehensive review was used as the starting point for this study. A multi-pronged strategy was used to identify EGSs reported since the AHRQ search date (mid-2000), including a comprehensive search for new review articles on EGSs and individual EGSs developed from 2000 to September 2007, as well as consultation with nine experts in the area. EGSs recommended in review articles, and unique EGSs from the reviews without recommendations, were identified and evaluated using the AHRQ criteria for quality, quantity, and consistency. A list of the highest-scoring EGSs was sent to experts for consultation, and the preferred EGSs were identified based on the experts' feedback.

Results
Four reviews of EGSs (including the AHRQ report) were identified from 3,006 citations and, after eliminating duplicates, 51 existing EGSs were identified from these reviews. Fourteen of the 51 EGSs, plus nine additional EGSs identified by experts and one identified from the 1,120 citations from the search for individual EGSs, were evaluated. Six EGSs obtained the highest scores and were sent to the experts for ranking by preference. Four experts ranked GRADE (2004) as the preferred EGS, while another four experts ranked SIGN 50 (2004) as the preferred EGS; one expert did not respond to the survey.
After considering the rigorous development process, user-friendliness, and the option of incorporating systematic reviews or meta-analyses as a level of evidence, the authors reached consensus that the GRADE and SIGN 50 systems be recommended as the preferred EGSs.

Conclusion
Based on this work, the authors conclude that the GRADE and SIGN 50 systems are the most appropriate EGSs for use in grading evidence for the purpose of making recommendations.


APPENDIX B: SEARCH STRATEGY

Appendix B-1: QAT 1 Search Strategy for Review Articles

OVERVIEW
Interface: Dialog
Databases: The Cochrane Library, EMBASE, MEDLINE, BIOSIS Previews
Note: Subject headings have been customized for each database. Duplicates between databases were removed in Dialog.

Date of Search: February 2005
Search Update: Focused update run September 25, 2007; see Appendix B-3 for details.
Study Types: Systematic reviews
Limits: Publication years 2000 – Feb. 2005; no language restriction

SYNTAX GUIDE
/de     At the end of a phrase, searches the phrase as a subject heading
#n      Adjacency within # number of words (in any order)
!       Explode a subject heading
/ti,ab  Words or phrases in title or abstract

Multi-Database Strategy
1. bias (epidemiology)!/de
2. (validation process OR analytical error)/de
3. (bias OR validation OR validity)/de
4. Data interpretation, statistical/de
5. Statistical analysis!/de
6. (statistical analysis OR data analysis)/de
7. reproducibility of results/de
8. reproducibility/de
9. (variability OR reproducibility)/de
10. research design!/de
11. experimental design/de
12. (methodology OR methodological approach)/de
13. sensitivity and specificity!/de
14. (sensitivity OR specificity)/de
15. technology assessment, biomedical!/de
16. biomedical technology assessment/de
17. s1:s16
18. ((evaluate OR evaluates OR evaluation OR evaluating OR grading OR grade OR grades OR strength OR quality OR levels)/ti,ab) (3n) (methodology OR methodologies OR methodological OR evidence OR recommendations OR assess OR assesses OR assessment OR assessing)/ti,ab
19. s17 AND s18
20. [CADTH systematic review filter]
21. s19 AND s20
22. s21/2000:2005


OTHER DATABASES
PubMed
1. "study quality" OR "quality assessment" OR "quality markers" OR "strength of evidence" OR "grading evidence" OR "levels of evidence" OR "critical appraisal"
2. in process [filter] OR publisher [filter]
3. s1 AND s2
4. [CADTH systematic review filter]
5. s3 AND s4
6. Limit: 2000:2005


Appendix B-2: QAT 2 Search Strategy for Individual QAIs and EGSs

OVERVIEW
Interface: Dialog
Databases: EMBASE, MEDLINE, BIOSIS Previews
Note: Subject headings have been customized for each database. Duplicates between databases were removed in Dialog.

Date of Search: August 11, 2005
Search Updates: Focused update run September 25, 2007; see Appendix B-5 for details.
Study Types: All
Limits: Publication years 2000 – August 11, 2005

SYNTAX GUIDE
/de     At the end of a phrase, searches the phrase as a subject heading
/maj    At the end of a phrase, searches the phrase as a major subject heading
#n      Adjacency within # number of words (in any order)
!       Explode a subject heading
/ti,ab  Words or phrases in title or abstract

MeSH conversion:

MeSH (154) | Emtree (72) | Biosis (55)
meta analysis | meta analysis | Meta-analysis
review literature | systematic review |
cohort studies | cohort analysis | Cohort study
case control studies | methodology | Case-control studies
randomized controlled trials/st | Randomized controlled trial (TOO GENERAL) | Randomized controlled trial (TOO GENERAL)
randomized controlled trials/mt | |
evidence based medicine/cl | |
evidence based medicine/st | |


Multi-Database Strategy
S1 (quality()assessment (2N) (tool OR tools OR instrument OR instruments OR checklists OR checklist OR check()list OR check()lists OR guidelines OR guideline OR scale OR scales))/ti,ab
S2 meta-analysis/maj from 154
S3 meta analysis/maj from 72
S4 meta-analysis/de from 55
S5 review literature/maj from 154
S6 systematic review/maj from 72
S7 cohort studies/maj from 154
S8 cohort analysis/maj from 72
S9 cohort study/de from 55
S10 case-control studies/maj from 154
S11 methodology/maj from 72
S12 case-control studies/de from 55
S13 randomized controlled trials(l)st/maj from 154
S14 randomized controlled trials(l)mt/maj from 154
S15 evidence-based medicine(l)cl/maj from 154
S16 evidence-based medicine(l)st/maj from 154
S17 ((rating OR grade OR grading OR score OR scoring OR checklist OR checklists OR measure OR measuring OR assessing OR assess) (2N) ((level OR levels OR hierarchy OR hierarchies) (2N) evidence))/ti,ab
S18 s1:s17
S19 quality/ti,ab
S20 s18 AND s19
S21 s20/2000:2005

OTHER DATABASES
PubMed
"quality assessment tool" [tiab] OR "quality assessment tools" [tiab] OR "quality assessment instrument" [tiab] OR "quality assessment instruments" [tiab] OR "quality assessment checklists" [tiab] OR "quality assessment check list" [tiab] OR "quality assessment check lists" [tiab] OR "quality assessment checklist" [tiab] OR "quality assessment guidelines" [tiab] OR "quality assessment guideline" [tiab] OR "quality assessment scale" [tiab] OR "quality assessment scales" [tiab] OR (meta analysis[MeSH Major Topic] OR review literature [mesh major topic] OR cohort studies[MeSH Major Topic] OR case control studies[MeSH Major Topic] OR randomized controlled trials/st[MeSH Major Topic] OR randomized controlled trials/mt[MeSH Major Topic]) AND quality [tiab] OR (in process [filter] OR publisher[filter]) AND "quality assessment" [tiab] AND (tool [tiab] OR tools [tiab] OR instrument [tiab] OR instruments [tiab] OR checklists [tiab] OR checklist [tiab] OR "check list" [tiab] OR "check lists" [tiab] OR guidelines [tiab] OR guideline [tiab] OR scale [tiab] OR scales[tiab]) OR (evidence based medicine/cl [mesh major topic] OR evidence based medicine/st [mesh major topic]) AND quality [tiab] OR ((rating [tiab] OR grade [tiab] OR grading [tiab] OR score [tiab] OR scoring [tiab] OR checklist [tiab] OR checklists [tiab] OR measure [tiab] OR measuring [tiab] OR assessing [tiab] OR assess [tiab]) AND ("level of evidence" [tiab] OR "levels of evidence" [tiab] OR "hierarchy of evidence" [tiab] OR "hierarchies of evidence" [tiab])) AND quality [tiab]
Limit: 2000-2005


OTHER DATABASES

Cochrane Library

#1 MeSH [keywords]: meta-analysis; review literature; cohort studies; case control studies; randomized controlled trials/st; randomized controlled trials/mt; evidence-based medicine/cl; evidence-based medicine/st
#2 Title/abstract search: (rating OR grade OR grading OR score OR scoring OR checklist OR checklists OR measure OR measuring OR assessing OR assess) AND ("level of evidence" OR "levels of evidence" OR "hierarchy of evidence" OR "hierarchies of evidence")
#3 Title/abstract search: quality
#4 COMBINING: (#1 OR #2) AND #3
#5 Title/abstract search: "quality assessment tool" OR "quality assessment tools" OR "quality assessment instrument" OR "quality assessment instruments" OR "quality assessment checklists" OR "quality assessment check list" OR "quality assessment check lists" OR "quality assessment checklist" OR "quality assessment guidelines" OR "quality assessment guideline" OR "quality assessment scale" OR "quality assessment scales"
#6 COMBINING: #4 OR #5


Appendix B-3: Updating QAT 1 Search Strategy

OVERVIEW
Interface: OVID
Databases: The Cochrane Library, EMBASE, MEDLINE, BIOSIS Previews
Note: Subject headings have been customized for each database. Duplicates between databases were removed in OVID.

Date of Search Update: September 25, 2007
Study Types: Systematic reviews
Limits: Publication years Feb. 2005 – Sep. 25, 2007; no language restriction

SYNTAX GUIDE
/       At the end of a phrase, searches the phrase as a subject heading
.sh     At the end of a phrase, searches the phrase as a subject heading
exp     Explode a subject heading
*       Before a word, indicates that the marked subject heading is a primary topic; or, after a word, a truncation symbol (wildcard) to retrieve plurals or varying endings
#       Truncation symbol for one character
?       Truncation symbol for one or no characters only
ADJ#    Adjacency within # number of words (in any order)
.ti     Title
.ab     Abstract
.hw     Heading Word; usually includes subject headings and controlled vocabulary
.pt     Publication type
.rn     CAS registry number

Multi-Database Strategy
#1 Medline: "Bias (Epidemiology)"/ OR Data Interpretation, Statistical/ OR "Reproducibility of Results"/ OR Research Design/ OR "Sensitivity and Specificity"/ OR Technology Assessment, Biomedical/
#2 Embase: *validation process/ or *analytical error/ or *Statistical Analysis/ or *Data Analysis/ or *reproducibility/ or *Experimental Design/ or *Methodology/ or *Sensitivity analysis/ or *"Sensitivity and specificity"/ or *Biomedical technology assessment/ or *Validation therapy/ or *Theory validation/ or *Validation study/ or *Instrument validation/ or *Nonresponse bias/ or *Gender bias/ or *Observer bias/ or *external bias/ or *recall bias/ or *cultural bias/ or *internal bias/ or *Central tendency bias/ or *Interview bias/ or *external validity/ or *validity/ or *content validity/ or *face validity/ or *predictive validity/ or *construct validity/ or *internal validity/ or *consensual validity/ or *criterion related validity/ or *concurrent validity/ or *qualitative validity/ or *discriminant validity/
#3 Biosis: (bias or validation or validity or (statistical adj1 analysis) or methodology or methodological or (technolog$ adj1 assessment$) or (research adj1 design) or reproducibility or (experimental adj1 design) or sensitivity or specificity).mi,hw.
#4 Cochrane: ("Bias (Epidemiology)" or "Data Interpretation, Statistical" or "Reproducibility of Results" or "Research Design" or "Sensitivity and Specificity" or "Technology Assessment, Biomedical").kw. or (bias or validation or validity or (statistical adj1 analysis) or methodology or methodological or (technolog$ adj1 assessment$) or (research adj1 design) or reproducibility or (experimental adj1 design) or sensitivity or specificity).kw,ti.
#5 Keyword search: ((evaluate or evaluates or evaluation or evaluating or grading or grade or grades or strength or quality or levels) adj3 (methodology or methodologies or methodological or evidence or recommendations or assess or assesses or assessment or assessing)).ti,ab.
Combined: (#1 or #2 or #3 or #4) AND #5 AND [CADTH Systematic Review filter]

OTHER DATABASES
PubMed
#1 "study quality" OR "quality assessment" OR "quality markers" OR "strength of evidence" OR "grading evidence" OR "levels of evidence" OR "critical appraisal" OR "quality assessment"
#2 in process[filter] OR publisher[filter]
#3 (#1 AND #2)
#4 systematic[sb] OR (meta-analysis[pt] OR meta-analysis[tw] OR metanalysis[tw]) OR meta analy*[Title/Abstract] OR metaanaly*[Title/Abstract] OR met analy*[Title/Abstract] OR metanaly*[Title/Abstract] OR integrative research[Title/Abstract] OR integrative review*[Title/Abstract] OR integrative overview*[Title/Abstract] OR research integration*[Title/Abstract] OR research overview*[Title/Abstract] OR collaborative review*[Title/Abstract] OR collaborative overview*[Title/Abstract] OR systematic review*[Title/Abstract] OR health technology assessment*[tiab] OR "Technology Assessment, Biomedical"[mh] OR HTA*[tiab] OR "Cochrane Database Syst Rev"[Journal:__jrid21711]
#5 (#3 AND #4)
#6 Bias epidemiology[Majr] OR Data interpretation, statistical[Majr] OR Reproducibility of results[MeSH] OR Research design[Majr] OR Sensitivity and specificity[Majr] OR Technology Assessment, Biomedical[Majr] OR Observer Variation[Mesh]
#7 (Evaluate[tiab] OR evaluates[tiab] OR evaluation[tiab] OR evaluating[tiab] OR grading[tiab] OR grade[tiab] OR grades[tiab] OR strength[tiab] OR quality[tiab] OR levels[tiab]) AND (methodology[tiab] OR methodologies[tiab] OR methodological[tiab] OR evidence[tiab] OR recommendations[tiab] OR assess[tiab] OR assesses[tiab] OR assessment[tiab] OR assessing[tiab])
#8 systematic[sb] OR (meta-analysis[pt] OR meta-analysis[tw] OR metanalysis[tw]) OR meta analy*[Title/Abstract] OR metaanaly*[Title/Abstract] OR met analy*[Title/Abstract] OR metanaly*[Title/Abstract] OR integrative research[Title/Abstract] OR integrative review*[Title/Abstract] OR integrative overview*[Title/Abstract] OR research integration*[Title/Abstract] OR research overview*[Title/Abstract] OR collaborative review*[Title/Abstract] OR collaborative overview*[Title/Abstract] OR systematic review*[Title/Abstract] OR health technology assessment*[tiab] OR "Technology Assessment, Biomedical"[mh] OR HTA*[tiab] OR "Cochrane Database Syst Rev"[Journal:__jrid21711]
#9 (#6 AND #7 AND #8)
#10 (#5 OR #9)
Limit: Feb. 2005 to 2007


Appendix B-4: Validation of QAT 1 Search Strategy

Database Searched | Original Search (2000 – Feb. 2005) | 2007 Re-Run of Slightly Modified Search (2000 – Feb. 2005; all 7 included studies from the original search were retrieved) | Updating Search (2005 – Sept. 2007, using slightly modified version)
PubMed | 1967 | 875 | 779
Embase | 865 (845 kept) | 625 (this search was focused to main subject headings only) | 15
Biosis | 23 | 36 | 29
Cochrane | 225 (nothing kept: Cochrane systematic reviews retrieved through PubMed) | 4 | 10 (9 items from DARE, 1 item from Cochrane DSR)


Appendix B-5: Updating QAT 2 Search Strategy

OVERVIEW
Interface: OVID
Databases: EMBASE, BIOSIS Previews
Note: Subject headings have been customized for each database. Duplicates between databases were removed in OVID.

Date of Search Update: September 25, 2007
Study Types: All
Limits: Publication years Aug. 2005 – Sep. 25, 2007; no language restriction

SYNTAX GUIDE
/       At the end of a phrase, searches the phrase as a subject heading
.sh     At the end of a phrase, searches the phrase as a subject heading
exp     Explode a subject heading
*       Before a word, indicates that the marked subject heading is a primary topic; or, after a word, a truncation symbol (wildcard) to retrieve plurals or varying endings
#       Truncation symbol for one character
?       Truncation symbol for one or no characters only
ADJ#    Adjacency within # number of words (in any order)
.ti     Title
.ab     Abstract
.hw     Heading Word; usually includes subject headings and controlled vocabulary
.pt     Publication type
.rn     CAS registry number

Multi-Database Strategy
1. (meta-analysis or cohort study or case-control studies).mi,hw.
2. 1 use b7o89
3. *meta analysis/ or *systematic review/ or *cohort analysis/ or *methodology/
4. 3 use emef
5. ((rating or grade or grading or score or scoring or checklist or checklists or measure or measuring or assessing or assess) adj2 ((level or levels or hierarchy or hierarchies) adj2 evidence)).ti,ab.
6. 2 or 4 or 5
7. Quality.ti,ab.
8. 6 and 7
9. (quality adj2 assessment adj3 (tool or tools or instrument or instruments or checklists or checklist or (check adj1 list) or (check adj1 lists) or guidelines or guideline or scale or scales)).ti,ab.
10. 8 or 9
11. remove duplicates from 10
12. 11 use emef
13. limit 11 to yr="2005 - 2008"


OTHER DATABASES
PubMed
"quality assessment tool" [tiab] OR "quality assessment tools" [tiab] OR "quality assessment instrument" [tiab] OR "quality assessment instruments" [tiab] OR "quality assessment checklists" [tiab] OR "quality assessment check list" [tiab] OR "quality assessment check lists" [tiab] OR "quality assessment checklist" [tiab] OR "quality assessment guidelines" [tiab] OR "quality assessment guideline" [tiab] OR "quality assessment scale" [tiab] OR "quality assessment scales" [tiab] OR (meta analysis[MeSH Major Topic] OR review literature [mesh major topic] OR cohort studies[MeSH Major Topic] OR case control studies[MeSH Major Topic] OR randomized controlled trials/st[MeSH Major Topic] OR randomized controlled trials/mt[MeSH Major Topic]) AND quality [tiab] OR (in process [filter] OR publisher[filter]) AND "quality assessment" [tiab] AND (tool [tiab] OR tools [tiab] OR instrument [tiab] OR instruments [tiab] OR checklists [tiab] OR checklist [tiab] OR "check list" [tiab] OR "check lists" [tiab] OR guidelines [tiab] OR guideline [tiab] OR scale [tiab] OR scales[tiab]) OR (evidence based medicine/cl [mesh major topic] OR evidence based medicine/st [mesh major topic]) AND quality [tiab] OR ((rating [tiab] OR grade [tiab] OR grading [tiab] OR score [tiab] OR scoring [tiab] OR checklist [tiab] OR checklists [tiab] OR measure [tiab] OR measuring [tiab] OR assessing [tiab] OR assess [tiab]) AND ("level of evidence" [tiab] OR "levels of evidence" [tiab] OR "hierarchy of evidence" [tiab] OR "hierarchies of evidence" [tiab])) AND quality [tiab]
Limit: Aug. 2005-2007

Cochrane Library

#1 ("meta-analysis" or "Review Literature" or "cohort studies" or "case-control studies" or "randomized controlled trials" or "evidence-based medicine").kw
#2 ((rating OR grade OR grading OR score OR scoring OR checklist OR checklists OR measure OR measuring OR assessing OR assess) AND ("level of evidence" OR "levels of evidence" OR "hierarchy of evidence" OR "hierarchies of evidence")).ti,ab.
#3 Quality.ti,ab
#4 (#1 OR #2) AND #3
#5 ("quality assessment tool" OR "quality assessment tools" OR "quality assessment instrument" OR "quality assessment instruments" OR "quality assessment checklists" OR "quality assessment check list" OR "quality assessment check lists" OR "quality assessment checklist" OR "quality assessment guidelines" OR "quality assessment guideline" OR "quality assessment scale" OR "quality assessment scales").ti,ab
#6 #4 OR #5


Appendix B-6: Validation of QAT 2 Search Strategy

Database Searched | Original Search (2000 – Aug. 2005) | Re-Run of Original Search (2000 – Aug. 2005) in 2007 | Updating Search (2005 – Sept. 2007)
PubMed | 663 | 676 | 416 after removing duplicates
Embase | 68 | 87 (after removing duplicates; not able to limit to Aug. 2005, so all of 2005 included) | 100 after removing duplicates
Biosis | 258 | 278 (after removing duplicates; not able to limit to Aug. 2005, so all of 2005 included) | 260 after removing duplicates
Cochrane | 177 | 1207 (Cochrane has been retroactively populating their database) | not done


APPENDIX C: SELECTION CRITERIA

Appendix C-1: QAT 1 Selection Criteria for Selecting Review Articles

The First Level Selection Criteria of QAT 1

General selection criteria
• Formal evaluation of existing quality assessment instruments (QAIs) or evidence grading systems (EGSs). (Formal evaluation means that the existing QAIs or EGSs were reviewed with respect to domains, validity, and reliability.)

Inclusion criteria
• Review articles evaluating existing QAIs for systematic reviews/meta-analyses, RCTs, and observational studies, or EGSs.
• Review articles containing a comparison or evaluation of at least two existing QAIs or EGSs, even if they were specific to certain clinical research questions or focused on the development of a new QAI or EGS.
• Articles about principles, methods, and tools of methodological quality assessment and critical appraisal.
• Articles generally introducing the following:
o evidence-based medicine
o methodological issues
o levels of evidence
o types of research design
o biases
o development of evidence-based practice guidelines.
• Relevant titles without abstracts.

Exclusion criteria
• SRs about a specific disease, treatment, or technique that only contain the development, testing, and application of one or two QAIs or EGSs.
• Instruments or tools introduced for the following:
o outcome measurements
o quality of life
o care delivery
o risk assessment
o disease diagnosis.
• Analysis of the quality of published studies using one QAI.
• Quality assessment of guideline implementations.
• Statistical methods.
• Economic models.
• Surveys, individual RCTs/OBSs, or pilot studies containing QAI application.
• Evidence-based laboratory medicine / animal studies / gene research.


The Second Level Selection Criteria of QAT 1 (for selecting review articles)

General selection criteria
• Formal evaluation of existing QAIs or EGSs. (Formal evaluation means that the existing QAIs or EGSs were reviewed with respect to domains, validity, and reliability.)

Inclusion criteria
• SRs evaluating existing QAIs for SRs/MAs, RCTs, and OBSs, or EGSs.
• Articles containing a comparison or evaluation of at least two existing QAIs or EGSs as part of the article.

Exclusion criteria
• General knowledge (e.g., principles, concepts, and methods) of evidence-based medicine, critical appraisal, systematic reviews, and meta-analysis.
• Introduction, development, or application of one individual QAI or EGS.
• Methodological issues other than quality assessment of studies and evidence grading.
• General knowledge of guidelines, development of guideline approaches, or assessment of guidelines.
• Duplicates.
• Others, e.g., SRs of QAIs for diagnostic studies or literature searches.

A-24

Appendix C-2: QAT 1 Screening Questions for non-English Articles

The following question list is for the reviewers to help us check non-English articles.
1. Are there two or more quality assessment instruments or grading systems of evidence mentioned in the article?
2. If so, is there a comparison of the above instruments with different methodological domains and tests of validity and reliability?
3. If so, is this article a systematic review of quality assessment instruments or grading systems of evidence, or a systematic review of a specific disease, treatment, or technique?

If the answer is "yes" for questions 1 and 2, and "systematic review of QAIs and EGSs" for question 3, the article will be fully translated and reviewed in the second level selection.

A-25

Appendix C-3: QAT 2 Selection Criteria for Selecting Individual QAIs and EGSs

Quality assessment instruments (QAIs)

Inclusion criteria
• Type of study:
o Systematic reviews
o Randomized controlled trials
o Non-randomized controlled trials/quasi-experimental studies (including controlled clinical trial [CCT], controlled before and after [CBA], and before and after [BA] studies)
o Observational studies (cohort, case-control, cross-sectional).
• Checklists or scales.
• Generic use. (Instruments could be used to assess the quality of any study of the type considered.*)

Exclusion criteria
• Guidance or document. (Publication in which study quality is defined or described, but which does not provide an instrument that could be used for evaluative application.*)
• Specific use. (Instrument is designed to be used to assess study quality for a particular type of outcome, intervention, exposure, test, etc.*)
• Components of QA.

Evidence grading systems (EGSs)

Inclusion criteria
• Generic schemes for grading the strength of entire bodies of scientific knowledge.*

Exclusion criteria
• Specific use systems.

* Reference: Pages 33 and 41 of the AHRQ report1

A-26

APPENDIX D: LITERATURE SELECTION

Appendix D-1: QAT 1 QUOROM Flow Chart

3,002 citations from literature search (methodology review articles)
• Medline, Embase, BIOSIS Previews: 2,779
• Cochrane Library: 209 (after removing 16 duplicates)
• Grey literature (HTA checklist): 14

First Level Selection (Title and Abstract)
• Excluded 2,861 (refer to first level exclusion criteria for reasons)
• 141 potentially relevant articles (18 non-English)

Second Level Selection (Full Text)
• Excluded 134 (116 English and 18 non-English):
o 43 — general knowledge of critical appraisal, systematic review, and meta-analysis
o 29 — methodological issues; e.g., general knowledge of QA
o 28 — individual QAIs, EGSs, or other evidence-based approaches
o 18 — guideline issues; e.g., general knowledge of guidelines, or guideline assessment
o 4 — duplicates
o 12 — others; e.g., QAIs for diagnostic studies or publication before 2000
• Included 7 articles

Added 4 review references (references from the 141 potentially relevant articles)
• Included 11 articles
• 2 partial duplicates removed
• Included 9 unique review articles
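The tallies in the flow chart above can be double-checked for internal consistency. The following is a small illustrative script, not part of the report, with the counts transcribed from the chart:

```python
# Consistency check of the QAT 1 QUOROM flow-chart counts:
# the second-level exclusion reasons should sum to 134, and
# 141 - 134 + 4 added references - 2 partial duplicates = 9.
second_level_exclusions = {
    "general knowledge of critical appraisal/SR/MA": 43,
    "methodological issues": 29,
    "individual QAIs, EGSs, or other approaches": 28,
    "guideline issues": 18,
    "duplicates": 4,
    "others": 12,
}

excluded = sum(second_level_exclusions.values())
assert excluded == 134  # matches "Excluded 134" in the chart

included = 141 - excluded  # 7 articles after full-text review
included += 4              # review references added
included -= 2              # partial duplicates removed
assert included == 9       # unique review articles
```

Every stage of the chart reconciles with the stage before it, as a QUOROM diagram should.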

A-27

Appendix D-2: QAT 1 Inclusion List of Review Articles

1. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. The GRADE Working Group. BMC Health Serv Res [Internet]. 2004 Dec 22 [cited 2005 Sep 8];4(1):38. Available from: http://www.pubmedcentral.gov/picrender.fcgi?artid=545647&blobtype=pdf

2. Brouwers MC, Johnston ME, Charette ML, Hanna SE, Jadad AR, Browman GP. Evaluating the role of quality assessment of primary studies in systematic reviews of cancer practice guidelines. BMC Med Res Methodol [Internet]. 2005 Feb 16 [cited 2005 Aug 29];5(1):8. Available from: http://www.biomedcentral.com/content/pdf/1471-2288-5-8.pdf

3. Colle F, Rannou F, Revel M, Fermanian J, Poiraudeau S. Impact of quality scales on levels of evidence inferred from a systematic review of exercise therapy and low back pain. Arch Phys Med Rehabil. 2002 Dec;83(12):1745-52.

4. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii-173.

5. Katrak P, Bialocerkowski AE, Massy-Westropp N, Kumar S, Grimmer KA. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol [Internet]. 2004 Sep 16 [cited 2005 Sep 8];4(1):22. Available from: http://www.biomedcentral.com/1471-2288/4/22

6. Liberati A, Buzzetti R, Grilli R, Magrini N, Minozzi S. Which guidelines can we trust?: Assessing strength of evidence behind recommendations for clinical practice. West J Med. 2001 Apr;174(4):262-5.

7. Saunders LD, Soomro GM, Buckingham J, Jamtvedt G, Raina P. Assessing the methodological quality of nonrandomized intervention studies. West J Nurs Res. 2003 Mar;25(2):223-37.

8. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: the QUOROM statement compared to other tools. In: Egger M, Smith GD, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publishing Group; 2005. p. 122-39. Chapter 7.

9. West S, King V, Carey TS, Lohr KN, McKoy N, Sutton SF, et al. Systems to rate the strength of scientific evidence [Internet]. Rockville (MD): Agency for Healthcare Research and Quality; 2002 Mar. AHRQ Publication No 02-E016. [cited 2005 Mar 10]. (Evidence report/technology assessment no 47). Available from: http://www.thecre.com/pdf/ahrq-system-strength.pdf

A-28

Appendix D-3: QAT 2 QUOROM Flow Chart

1,120 citations from literature search (individual instruments)

First Level Selection (Title and Abstract)
• Excluded 675
• 445 potentially relevant articles

Second Level Selection (Full Text)
• Excluded 207:
o 86 — no QAI or EGS
o 47 — component of QA
o 34 — specific use
o 17 — guidance documents
o 10 — study type not of interest
o 7 — duplicates
o 6 — review articles included in QAT 1
• Included 238 articles

Identify all references related to QAIs or EGSs in the 238 included articles
• 416 references related to QAIs or EGSs:
o 301 on the reference lists
o 115 additional references of QAIs or EGSs

From the 115 additional references:
• 34 duplicates removed, leaving 81 references of QAIs and EGSs
• 77 excluded:
o 24 — specific use
o 18 — already on existing list
o 16 — no QAI or EGS
o 5 — unavailable
o 5 — review articles included in the AHRQ report1
o 3 — guidance documents
o 3 — component of QA
o 3 — others; e.g., duplicates
• 3 instruments and 2 systems selected (3 for RCTs; 2 EGSs)

From 32 tables directly from the full texts of included studies:
• 27 excluded:
o 15 — specific use
o 4 — component of QA
o 4 — already on existing list
o 4 — others; e.g., guidance
• 5 instruments selected (3 for RCTs; 2 for SRs)

Compare with the reference lists of QAT 1 and expert consultation
• 1 Canadian EGS added

Total: 8 instruments and 2 systems selected
• 2 for SRs
• 6 for RCTs
• 2 EGSs

A-29

Appendix D-4: Updating QAT 1 QUOROM Flow Chart

825 citations from literature search, 2005 to Sept. 2007 (methodology review articles)

First Level Selection (Title and Abstract)
• Excluded 785
• 40 potentially relevant articles (6 non-English)

Second Level Selection (Full Text)
• Excluded 29
• 4 un-translated (2 German, 2 Dutch)
• Added 1 review article from updating QAT 2
• Included 8 review articles

Comparing the references of QAIs or EGSs evaluated in the 8 review articles with the existing reference lists identified from the original QAT:
• 38 additional references about QAIs or EGSs identified
• All 38 were published before 2005
• No additional QAIs or EGSs identified

A-30

Appendix D-5: Updating QAT 2 QUOROM Flow Chart (1)

776 citations from literature search, 2005 to Sept. 2007 (individual instruments)

First Level Selection (Title and Abstract)
• Excluded 482
• 294 potentially relevant articles (9 non-English)

Second Level Selection (Full Text)
• Excluded 155
• 3 unavailable, 2 duplicates, 1 not translated (German)
• Included 133 articles

Checking the tables or appendices of QAIs or EGSs identified in second level selection using QAT 2 selection criteria:
• 103 additional references (including 36 duplicates) about QAIs or EGSs identified

Comparing the references of QAIs or EGSs identified in second level selection with the existing reference lists identified from the original QAT:
• 17 of 67 references were published after 2005
• 6 of 17 were excluded by abstracts:
o 1 — guidance
o 1 — existing one in a Spanish article
o 2 — web link inaccessible
o 2 — QA components
• 11 of 17 were excluded by full articles:
o 7 — specific ones
o 3 — existing ones
o 1 — pilot study
• None included for further evaluation
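As with the earlier charts, the counts here can be reconciled stage by stage. The following is a small illustrative script, not part of the report, with the numbers transcribed from the Appendix D-5 chart:

```python
# Consistency check of the updating QAT 2 (1) flow-chart counts.
potentially_relevant = 776 - 482  # after first level selection
assert potentially_relevant == 294

# 294 minus exclusions, unavailable, duplicates, and untranslated articles
included = 294 - 155 - 3 - 2 - 1
assert included == 133

# 103 additional references minus 36 duplicates
unique_refs = 103 - 36
assert unique_refs == 67

# all 17 post-2005 references were excluded (6 by abstract, 11 by full text)
excluded_after_2005 = 6 + 11
assert excluded_after_2005 == 17
```

This confirms that the "17 of 67" figure follows from the 103 additional references once the 36 duplicates are removed.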

A-31

Appendix D-6: Updating QAT 2 QUOROM Flow Chart (2)

776 citations from literature search, 2005 to Sept. 2007 (individual instruments)

First Level Selection (Title and Abstract)
• Excluded 482
• 294 potentially relevant articles (9 non-English)

Second Level Selection (Full Text)
• Excluded 164
• 3 unavailable, 2 duplicates, 4 not translated (1 German, 1 Italian, 2 Spanish)
• Included 121 articles

Comparing the tables, appendices, and references of QAIs or EGSs identified in second level selection with the first reviewer's selection and identifying additional ones, then comparing the additional ones with the existing reference lists identified from the original QAT:
• 56 new references about QAIs or EGSs identified
• 15 of 56 references were published in 2005 or later
• 6 of 15 were excluded by abstracts:
o 3 — QAIs published before Aug. 2005 (ending date of QAT 2)
o 1 — specific QAI
o 1 — component QAI
o 1 — no QAI
• 9 of 15 were excluded by full articles:
o 6 — specific QAIs
o 2 — simple EGSs
o 1 — complex EGS
• None included for further evaluation

A-32

APPENDIX E: REFERENCE LISTS OF QAIs AND EGSs Appendix E-1: Reference List of QAIs for SRs

Order Instrument Year Pub. Full Reference Source(s) * Evaluation Status†

1 Oxman and Guyatt

1991 Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44:1271-1278

9-4 Excluded-1

Oxman et al. 1991 Oxman AD, Guyatt GH, Singer J, et al. Agreement among reviewers of review articles. J Clin Epidemiol. 1991;44:91-98

9-5, 5-20

2 Irwig et al. 1994 Irwig L, Tosteson AN, Gatsonis C, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med. 1994 Apr 15;120:667-676.

9-6,5-36 Recommended by the AHRQ report

3 Sacks et al. 1996 Sacks HS, Reitman D, Pagano D, Kupelnick B. Meta-analysis: an update. Mt Sinai J Med. 1996;63:216-224.

9-7,5-21 Recommended by the AHRQ report

4 Auperin et al. 1997 Auperin A, Pignon JP, Poynard T. Review article: critical review of meta-analyses of randomized clinical trials in hepatogastroenterology. Alimentary Pharmacol Ther. 1997;11:215-225.

9-8,5-16,8-43 Recommended by the AHRQ report

5 Beck 1997 Beck CT. Use of meta-analysis as a teaching strategy in nursing research courses. J Nurs Educ. 1997;36:87-90.

9-9,5-18 Excluded-1

6 Smith 1997 Smith AF. An analysis of review articles published in four anaesthesia journals. Can J Anaesth. 1997;44:405-409.

9-10,5-22 Excluded-1

7 Barnes and Bero 1998 Barnes DE, Bero LA. Why review articles on the health effects of passive smoking reach different conclusions. JAMA.1998;279:1566-1570.

9-3 Recommended by the AHRQ report

8 Clarke and Oxman

1999 Clarke M, Oxman AD. Cochrane reviewer's handbook 4.0. The Cochrane Collaboration; 1999.

9-11 Excluded-1

9 Khan et al. 2000 Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness. CRD's guidance for those carrying out or commissioning reviews. York (UK): University of York, NHS Centre for Reviews and Dissemination; 2000.

9-12 Recommended by the AHRQ report

10 New Zealand Guidelines Group

2000 New Zealand Guidelines Group. Tools for guideline development & evaluation [Internet]. Wellington: NZGG; 2000 [cited 2000 Jul 10]. Available from: http://www.nzgg.org.nz/.

9-13 Excluded-1

A-33

11 Harbour and Miller

2001 Harbour R, Miller J. A new system [Scottish Intercollegiate Guidelines Network (SIGN)] for grading recommendations in evidence based guidelines. BMJ. 2001;323:334-336.

9-14 Excluded-1

12 Oxman et al. 1994 Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature. VI. How to use an overview. Evidence-Based Medicine Working Group. JAMA 1994;272:1367-1371

9-15,5-33 Excluded-1

13 Cook 1995 Cook DJ, Sackett DL, Spitzer WO. Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam Consultation on Meta-Analysis. J Clin Epidemiol. 1995;48:167-171

9-16,5-28,8-22 Excluded-1

14 Cranney 1997 Cranney A, Tugwell P, Shea B, Wells G. Implications of OMERACT outcomes in arthritis and osteoporosis for Cochrane metaanalysis. J Rheumatol. 1997;24:1206-1207.

9-17,5-29 Excluded-1

15 de Vet et al. 1997 de Vet HCW, de Bie RA, van der Heijden GJMG, Verhagen AP, Sijpkes P, Kipschild PG. Systematic reviews on the basis of methodological criteria. Physiotherapy. June 1997;83(6):284-289.

9-18 Excluded-1

16 Pogue and Yusuf 1998 Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998;351:47-52.

9-19,5-34,8-37 Excluded-1

17 Sutton et al. 1998 Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Systematic review of trials and other studies. Health Technology Assess. 1998;2:1-276.

9-20 Excluded-1

18 Moher et al. 1999 Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup Df. Improving the quality of reports of meta-analysis of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354:1896-1900.

9-21,5-32 Excluded-1

19 NHMRC 2000 National Health and Medical Research Council (NHMRC). How to review the evidence: assessment and application of scientific evidence. Canberra, Australia: NHMRC; 2000.

9-22,5-2 Excluded-1

20 Stroup et al. 2000 Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008-2012

9-23,5-35 Excluded-1

A-34


21 Assendelft et al. 1995 Assendelft WJJ, Koes B, Knipschild PG, Bouter LM. The relationship between methodological quality and conclusions in reviews of spinal manipulation. JAMA 1995;274:1942-8.

8-42 Excluded-6

22 Blettner et al. 1999 Blettner M, Sauerbrei W, Schlehofer B, Scheuchenpflug T, Friedenreich C. Traditional reviews, meta-analysis and pooled analyses in epidemiology. Int J Epidemiol 1999;28:1-9.

8-21 Excluded-6

23 Carruthers et al. 1993 Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL. Report of the Canadian Hypertension Society Consensus Conference:1.Introduction. Can Med Assoc J. 1993;149:289-293.

5-19 Excluded-3

24 Charnock 1998 Charnock DF (Ed). The DISCERN Handbook: Quality criteria for consumer health information on treatment choices. Radcliffe Medical Press. 1998.

5-116 Excluded-3

25 Clarke and Oxman

2003 Clarke M, Oxman AD. Cochrane reviewer's handbook 4.2.0. The Cochrane Collaboration;2003.

5-4 Excluded-7

26 Crombie 1996 Crombie IK. The pocket guide to critical appraisal: A handbook for health care professionals. London: BMJ Publishing Group. 1996.

5-5 Evaluated Identified by the QAT working group for further evaluation

27 FOCUS 2001 FOCUS critical appraisal tool. London: The Royal College of Psychiatrists; 2001

5-27 Evaluated Identified by the QAT working group for further evaluation

28 Geller and Proschan

1996 Geller NL, Proschan M. Meta-analysis of clinical trials: a consumer's guide. J Biopharmaceut Stat 1996;6:377-394.

8-23 Excluded-6

29 Goldschmidt 1986 Goldschmidt PG. Information synthesis: a practical guide. Health Serv Res. 1986;21:215-37.

8-24 Evaluated Identified by the QAT working group for further evaluation

30 Greenhalgh 1997 Greenhalgh T. How to read a paper: papers that summarize other papers (systematic reviews and meta-analyses). BMJ. 1997;315:672-675.

5-15,8-25 Excluded-6

31 Guyatt et al. 1995 Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ. Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274:1800-1804.

5-30 Excluded-3

A-35

32 Gyorkos et al. 1994 Gyorkos TW, Tannenbaum TN, Abrahamowicz M, et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health/Revue canadienne de santé publique. 1994;85 Suppl 1:S8-13.

5-31 Excluded-8

33 Joanna Briggs 1999 RAPid: Rapid appraisal protocol internet database. Adelaide: The Joanna Briggs Institute; 1999.

5-3 Evaluated Identified by the QAT working group for further evaluation

34 L'Abbe et al. 1987 L'Abbe KA, Detsky AS, O'Rourke K. Meta-analysis in clinical research. Ann Intern Med. 1987;107:224-233.

5-23,8-27 Excluded-6

35 Light and Pillemer

1984 Light RJ, Pillemer DB. The science of reviewing research. Cambridge, MA: Harvard University Press 1984:160-86

8-28 Excluded-6

36 Meinert CL. 1989 Meinert CL. Meta-analysis: science or religion? Controlled Clin Trials. 1989;10:257S-263S.

8-29 Excluded-6

37 Mullen and Ramirez

1987 Mullen PD, Ramirez G. Information Synthesis and meta-analysis. Advan Health Educat Promot. 1987;2:201-39.

8-30 Excluded-7

38 Mulrow and Antonio

1987 Mulrow CD, Antonio S. The medical review article: state of the science. Ann Intern Med. 1987;106:485-488.

5-24,8-31 Excluded-6

39 Neely 1993 Neely JG. Literature review articles as a research form. Otolaryngol - Head & Neck Surg. 1993;108:743-748

8-32 Excluded-6

40 Nony et al. 1995 Nony P, Cucherat M, Haugh MC, Boissel JP. Critical reading of the meta-analysis of clinical trials. Therapie 1995;50:339-351

8-33 Evaluated Identified by the QAT working group for further evaluation

41 Ohlsson 1994 Ohlsson A. Systematic reviews - theory and practice. Scand J Clin Lab Invest. 1994;54:25-32

8-34 Excluded-7

42 Oxman 1994 Oxman AD. Checklists for review articles. BMJ. 1994;309:648-651.

8-35 Excluded-6

43 Oxman and Guyatt

1988 Oxman AD, Guyatt GH. Guidelines for reading literature reviews. Can Med Assoc J. 1988;138:697-703.

8-36 Excluded-6

44 Oxman 2001 Appendix 2. Quality of meta-analysis: Oxman and Guyatt's index of the scientific quality of research overviews. In: Systematic reviews in health care. London: BMJ Books; 2001. p. 137-9. Chapter 7.

8-44 Evaluated Identified by the QAT working group for further evaluation

A-36

45 PHRU 2002 Critical Appraisal Skills Programme (CASP). 10 questions to help you make sense of reviews. Oxford: Public Health Resource Unit (PHRU); 2006.

5-26,4-64 Evaluated Identified by the QAT working group for further evaluation

46 Sacks et al. 1987 Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316:450-454.

8-38 Evaluated Identified by the QAT working group for further evaluation

47 SIGN-CPD 2002 Continuing professional development: A manual for SIGN guideline developers. Edinburgh: Scottish Intercollegiate Guidelines Network (SIGN); 2002. Available from: http://www.sign.ac.uk/pdf/cpd.pdf

5-25 Excluded-3

48 Smith and Stullenbarger

1989 Smith MC, Stullenbarger E. Meta-analysis: an overview. Nursing Sci Q. 1989;2:114-115.

8-39 Excluded-6

49 Taylor Halvorsen 1994 Taylor Halvorsen K. The reporting format. In: Cooper H, Hedges LV, editors. The handbook of research synthesis. New York: Russell Sage Foundation; 1994. p. 425-437.

8-26 Excluded-6

50 Thacker et al. 1996 Thacker SB, Peterson HB, Stroup DF. Meta-analysis for the obstetrician-gynecologist. Am J Obstet Gynecol. 1996;174:1403-1407.

8-40 Excluded-6

51 Wilson and Henry 1992 Wilson A, Henry DA. Meta-analysis. Med J Aust. 1992;156:173-187.

8-41,4-103 Evaluated Identified by the QAT working group for further evaluation

52 DERP 2002 Systematic Review Methods. In: Drug effectiveness review project. Portland: Oregon Health and Science University; 2008. Available from: http://www.ohsu.edu/drugeffectiveness/methods/index.htm

Expert input Evaluated Identified by the QAT working group for further evaluation

53 SIGN 50 2004 Scottish Intercollegiate Guidelines Network. Methodology checklist 1: systematic reviews and meta-analyses. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist1.html.

Expert input Evaluated Identified by the QAT working group for further evaluation

A-37

54 NZGG 2001 Effective Practice Institute. Handbook for the preparation of explicit evidence-based clinical practice guidelines. Wellington (NZ): New Zealand Guidelines Group; 2001. Updated 2003. Available from: http://www.nzgg.org.nz/download/files/nzgg_guideline_handbook.pdf

Expert input Evaluated Identified by the QAT working group for further evaluation

55 Shea 2007 (2005 unpublished)

Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.

Expert input Evaluated Identified by the QAT working group for further evaluation

56 GLENNY et al. 2003 Glenny AM, Esposito M, Coulthard P, Worthington HV. The assessment of systematic reviews in dentistry. Eur J Oral Sci. 2003 Apr;111(2):85-92.

Additional search

Evaluated Identified by the QAT working group for further evaluation

57 GOODWIN et al. 2002 Goodwin DM, Higginson IJ, Edwards AG, Finlay IG, Cook AM, Hood K, et al. An evaluation of systematic reviews of palliative care services. J Palliat Care. 2002;18(2):77-83.

Additional search

Evaluated Identified by the QAT working group for further evaluation

*The order number of the nine review articles in the inclusion list of QAT 1 (Appendix D-2), followed by the number of the reference as originally assigned in that review article. †One of the following exclusion reasons was provided for each excluded instrument using the order number, even though some instruments had multiple reasons: it was not recommended by review articles; it was a specific QAI (not generic); it was determined not to be a QAI; it was a duplicate; it was unavailable; it was not included by further selection using tailored criteria in the Shea et al. and Deeks et al. reviews;17,24 it was a guidance document; and it was an instrument for a study type not of interest.

A-38

Appendix E-2: Reference List of QAIs for RCTs

Order Instrument Year Pub. Full Reference Source(s)* Evaluation Status†

1 Chalmers et al. 1981 Chalmers TC, Smith H Jr, Blackburn B, et al. A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981;2:31-49.

9-24,5-61,4-51 Recommended by the AHRQ report

2 DerSimonian et al.

1982 DerSimonian R, Charette LJ, McPeek B, Mosteller F. Reporting on methods in clinical trials. N Engl J Med. 1982;306:1332-1337.

9-43,5-62 Excluded-1

3 Evans and Pollock

1985 Evans M, Pollock AV. A score system for evaluating random control clinical trials of prophylaxis of abdominal surgical wound infection. Br J Surg. 1985;72:256-260

9-25,5-41,3-20 Excluded-1

4 Liberati et al. 1986 Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol. 1986;4:942-951.

9-26,5-48 Recommended by the AHRQ report

5 Poynard et al. 1987 Poynard T, Naveau S, Chaput JC. Methodological quality of randomized clinical trials in treatment of portal hypertension. In Methodology and Reviews of Clinical Trials in Portal Hypertension. Excerpta Medica; 1987:306-311.

9-44 Excluded-1

6 Colditz et al. 1989 Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med. 1989;8:441-454.

9-27,3-13 Excluded-1

7 Gotzsche 1989 Gotzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of non-steroidal anti-inflammatory drugs in rheumatoid arthritis. Control Clin Trials. 1989;10:31-56.

9-28,5-43,3-21 Excluded-1

8 Reisch et al. 1989 Reisch JS, Tyson JE, Mize SG. Aid to the evaluation of therapeutics studies. Pediatrics. 1989;84:815-827.

9-45,5-50,3-19 Recommended by the AHRQ report

9 Imperiale and McCullough

1990 Imperiale TK, McCullough AJ. Do corticosteroids reduce mortality from alcoholic hepatitis? A meta-analysis of the randomized trials. Ann Intern Med. 1990;113:299-307.

9-46,5-44,3-12 Excluded-1

10 Spitzer et al. 1990 Spitzer WO, Lawrence V, Dales R, et al. Links between passive smoking and disease: a best-practice synthesis. A report of the Working Group on Passive Smoking. Clin Invest Med. 1990;13:17-42; discussion 43-46.

9-47 Excluded-1

11 Kleijnen et al. 1991 Kleijnen J, Knipschild P, ter Riet G. Clinical trials of homoeopathy. BMJ. 1991;302:316-323.

9-29,5-47,3-17 Excluded-1

A-39


12 Detsky et al. 1992 Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbe KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol. 1992;45:255-265.

9-30,5-63,3-10 Excluded-1

13 Cho and Bero 1994 Cho MK, Bero LA. Instruments for assessing the quality of drug studies published in the medical literature. JAMA. 1994;272:101-104.

9-31,5-38 Excluded-1

14 Goodman et al. 1994 Goodman SN, Berlin J, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994;121:11-21.

9-32 Excluded-1

15 Fahey et al. 1995 Fahey T, Hyde C, Milne R, Thorogood M. The type and quality of randomized controlled trials (RCTs) published in UK public health journals. J Public Health Med. 1995;17:469-474.

9-33,5-42 Excluded-1

16 Schulz et al. 1995 Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-412.

9-51,5-69 Excluded-1

17 Jadad et al. 1996 Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17:1-12.

9-34,5-45,3-15,2-22

Excluded-1

18 Khan et al. 1996 Khan KS, Daya S, Collins JA, Walter SD. Empirical evidence of bias in infertility research: overestimation of treatment effect in crossover trials using pregnancy as the outcome measure. Fertil Steril. 1996;65:939-945.

9-35,5-46 Excluded-1

19 van der Heijden et al.

1996 van der Heijden GJ, van der Windt DA, Kleijnen J, Koes BW, Bouter LM. Steroid injections for shoulder disorders: a systematic review of randomized clinical trials. Brit J Gen Pract. 1996;46:309-316.

9-36,6-52 Recommended by the AHRQ report

20 Bender et al. 1997 Bender JS, Halpern SH, Thangaroopan M, Jadad AR, Ohlsson A. Quality and retrieval of obstetrical anaesthesia randomized controlled trials. Can J Anaesth. 1997;44:14-18.

9-37 Excluded-1

21 de Vet et al. 1997 de Vet HCW, de Bie RA, van der heijden GJMG, Verhagen AP, Sijpkes P, Kipschild PG. Systematic reviews on the basis of methodological criteria. Physiotherapy. June 1997; 83(6): 284-289.

9-18, 5-39 Recommended by the AHRQ report

22 Sindhu et al. 1997 Sindhu F, Carpenter L, Seers K. Development of a tool to rate the quality assessment of randomized controlled trials using a Delphi technique. J Adv Nurs. 1997;25:1262-1268.

9-38,5-51,2-24 Recommended by the AHRQ report

A-40

23 van Tulder et al. 1997 van Tulder MW, Koes BW, Bouter LM. Conservative treatment of acute and chronic nonspecific low back pain. A systematic review of randomized controlled trials of the most common interventions. Spine. 1997;22:2128-2156.

9-39,5-53 Excluded-1

24 Downs and Black

1998 Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998;52:377-384.

9-40,5-40,2-25 Recommended by the AHRQ report

25 Moher et al. 1998 Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609-613.

9-41 Excluded-1

26 Verhagen et al. 1998 Verhagen AP, de Vet HC, de Bie RA, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol. 1998;51:1235-1241.

9-48,5-71 Excluded-1

27 Khan et al. 2000 Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness. CRD's guidance for those carrying out or commissioning reviews: York (UK): University of York, NHS Centre for Reviews and Dissemination; 2000.

9-12,5-66 Excluded-1

28 New Zealand Guidelines Group

2000 New Zealand Guidelines Group. Tools for guideline development & evaluation [Internet]. Wellington: NZGG; 2000 [cited 2000 Jul 10]. Available from: http://www.nzgg.org.nz/.

9-13 Excluded-1

29 NHMRC 2000 National Health and Medical Research Council (NHMRC). How to review the evidence: systematic identification and review of the scientific literature. Canberra, Australia: NHMRC; 2000.

9-49 Excluded-1

30 Harbour and Miller

2001 Harbour R, Miller J. A new system [Scottish Intercollegiate Guidelines Network (SIGN)] for grading recommendations in evidence based guidelines. BMJ. 2001;323:334-336.

9-14 Recommended by the AHRQ report

31 Turlik and Kushner

2000 Turlik MA, Kushner D. Levels of evidence of articles in podiatric medical journals. J Am Podiatr Med Assoc. 2000;90:300-302.

9-42 Excluded-1

A-41


32 Zaza et al. 2000 Zaza S, Wright-De Aguero LK, Briss PA, et al. Data collection instrument and procedure for systematic reviews in the Guide to Community Preventive Services. Task Force on Community Preventive Services. Am J Prev Med. 2000;18:44-74.

9-50,5-72 Excluded-1

33 Prendiville et al. 1988 Prendiville W, Elbourne D, Chalmers I. The effects of routine oxytocic administration in the management of the third stage of labour: an overview of the evidence from controlled trials. Br J Obstet Gynaecol. 1988;95:3-16.

9-52,5-68 Excluded-1

34 Guyatt et al. 1993 Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1993;270:2598-2601.

9-54,5-65,4-79 Excluded-1

Guyatt et al. 1994 Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA. 1994;271:59-63.

9-53

35 Standards of Reporting Trials Group

1994 The Standards of Reporting Trials Group. A proposal for structured reporting of randomized controlled trials. JAMA. 1994;272:1926-1931.

9-55,5-70 Excluded-1

36 Asilomar Working Group

1996 The Asilomar Working Group on Recommendations for Reporting of Clinical Trials in the Biomedical Literature. Checklist of information for inclusion in reports of clinical trials. Ann Intern Med. 1996;124:741-743.

9-56 Excluded-1

37 Moher et al. 2001 Moher D, Schulz KF, Altman DG, for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987-1991.

9-57,5-49 Excluded-1

38 Clarke and Oxman

1999 Clarke M, Oxman AD. Cochrane reviewer's handbook 4.0. The Cochrane Collaboration; 1999.

9-11 Excluded-1

39 Lohr and Carey 1999 Lohr KN, Carey TS. Assessing 'best evidence': issues in grading the quality of studies for systematic reviews. Joint Commission J Qual Improvement. 1999;25:470-479.

9-1 Excluded-1


40 Aronson et al. 1999 Aronson N, Seidenfeld J, Samson DJ, et al. Relative effectiveness and cost-effectiveness of methods of androgen suppression in the treatment of advanced prostate cancer. Evidence Report/Technology Assessment No. 4. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E0012; 1999.

9-58,6-60 Excluded-1

41 Chestnut et al. 1999 Chestnut RM, Carney N, Maynard H, Patterson P, Mann NC, Helfand M. Rehabilitation for traumatic brain injury. Evidence Report/Technology Assessment No. 2. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E006; 1999.

9-60 Excluded-1

42 Jadad et al. 1999 Jadad AR, Boyle M, Cunningham C, Kim M, Schachar R. Treatment of attention deficit/hyperactivity disorder. Evidence Report/Technology Assessment No 11. Rockville (MD): Agency for Healthcare Research and Quality. AHRQ Publication No. 00-E005; 1999.

9-61 Excluded-1

43 Heidenreich et al.

1999 Heidenreich PA, McDonald KM, Hastie T, et al. An evaluation of beta-blockers, calcium antagonists, nitrates and alternative therapies for stable angina. Rockville (MD): Agency for Healthcare Research and Quality. AHRQ Publication No. 00-E003; 1999.

9-62 Excluded-1

44 Mulrow et al. 1999 Mulrow CD, Williams JW, Trivedi M, Chiquette E, Aguilar C, Cornell JE. Treatment of depression: newer pharmacotherapies. Evidence Report/Technology Assessment No 7. Rockville (MD): Agency for Healthcare Research and Quality. AHRQ Publication No. 00-E003; 1999.

9-63 Excluded-1

45 Vickrey et al. 1999 Vickrey BG, Shekelle P, Morton S, Clark K, Pathak M, Kamberg C. Prevention and management of urinary tract infections in paralyzed persons. Evidence Report/Technology Assessment No 6. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E008; 1999.

9-64 Excluded-1

46 West et al. 1999 West SL, Garbutt JC, Carey TS et al. Pharmacotherapy for alcohol dependence. Evidence Report/Technology Assessment No 5. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 1999.

9-65 Excluded-1

47 McNamara et al. 2001 McNamara RL, Miller MR, Segal JB, et al. Management of new onset atrial fibrillation. Evidence Report/Technology Assessment No 12. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 01-E026; 2001.

9-66,5-67 Excluded-1

48 Ross et al. 2001 Ross S, Eston R, Chopra S, French J. Management of newly diagnosed patients with epilepsy: a systematic review of the literature. Evidence Report/Technology Assessment No 39. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 01-E029; 2001.

9-67 Excluded-1

49

Goudas et al. 2000 Goudas L, Carr DB, Bloch R et al. Management of cancer pain. Evidence Report/Technology Assessment No 35. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 2000.

9-68,5-64 Excluded-1

Lau et al. 2000 Lau J, Ioannidis J, Balk E et al. Evaluating technologies for identifying acute cardiac ischemia in emergency departments: Evidence Report/Technology Assessment No 26. Rockville, MD. Agency for Health Care Policy and Research. AHCPR Publication No. 01-E006; 2000.

9-59

50 Antczak 1986 Antczak AA, Tang J, Chalmers TC. Quality assessment of randomized control trials in dental research. I. Methods. J Periodontal Res. 1986;21:305-14.

4-76 Excluded-1

51 Beckerman et al. 1992 Beckerman H, de Bie RA, Bouter LM, De Cuyper HJ, Oostendorp RA. The efficacy of laser therapy for musculoskeletal and skin disorders. Phys Ther 1992;72:483-91.

3-14 Excluded-5‡

52 Carruthers et al. 1993 Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL. Report of the Canadian Hypertension Society Consensus Conference:1.Introduction. Can Med Assoc J. 1993;149:289-293.

5-19 Excluded-3

53 Clark et al. 2001 Clark O, Castro AA, Filho JV, Djulbegovic B. Interrater agreement of Jadad's scale. Annual Cochrane Colloquium Abstracts; October 2001; Lyon.

5-56 Excluded-4

54 Clarke and Oxman

2003 Clarke M, Oxman AD. Cochrane reviewer's handbook 4.2.0. The Cochrane Collaboration; 2003.

5-4 Excluded-7

55 FOCUS 2001 FOCUS critical appraisal tool. London: The Royal College of Psychiatrists; 2001

5-27 Evaluated Identified by the QAT working group for further evaluation


56 Garbutt et al. 1999 Garbutt JC, West SL, Carey TS, Lohr KN, Crews FT. Pharmacotherapy for alcohol dependence. Evidence Report/Technology Assessment No 3. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 1999.

5-54 Excluded-2

57 Guyatt et al. 1993 Guyatt GH, Sackett DL, Cook DJ. Users' guides to the medical literature. II. How to use an article about therapy or prevention. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1993;270:2598-601.

4-79 Excluded-1

58 Haynes et al. 1994 Haynes RB, Wilczynski N, McKibbon A, Walker CJ, Sinclair J. Developing optimal search strategies for detecting clinically sound studies in MEDLINE. J Am Med Inform Assoc. 1994;1:447-458.

5-73 Excluded-3

59 Joanna Briggs 1999 RAPid: Rapid appraisal protocol internet database. Adelaide: The Joanna Briggs Institute; 1999.

5-3 Evaluated Identified by the QAT working group for further evaluation

60 Jonas et al. 2001 Jonas W, Anderson RL, Crawford CC, Lyons JS: A systematic review of the quality of homeopathic clinical trials. BMC Alternative Medicine 2001, 1:12

5-57 Excluded-3

61 Koes et al. 1991 Koes BW, Assendelft WJ, van der Heijden GJ, Bouter LM, Knipschild PG. Spinal manipulation and mobilization for back and neck pain: a blinded review. BMJ. 1991;303:1298-1303.

4-70,3-7 Excluded-2

62 Moseley et al. 2002 Moseley AM, Herbert RD, Sherrington C, Maher CG: Evidence for physiotherapy practice: A survey of the Physiotherapy Evidence Database. Physiotherapy Evidence Database (PEDro). Australian Journal of Physiotherapy 2002, 48:43-50

5-37 Evaluated Identified by the QAT working group for further evaluation

63 NHMRC 2000-a National Health and Medical Research Council (NHMRC). How to review the evidence: assessment and application of scientific evidence. Canberra, Australia : NHMRC; 2000.

5-2 Evaluated Identified by the QAT working group for further evaluation

64 Nicolucci et al. 1989 Nicolucci A, Grilli R, Alexanian A, Apolone G, Torri V, Liberati A. Quality, evolution, and clinical implications of randomized, controlled trials on the treatment of lung cancer. JAMA 1989;262:2101-2107.

4-77 Excluded-1


65 Nurmohamed et al.

1992 Nurmohamed MT, Rosendaal FR, Buller HR. Low-molecular-weight heparin versus standard heparin in general and orthopaedic surgery: a meta-analysis. Lancet. 1992;340:152-6.

3-9 Excluded-2

66 Onghena and van Houdenhove

1992 Onghena P, van Houdenhove B. Antidepressant induced analgesia in chronic non-malignant pain. Pain 1992;49:205-19

3-11 Excluded-2

67 Oremus et al. 2001 Oremus M, Wolfson C, Perrault A, Demers L, Momoli F, Moride Y: Interrater reliability of the modified Jadad quality scale for systematic reviews of Alzheimer's disease drug trials. Dement Geriatr Cognit Disord 2001, 12:232-236.

5-55 Evaluated Identified by the QAT working group for further evaluation

68 PHRU 2002 Critical Appraisal Skills Programme (CASP). 10 questions to help you make sense of randomised controlled trials. Oxford: Public Health Resource Unit (PHRU); 2002. Available from: http://www.sph.nhs.uk/sph-files/casp-appraisal-tools/rct%20appraisal%20tool.pdf/?searchterm=10 questions to help you make sense of randomised controlled trials. Updated 2006.

5-26 Evaluated Identified by the QAT working group for further evaluation

69 Pogue and Yusuf

1998 Pogue J, Yusuf S. Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet. 1998;351:47-52.

5-34 Excluded-3

70 Poynard 1998 Poynard T. Evaluation de la qualité méthodologique des essais thérapeutiques randomisés. Presse Méd. 1998;17:315-8.

3-18 Excluded-4

71 SIGN-CPD 2002 Continuing Professional Development: A manual for SIGN Guideline Developers. SIGN 2002.

5-25 Excluded-3

72 Smith et al. 1992 Smith K, Cook D, Guyatt GH, Madhavan J, Oxman AD. Respiratory muscle training in chronic airflow limitation: a meta-analysis. Am Rev Respir Dis. 1992;145:533-9.

3-8 Excluded-1

73 ter Riet et al. 1990 ter Riet G, Kleijnen J, Knipschild P. Acupuncture and chronic pain: a criteria-based meta-analysis. J Clin Epidemiol. 1990;43:1191-9.

3-16 Excluded-1

74 van der Windt et al.

1995 van der Windt DA, van der Heijden GJ, Scholten RJ, Koes BW, Bouter LM. The efficacy of non-steroidal anti-inflammatory drugs for shoulder complaints. J Clin Epidemiol. 1995;48:691-704.

4-71 Excluded-1

75 van Tulder et al. 2000 van Tulder M, Malmivaara A, Esmail R, Koes B. Exercise therapy for low back pain: a systematic review within the framework of the Cochrane Collaboration back review group. Spine. 2000;25:2784-2796.

5-58 Excluded-4


76 van Tulder et al. 2000 van Tulder MW, Ostelo R, Vlaeyen JWS, Linton SJ, Morley SJ, Assendelft WJJ. Behavioral treatment for chronic low back pain: a systematic review within the framework of the Cochrane Back Review Group. Spine. 2000;25:2688-2699.

5-59 Excluded-4

77 van Tulder et al. 2000 van Tulder MW, Malmivaara A, Esmail R, Koes BW. Exercise therapy for low back pain. Cochrane Database Syst Rev. 2000;2:CD000335.

3-5 Evaluated Identified by the QAT working group for further evaluation

78§ Crombie 1996 Crombie IK. The pocket guide to critical appraisal: a handbook for health care professionals. London: BMJ Publishing Group. 1996.

5-5 Evaluated Identified by the QAT working group for further evaluation

79§ Gyorkos et al. 1994 Gyorkos TW, Tannenbaum TN, Abrahamowicz M et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health. Revue Canadienne De Santé Publique. 1994;85 Suppl 1:S8-13.

5-31 Evaluated Identified by the QAT working group for further evaluation

80 DERP 2002 Drug Effectiveness Review Project. Quality assessment. Portland (OR): Oregon Health and Science University; 2002. Available from: http://www.ohsu.edu/xd/research/centers-institutes/evidence-based-policy-center/derp/documents/upload/Quality-assessment-pdf.pdf

Expert input Evaluated Identified by the QAT working group for further evaluation

81 EPOC 2002 Cochrane Effective Practice and Organisation of Care Review Group (EPOC). The data collection checklist. Ottawa: Institute of Population Health, University of Ottawa; 2002. Available from: http://www.epoc.uottawa.ca/checklist2002.doc

Expert input Evaluated Identified by the QAT working group for further evaluation

82 SIGN 50 2004 Methodology checklist 2: randomised controlled trials. In SIGN 50: A guideline developers' handbook. Edinburgh: Scottish Intercollegiate Guidelines Network; 2004. Revised 2011. Available from: www.sign.ac.uk/methodology/index.html

Expert input Evaluated Identified by the QAT working group for further evaluation

83 Eccles 2001 Eccles M, Mason J. How to develop cost-conscious guidelines. Health Technology Assessment 2001;5(16).

Expert input Evaluated Identified by the QAT working group for further evaluation

84 Bandolier (OPVS)

2003 Independent evidence-based health care. In: Bandolier Forum. Oxford: Bandolier; 2003.

Expert input Evaluated Identified by the QAT working group for further evaluation


85 NZGG 2001 New Zealand Guideline Group. Handbook for the preparation of explicit evidence-based clinical practice guidelines. Wellington (New Zealand): NZGG; 2001.

Expert input Evaluated Identified by the QAT working group for further evaluation

86 van Tulder 2003 van Tulder M, Furlan A, Bombardier C, Bouter L. Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine. 2003;28(12):1290-9.

Expert input Evaluated Identified by the QAT working group for further evaluation

87 MacLehose (modified Downs & Black)

2000 MacLehose RR, Reeves BC, Harvey IM, Sheldon TA, Russell IT, Black AM. A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies. Health Technol Assess. 2000;4(34):1-154.

Expert input Evaluated Identified by the QAT working group for further evaluation

88 Thomas et al. (EPHPP)

2004 Thomas BH, Ciliska D, Dobbins M, Micucci S. A process for systematically reviewing the literature: providing the research evidence for public health nursing interventions. Worldviews on Evidence-Based Nursing. 2004;1(3):176-84.

Expert input Evaluated Identified by the QAT working group for further evaluation

89 Kleijnen 1994 Kleijnen J, de Craen AJ, van Everdingen J, Krol L. Placebo effect in double-blind clinical trials: a review of interactions with medications. Lancet. 1994;344:1347-9.

Additional search

Evaluated Identified by the QAT working group for further evaluation

90 Chalmers 1985 Chalmers I, Enkin M, Keirse MJNC (eds). Effective care in pregnancy and childbirth. Oxford: Oxford University Press, 1985

Additional search

Evaluated Identified by the QAT working group for further evaluation

91 Turlik et al 2003 Turlik MA, Kushner D, Stock D. Assessing the validity of published randomized controlled trials in podiatric medical journals. J Am Podiatr Med Assoc. 2003 Sep;93(5):392-8.

Additional search

Evaluated Identified by the QAT working group for further evaluation

92 Braunschweig et al

2001 Braunschweig CL, Levy P, Sheean PM, Wang X. Enteral compared with parenteral nutrition: A meta-analysis. Am J Clin Nutr. 2001;74(4):534-42.

Additional search

Evaluated Identified by the QAT working group for further evaluation

93 Yang et al. 2001 Yang Q, Peters TJ, Donovan JL, Wilt TJ, Abrams P. Transurethral incision compared with transurethral resection of the prostate for bladder outlet obstruction: A systematic review and meta-analysis of randomized controlled trials. J Urol. 2001;165(5):1526-32.

Additional search

Evaluated Identified by the QAT working group for further evaluation

94 Greenhalgh and Donald 2000 Greenhalgh T, Donald A. Papers that report drug trials (randomized controlled trials of therapy). In: Evidence based health care workbook: understanding research; for individual and group learning. London: BMJ Books; 2000. p. 59.

Additional search

Evaluated Identified by the QAT working group for further evaluation

*The first number is the order number of the review article (among the nine review articles in the inclusion list of QAT1; see Appendix D-2); the number after the hyphen is the number originally assigned to the reference in that review article. †One of the following exclusion reasons, identified by its order number, was provided for each excluded instrument, even though some instruments had multiple reasons: (1) it was not recommended by review articles; (2) it was a specific QAI (not generic); (3) it was determined not to be a QAI; (4) it was a duplicate; (5) it was unavailable; (6) it was not included by further selection using tailored criteria in the Shea et al. and Deeks et al. reviews;17,24 (7) it was a guidance document; and (8) it was an instrument for a study type not of interest. ‡The full instrument was available from the authors on request; the authors did not reply to our request. §Identified by the second-round combined analysis; i.e., when selecting the potential tools for further evaluation.


Appendix E-3: Reference List of QAIs for OBSs

Order Instrument Year Pub. Full Reference Source(s)* Evaluation Status†

1 Reisch et al. 1989 Reisch JS, Tyson JE, Mize SG. Aid to the evaluation of therapeutic studies. Pediatrics. 1989;84:815-827.

9-45,4-111 Recommended by the AHRQ report

2 Spitzer et al. 1990 Spitzer WO, Lawrence V, Dales R, et al. Links between passive smoking and disease: a best-evidence synthesis. A report of the Working Group on Passive Smoking. Clin Invest Med. 1990;13:17-42; discussion 43-46.

9-47,4-105 Recommended by the AHRQ report

3 Cho and Bero 1994 Cho MK, Bero LA. Instruments for assessing the quality of drug studies published in the medical literature. JAMA. 1994;272:101-104.

9-31,4-84 Excluded-1

4 Goodman et al. 1994 Goodman SN, Berlin J, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994;121:11-21.

9-32 Recommended by the AHRQ report

5 Downs and Black

1998 Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998;52:377-384.

9-40,4-85,7 Recommended by the AHRQ report

6 Goudas et al. 1999 Goudas L, Carr DB, Bloch R, et al. Management of cancer pain. Evidence Report/Technology Assessment No. 35 (Contract 290-97-0019 to the New England Medical Center). Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 2000.

9-69 Excluded-1

7 Ariens et al. 2000 Ariens GA, van Mechelen W, Bongers PM, Bouter LM, van der Wal G. Physical risk factors for neck pain. Scand J Work, Environ Health. 2000;26:7-19.

9-70,5-82 Excluded-1

8 Khan et al. 2000 Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness. CRD's guidance for those carrying out or commissioning reviews: York (UK): University of York, NHS Centre for Reviews and Dissemination; 2000.

9-12,5-66 Excluded-1

9 New Zealand Guidelines Group

2000 New Zealand Guidelines Group. Tools for guideline development & evaluation [Internet]. Wellington: NZGG; 2000 [cited 2000 Jul 10]. Available from: http://www.nzgg.org.nz/.

9-13 Excluded-1


10 NHMRC 2000 National Health and Medical Research Council (NHMRC). How to review the evidence: systematic identification and review of the scientific literature. Canberra (Australia): NHMRC; 2000.

9-49 Excluded-1

11 Harbour and Miller

2001 Harbour R, Miller J. A new system [Scottish Intercollegiate Guidelines Network (SIGN)] for grading recommendations in evidence based guidelines. BMJ. 2001;323:334-336.

9-14 Recommended by the AHRQ report

12 Zaza et al. 2000 Zaza S, Wright-De Aguero LK, Briss PA, Truman BI, Hopkins DP, Hennessy MH, et al. Data collection instrument and procedure for systematic reviews in the Guide to Community Preventive Services. Task Force on Community Preventive Services. Am J Prev Med. 2000 Jan;18(1 Suppl):44-74.

9-50,5-72,4-86

Recommended by the AHRQ report

13 Carruthers et al. 1993 Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL. Report of the Canadian Hypertension Society Consensus Conference:1.Introduction. Can Med Assoc J. 1993; 149:289-293.

9-71,5-19 Excluded-1

14 Laupacis et al. 1994 Laupacis A, Wells G, Richardson WS, Tugwell P: Users' guides to the medical literature. V. How to use an article about prognosis. Evidence-Based Medicine Working Group. J Am Med Assoc. 1994; 272:234-237.

9-72,5-84 Excluded-1

15 Levine et al. 1994 Levine M, Walter S, Lee H, Haines T, Holbrook A, Moyer V: Users' guides to the medical literature. IV. How to use an article about harm. Evidence-Based Medicine Working Group. J Am Med Assoc. 1994;271:1615-1619.

9-73,5-85 Excluded-1

16 Angelillo and Villari

1999 Angelillo I, Villari P: Residential exposure to electromagnetic fields and childhood leukaemia: a meta-analysis. Bull World Health Org. 1999;77:906-915.

9-74,5-81 Excluded-1

17 Lohr and Carey 1999 Lohr KN, Carey TS. Assessing 'best evidence': issues in grading the quality of studies for systematic reviews. Joint Commission J Qual Improvement. 1999;25:470-479.

9-1 Excluded-1

18 Chestnut et al. 1999 Chestnut RM, Carney N, Maynard H, Patterson P, Mann NC, Helfand M. Rehabilitation for traumatic brain injury. Evidence Report/Technology Assessment No 2. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E006; 1999.

9-60 Excluded-1


19 Vickrey et al. 1999 Vickrey BG, Shekelle P, Morton S, Clark K, Pathak M, Kamberg C. Prevention and management of urinary tract infections in paralyzed persons. Evidence Report/Technology Assessment No 6. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E008; 1999.

9-64 Excluded-1

20 Audet et al. 1993 Audet N, Gagnon R, Ladouceur R, Marcil M. How effective is the teaching of critical analysis of scientific publications? Review of studies and their methodological quality. CMAJ. 1993;148:945-52.

4-176 Excluded-1

21 Bass et al. 1993 Bass JL, Christoffel KK, Widome M, Boyle W, Scheidt P, Stanwick R, Roberts K. Childhood injury prevention counselling in primary care settings: A critical review of the literature. Pediatrics. 1993;92:544-550.

7 Excluded-1

22 Boers and Ramsden

1991 Boers M, Ramsden M. Long acting drug combinations in rheumatoid arthritis: A formal overview. Journal of Rheumatology. 1991;18:316-324.

7 Excluded-1

23 Bours et al. 1998 Bours GJ, Ketelaars CA, Frederiks CM, Abu Saad HH, Wouters EF. The effects of aftercare on chronic patients and frail elderly patients when discharged from hospital: a systematic review. J Adv Nurs. 1998;27:1076-1086.

4-73 Excluded-1

24 Bracken 1989 Bracken MB. Reporting observational studies. Br J Obstet Gynaecol. 1989;96:383-388.

4-104 Excluded-6

25 Cameron et al. 2000 Cameron I, Crotty M, Currie C, Finnegan T, Gillespie L, Gillespie W et al. Geriatric rehabilitation following fractures in older people: a systematic review. Health Technol Assess. 2000;4(2).

4-83 Excluded-1

26 Campos-Outcalt et al.

1995 Campos-Outcalt D, Senf J, Watkins AJ, Bastacky S. The effects of medical school curricula, faculty role models, and biomedical research support on choice of generalist physician careers: a review and quality assessment of the literature. Acad Med. 1995;70:611-619.

4-177 Excluded-1

27 Carey and Boden

2003 Carey TS, Boden SD: A critical guide to case series reports. Spine. 2003;28:1631-1634.

5-86 Excluded-8


28 Carter and Verhoef 1994 Carter J, Verhoef MJ. Efficacy of self-help and alternative treatments of premenstrual syndrome. Womens Health Issues. 1994;4:130-137.

4-178 Excluded-1

29 CASP

1999 Critical Appraisal Skills Programme. 12 questions to help you make sense of a cohort study. Oxford: Critical Appraisal Skills Programme; 1999.

4-64 Excluded-6

30 Clemens et al. 1983 Clemens JD, Chuong JH, Feinstein AR. The BCG controversy: A methodological and statistical reappraisal. Journal of the American Medical Association. 1983;249(17):2362-2369.

7 Excluded-1

31 Cochrane MIG 2002 Cochrane Musculoskeletal Injuries Group. Assessment of methodological quality of included trials. In The Cochrane Library, Issue 4. Oxford: Update Software; 2002.

4-179 Excluded-1

32 Coleridge Smith

1999 Coleridge Smith P. The management of chronic venous disorders of the leg: an evidence-based report of an International Task Force. Phlebology. 1999;14:3-19.

4-97 Excluded-1

33 Cowley 1995 Cowley DE. Prostheses for primary total hip replacement. A critical appraisal of the literature. Int J Technol Assess Health Care. 1995;11:770-778.

4-109 Evaluated Identified by the QAT working group for further evaluation

34 Cuddy et al. 1983 Cuddy PG, Elenbaas RM, Elenbaas JK. Evaluating the medical literature. Part I: abstract, introduction, methods. Ann Emerg Med. 1983;12:549-555.

4-180 Excluded-1

35 Dawson-Saunders and Trapp

1990 Dawson-Saunders B, Trapp R. Reading the medical literature. Basic and clinical biostatistics. Norwalk, CT: Appleton & Lange; 1990. pp.267-276.

4-98 Excluded-1

36 de Oliveira et al.

1995 de Oliveira IR, Dardennes RM, Amorim ES, Diquet B, de Sena EP, Moreira EC, et al. Is there a relationship between antipsychotic blood levels and their clinical efficacy? An analysis of studies design and methodology. Fundam Clin Pharmacol. 1995;9:488-502.

4-96 Excluded-1

37 de Vet et al. 1997 de Vet HCW, de Bie RA, van der Heijden GJMG, Verhagen AP, Sijpkes P, Knipschild PG. Systematic reviews on the basis of methodological criteria. Physiotherapy. 1997;83(6):284-289.

5-72 Excluded-1

38 DuRant 1994 DuRant RH. Checklist for the evaluation of research articles. J Adolesc Health. 1994;15:4-8.

4-99 Evaluated Identified by the QAT working group for further evaluation


39 Elwood 1998 Elwood JM. Critical appraisal of epidemiological studies and clinical trials. 2nd ed. Oxford: Oxford University Press; 1998.

5-7 Excluded-7

40 Fowkes and Fulton

1991 Fowkes FG, Fulton PM. Critical appraisal of published research. Introductory guidelines. BMJ. 1991;302:1136-1140.

4-107 Excluded-6

41 Friedenreich 1993 Friedenreich CM. Methods for pooled analyses of epidemiologic studies. Epidemiology. 1993;4:295-302.

4-100 Excluded-1

42 Gardner et al. 1986 Gardner MJ, Machin D, Campbell MJ. Use of check lists in assessing the statistical content of medical studies. BMJ. 1986;292:810-812.

4-181 Excluded-1

43 Glantz and McNanley

1997 Glantz JC, McNanley TJ. Active management of labor: a meta-analysis of cesarean delivery rates for dystocia in nulliparas. Obstet Gynecol Surv. 1997;52:497-505.

4-182 Excluded-1

44 Gordis et al. 1990 Gordis L, Kleinman JC, Klerman LV, Mullen PD, Paneth N. Criteria for evaluating evidence regarding the effectiveness of prenatal interventions. In Merkatz IR, Thompson JE, editors. New perspectives on prenatal care. New York: Elsevier; 1990. pp.31-38.

4-101 Excluded-1

45 Greenhalgh 1997 Greenhalgh T. How to read a paper: assessing the methodological quality of published papers. BMJ. 1997;315:305-308.

5-80,4-183 Excluded-1

46 Gurman and Kniskern

1978 Gurman A, Kniskern D. Research on marital and family therapy: progress, perspective and prospect. In Garfield S, Bergan A, editors. Handbook of psychotherapy and behavior change: an empirical analysis. New York: Wiley; 1978. pp.817-901.

4-184 Excluded-1

47 Hadorn et al. 1996 Hadorn DC, Baker D, Hodges JS, Hicks N. Rating the quality of evidence for clinical practice guidelines. J Clin Epidemiol. 1996;49:749-754.

4-102,7 Evaluated Identified by the QAT working group for further evaluation

48 Heneghan et al. 1996 Heneghan AM, Horwitz SM, Leventhal JM. Evaluating intensive family preservation programs: A methodological review. Pediatrics 1996;97:535-542.

7 Excluded-1

49 Hoogendoorn et al.

1999 Hoogendoorn WE, van Poppel MN, Bongers PM, Koes BW, Bouter LM. Physical load during work and leisure time as risk factors for back pain. Scand J Work, Environ Health. 1999;25:387-403.

5-83 Excluded-2


50 Horwitz et al. 1990 Horwitz RI, Viscoli CM, Clemens JD, Sadock RT. Developing improved observational methods for evaluating therapeutic effectiveness. American Journal of Medicine. 1990;89:630-638.

7 Excluded-1

51 Joanna Briggs 1999 RAPid: Rapid appraisal protocol internet database. Adelaide: The Joanna Briggs Institute; 1999.

5-3 Evaluated Identified by the QAT working group for further evaluation

52 Kay and Locker 1996 Kay EJ, Locker D. Is dental health education effective? A systematic review of current evidence. Community Dent Oral Epidemiol. 1996;24:231-235.

4-186 Excluded-1

53 Kreulen et al. 1998 Kreulen CM, Creugers NH, Meijering AC. Meta-analysis of anterior veneer restorations in clinical studies. J Dent. 1998;26:345-353.

4-187 Excluded-1

54 Kwakkel et al. 1997 Kwakkel G, Wagenaar RC, Koelman TW, Lankhorst GJ, Koetsier JC. Effects of intensity of rehabilitation after stroke. A research synthesis. Stroke. 1997;28:1550-1556.

4-188 Excluded-1

55 Lee et al. 1997 Lee TM, Chan CC, Paterson JG, Janzen HL, Blashko CA. Spectral properties of phototherapy for seasonal affective disorder: a meta-analysis. Acta Psychiatr Scand. 1997;96:117-121.

4-189 Excluded-1

56 Levine 1980 Levine J. Trial Assessment Procedure Scale (TAPS). Bethesda MD: Department of Health and Human Resources, Public Health Service, Alcohol, Drug Abuse and Mental Health Administration, National Institute of Mental Health; 1980.

4-190 Excluded-1

57 Linde et al. 1999 Linde K, Scholz M, Ramirez G, Clausius N, Melchart D, Jonas WB. Impact of study quality on outcome in placebo-controlled trials of homeopathy. J Clin Epidemiol. 1999;52:631-636.

4-95 Excluded-1

58 Loevinsohn 1990 Loevinsohn B. Health education interventions in developing countries: A methodological review of published articles. International Journal of Epidemiology. 1990;19(4):788-794.

7 Excluded-1

59 MacMillan et al.

1994 MacMillan HL, MacMillan JH, Offord DR, Griffith DL, MacMillan A. Primary prevention of child physical abuse and neglect: a critical review. Part 1. J Child Psychol Psychiatry. 1994;35:835-856.

4-191,7 Excluded-1

60 Maziak et al. 1998 Maziak DE, Meade MO, Todd TR. The timing of tracheotomy: a systematic review. Chest. 1998;114:605-609.

4-192,7 Excluded-1


61 Meijman and Melker 1995 Meijman F, de Melker RA. The extent of inter- and intra-reviewer agreement on the classification and assessment of designs of single-practice research. Fam Pract. 1995;12:93-97.

4-193 Excluded-1

62 Melchart et al. 1994 Melchart D, Linde K, Worku F, Bauer R, Wagner H. Immunomodulation with Echinacea - a systematic review of controlled clinical trials. Phytomedicine. 1994;1:245-254.

4-93 Excluded-1

63 Miller et al. 1995 Miller WR, Brown JM, Simpson TL. What works: A methodological analysis of the alcohol treatment outcome literature. In Hester RK, Miller WR, editors. Handbook of alcoholism treatment approaches: effective alternatives. Boston MA: Allyn & Bacon; 1995. pp.12-44.

4-87 Excluded-1

64 Moncrieff and Drummond

1998 Moncrieff J, Drummond DC. The quality of alcohol treatment research: an examination of influential controlled trials and development of a quality rating system. Addiction. 1998;93:811-823.

4-195 Excluded-1

65 Morley et al. 1996 Morley JA, Finney JW, Monahan S, Floyd AS. Alcoholism treatment outcome studies, 1980-1992: methodological characteristics and quality. Addict Behav. 1996;21:429-443.

4-196 Excluded-1

66 Mulrow and Lichtenstein

1986 Mulrow CD, Lichtenstein MJ. Blood glucose and diabetic retinopathy: a critical appraisal of new evidence. J Gen Intern Med. 1986;1:73-77.

4-197 Excluded-1

67 Newcastle-Ottawa

2003 Wells G, Shea B. Data extraction for non-randomised systematic reviews. Ottawa: University of Ottawa.

4-66 Excluded-6

68 NHMRC 2000-a National Health and Medical Research Council (NHMRC). How to review the evidence: assessment and application of scientific evidence. Canberra (Australia): NHMRC; 2000.

5-2 Evaluated Identified by the QAT working group for further evaluation

69 Ogilvie-Harris and Gilbart 1995 Ogilvie-Harris DJ, Gilbart M. Treatment modalities for soft tissue injuries of the ankle: A critical review. Clinical Journal of Sports Medicine. 1995;5:175-186.

7 Excluded-1

70 Powe et al. 1994 Powe NR, Tielsch JM, Schein OD, Luthra R, Steinberg EP for the Cataract Patient Outcome Research Team. Rigor of research methods in studies of the effectiveness and safety of cataract extraction with intraocular lens implantation. Cataract Patient Outcome Research Team. Archives of Ophthalmology. 1994;112:228-238.

7 Excluded-1

71 Salisbury 1997 Salisbury C. What is the impact of different models of care on patients’ quality of life, psychological well-being, or motivation? Appropriate and cost effective models of service delivery in palliative care: report March 1996-July 1997. Bristol: Division of Primary Health Care, University of Bristol; 1997.

4-198 Excluded-1

72 Schechter et al. 1991 Schechter MT, Leblanc FE, Lawrence VA. Critical appraisal of published research. In Mulder D, McPeek B, Troidl H, Spitzer W, McKneally M, Weschler A, editors. Principles and practice of research: strategy for surgical investigators. 2nd ed. New York: Springer; 1991. pp.81-87.

4-199 Excluded-1

73 Sheldon et al. 1993 Sheldon TA, Song F, Davey Smith G. Critical appraisal of the medical literature: how to assess whether health-care interventions do more harm than good. In Drummond MF, Maynard A, Wells N, editors. Purchasing and providing cost effective health care. London: Churchill Livingstone; 1993. pp.31-48.

4-200 Excluded-1

74 SIGN-CPD 2002 Continuing Professional Development: A manual for SIGN Guideline Developers. SIGN; 2002.

5-25 Excluded-3

75 Smeenk et al. 1998 Smeenk FW, van Haastregt JC, de Witte LP, Crebolder HF. Effectiveness of home care programmes for patients with incurable cancer on their quality of life and time spent in hospital: systematic review. British Medical Journal. 1998;316:1939-1944.

7 Excluded-1

76-77 Stieb et al. 1990 Stieb DM, Frayha HH, Oxman AD, Shannon HS, Hutchison BG, Crombie FS. Effectiveness of haemophilus influenzae type b vaccines. Canadian Medical Association Journal. 1990;142(7):719-733.

7 Excluded-1

78 Talley et al. 1993 Talley NJ, Nyren O, Drossman DA. The irritable bowel syndrome: toward optimal design of controlled treatment trials. Gastroenterology Int. 1993;6:189-211.

4-106 Excluded-1

79 Talley et al. 1996 Talley NJ, Owen BK, Boyce P, Paterson K. Psychological treatments for irritable bowel syndrome: A critique of controlled treatment trials. American Journal of Gastroenterology. 1996;91:277-283.

7 Excluded-1

80 Ter Riet et al. 1990 Ter Riet G, Kleijnen J, Knipschild P. Acupuncture and chronic pain: A criteria-based meta-analysis. Journal of Clinical Epidemiology. 1990;43(11):1191-1199.

7 Excluded-1

81 Thomas 2004 Thomas H. Quality assessment tool for quantitative studies. Effective Public Health Practice Project. Hamilton: McMaster University. Unpublished.

4-65 Excluded-6

82 Vickers 1996 Vickers AJ. Can acupuncture have specific effects on health? A systematic review of acupuncture antiemesis trials. J R Soc Med. 1996;89:303-311.

4-201 Excluded-1

83 Vickers 1995 Vickers A. Critical appraisal: how to read a clinical research paper. Complement Ther Med. 1995;3:158-166.

4-110 Evaluated Identified by the QAT working group for further evaluation

84 Weintraub 1982 Weintraub M. How to critically assess clinical drug trials. Drug Ther. 1982;12:131-148.

4-108 Excluded-7

85 Wilson and Henry 1992 Wilson A, Henry DA. Meta-analysis. Med J Aust. 1992;156:173-187.

8-41,4-103 Excluded-1

86 Wingood and DiClemente 1996 Wingood GM, DiClemente RJ. HIV sexual risk reduction interventions for women: a review. Am J Prev Med. 1996;12:209-217.

4-202 Excluded-1

87 Wright and Dye 1995 Wright J, Dye R. Systematic review on obstructive sleep apnoea: its effect on health and benefit of treatment. Leeds: University of Leeds; 1995. pp.1-60.

4-203 Excluded-1

88 Zola et al. 1989 Zola P, Volpe T, Castelli G, Sismondi P, Nicolucci A, Parazzini F, Liberati A. Is the published literature a reliable guide for deciding between alternative treatments for patients with early cervical cancer? International Journal of Radiation, Oncology, Biology & Physics. 1989;16:785-797.

7 Excluded-1

89‡ Crombie 1996 Crombie IK. The pocket guide to critical appraisal: a handbook for health care professionals London: BMJ Publishing Group; 1996.

5-5 Evaluated Identified by the QAT working group for further evaluation

90‡ Gyorkos et al. 1994 Gyorkos TW, Tannenbaum TN, Abrahamowicz M et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health. Revue Canadienne De Santé Publique. 1994;85 Suppl 1:S8-13.

5-31 Evaluated Identified by the QAT working group for further evaluation

91‡ PHRU 2004 Critical Appraisal Skills Programme (CASP). 12 questions to help you make sense of a cohort study. Oxford: Public Health Resource Unit (PHRU); 2004. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20Cohort%20Assessment%20Tool.pdf

5-26,4-64 Evaluated Identified by the QAT working group for further evaluation

2006 Critical Appraisal Skills Programme (CASP). 11 questions to help you make sense of a case control study. Oxford: Public Health Resource Unit (PHRU); 2006. [cited 2008 Apr 4]. Available from: http://calder.med.miami.edu/portals/ebmfiles/UM%20CASP%20Case-Controls%20Assessment%20Tool.pdf

Evaluated Identified by the QAT working group for further evaluation

92 Slim 2003 Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomised studies (MINORS): development and validation of a new instrument. ANZ J Surg. 2003;73:712-716.

Expert input Evaluated Identified by the QAT working group for further evaluation

93 Reeves and Deeks unpublished Reeves BC, Deeks JJ. A checklist of study design features (Dr. Barnaby Reeves, University of Bristol, Bristol, UK: personal communication, 2005 Jul 22).

Expert input Evaluated Identified by the QAT working group for further evaluation

94 DERP 2002 Oregon Evidence-based Practice Center for the Drug Effectiveness Review Project. Quality assessment methods for drug class review for The Drug Effectiveness Review Project. Portland: Oregon Health and Science University; 2002

Expert input Evaluated Identified by the QAT working group for further evaluation

95 EPOC 2002 Cochrane Effective Practice and Organization of Care Review Group (EPOC). The data collection checklist. Ottawa: Institute of Population Health, University of Ottawa; 2002.

Expert input Evaluated Identified by the QAT working group for further evaluation

96 SIGN50 2004 Scottish Intercollegiate Guidelines Network. Methodology checklist 3: cohort studies. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter Annex C

Expert input Evaluated Identified by the QAT working group for further evaluation

Scottish Intercollegiate Guidelines Network. Methodology checklist 4: case-control studies. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter Annex C

Expert input Evaluated Identified by the QAT working group for further evaluation

97 Bandolier 2003 Bandolier Professional. Independent evidence-based health care. Oxford (England): Bandolier; 2003.

Expert input Evaluated Identified by the QAT working group for further evaluation

98 NZGG 2001 New Zealand Guidelines Group. Handbook for the preparation of explicit evidence-based clinical practice guidelines. Wellington (New Zealand): NZGG; 2001.

Expert input Evaluated Identified by the QAT working group for further evaluation

99 MacLehose (modified Downs &amp; Black) 2000 MacLehose RR, Reeves BC, Harvey IM, Sheldon TA, Russell IT, Black AM. A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies. Health Technol Assess. 2000;4(34):1-154.

Expert input Evaluated Identified by the QAT working group for further evaluation

*Source(s): the first number is the order number of the review article among the nine review articles in the inclusion list of QAT 1 (see Appendix D-2); the second is the number originally assigned to the reference in that review article. †One of the following exclusion reasons (identified by its order number, e.g., Excluded-1) was provided for each excluded instrument, even though some instruments had multiple reasons: it was not recommended by the review articles; it was a specific QAI (not generic); it was determined not to be a QAI; it was a duplicate; it was unavailable; it was not included by further selection using the tailored criteria of the Shea et al. and Deeks et al. reviews;17,24 it was a guidance document; or it was an instrument for a study type not of interest. ‡Identified by the second-round combined analysis; i.e., when selecting the potential tools for further evaluation.


Appendix E-4: Reference List of QAIs for Multiple Design Studies

Order Instrument Year Pub. Full Reference Source(s)* Evaluation Status†

1 CLR Critical Literature Review. Website not functional.

5-9 Excluded-2

2 Colditz et al. 1989 Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy. I: Medical. Stat Med. 1989;8:441-454.

5-91 Excluded-4

3 Turlik and Kushner 2000 Turlik MA, Kushner D. Levels of evidence of articles in podiatric medical journals. J Am Podiatr Med Assoc. 2000;90:300-302.

5-92 Excluded-4

4 Borghouts et al. 1998 Borghouts JAJ, Koes BW, Bouter LM. The clinical course and prognostic factors of non-specific neck pain: a systematic review. Pain. 1998;77:1-13.

5-93 Excluded-2

5 Spitzer et al. 1990 Spitzer WO, Lawrence V, Dales R, et al. Links between passive smoking and disease: a best-evidence synthesis. A report of the Working Group on Passive Smoking. Clin Invest Med. 1990;13:17-42; discussion 43-46.

5-94 Excluded-4

6 Sutton et al. 1998 Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Systematic review of trials and other studies. Health Technology Assess. 1998;2:1-276.

5-95 Excluded-4

7 NHMRC 2000 National Health and Medical Research Council (NHMRC). How to review the evidence: systematic identification and review of the scientific literature. Canberra (Australia) : NHMRC; 2000.

5-1 Excluded-4

8 Beck 1997 Beck CT. Use of meta-analysis as a teaching strategy in nursing research courses. J Nurs Educ. 1997;36:87-90.

5-18 Excluded-4

9 Evans and Pollock 1985 Evans M, Pollock AV. A score system for evaluating random control clinical trials of prophylaxis of abdominal surgical wound infection. Br J Surg. 1985;72:256-260.

5-41 Excluded-4

10 Chestnut et al. 1999 Chestnut RM, Carney N, Maynard H, Patterson P, Mann NC, Helfand M. Rehabilitation for traumatic brain injury. Evidence Report/Technology Assessment No 2. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E006; 1999.

5-96 Excluded-4

11 Lohr and Carey 1999 Lohr KN, Carey TS. Assessing 'best evidence': issues in grading the quality of studies for systematic reviews. Joint Commission J Qual Improv. 1999;25:470-479.

5-97 Excluded-4

12 Greer et al. 2000 Greer N, Mosser G, Logan G, Halaas GW. A practical approach to evidence grading. Joint Commission. J Qual Improv. 2000;26:700-712.

5-98 Excluded-4

13 Harris et al. 2001 Harris RP, Helfand M, Woolf SH, et al. Current methods of the U.S. Preventive Services Task Force: A review of the process. Am J Prev Med. 2001;20:21-35.

5-99 Excluded-4

14 Anonymous 1981 How to read clinical journals: IV. To determine etiology or causation. Can Med Assoc J. 1981;124:985-990.

5-100 Excluded-4

15 Whitten et al. 2002 Whitten PS, Mair FS, Haycox A, May CR, Williams TL, Hellmich S. Systematic review of cost effectiveness studies of telemedicine interventions. BMJ. 2002;324:1434-1437.

5-101 Excluded-8

16 Forrest and Miller 2002 Forrest JL, Miller SA. Evidence-based decision making in action: Part 2, evaluating and applying the clinical evidence. J Contemp Dental Pract. 2002;4:42-52.

5-102 Excluded-4

17 Charnock 1998 Charnock DF (Ed). The DISCERN Handbook: Quality criteria for consumer health information on treatment choices. New York: Radcliffe Medical Press. 1998.

5-116 Excluded-8

*Source(s): the first number is the order number of the review article among the nine review articles in the inclusion list of QAT 1 (see Appendix D-2); the second is the number originally assigned to the reference in that review article. †One of the following exclusion reasons (identified by its order number, e.g., Excluded-1) was provided for each excluded instrument, even though some instruments had multiple reasons: it was not recommended by the review articles; it was a specific QAI (not generic); it was determined not to be a QAI; it was a duplicate; it was unavailable; it was not included by further selection using the tailored criteria of the Shea et al. and Deeks et al. reviews;17,24 it was a guidance document; or it was an instrument for a study type not of interest.


Appendix E-5: Reference List of EGSs

Order Systems Year Pub. Full Reference Source(s) Evaluation Status

1 Canadian Task Force 1979 Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J. 1979;121:1193-1254.

9-112 Excluded-1

2 Anonymous 1981 How to read clinical journals: IV. To determine etiology or causation. Can Med Assoc J. 1981;124:985-990.

9-87 Excluded-1

3 Cook et al. 1992 Cook DJ, Guyatt GH, Laupacis A, Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest. 1992;102:305S-311S.

9-114 Excluded-1

Sackett 1989 Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest. 1989;95:2S-4S.

9-113

4 U.S. Preventive Services Task Force 1996 US Preventive Services Task Force. Guide to clinical preventive services. 2nd ed. Baltimore: Williams &amp; Wilkins; 1996.

9-122,6-4 Excluded-1

5 Ogilvie et al. 1993 Ogilvie RI, Burgess ED, Cusson JR, Feldman RD, Leiter LA, Myers MG. Report of the Canadian Hypertension Society Consensus Conference: 3. Pharmacologic treatment of essential hypertension. Can Med Assoc J. 1993;149:575-584.

9-115 Excluded-1

6 Gross et al. 1994 Gross PA, Barrett TL, Dellinger EP, et al. Purpose of quality standards for infectious diseases. Infectious Diseases Society of America. Clin Infect Dis. 1994;18:421.

9-123 Excluded-1

7 Gyorkos et al. 1994 Gyorkos TW, Tannenbaum TN, Abrahamowicz M, et al. An approach to the development of practice guidelines for community health interventions. Can J Public Health. Revue Canadienne De Sante Publique. 1994;85 Suppl 1:S8-13.

9-81 Recommended by the AHRQ report

8 Guyatt et al. 1998 Guyatt GH, Cook DJ, Sackett DL, Eckman M, Pauker S. Grades of recommendation for antithrombotic agents. Chest. 1998;114:441S-444S.

9-88 Excluded-1


9 Guyatt et al. 1995 Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ, Cook RJ. Users' guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274:1800-1804.

9-89,6-12 Excluded-1

10 Evans et al. 1997 Evans WK, Newman T, Graham I, et al. Lung cancer practice guidelines: lessons learned and issues addressed by the Ontario Lung Cancer Disease Site Group. J Clin Oncol. 1997;15:3049-3059.

9-116 Excluded-1

11 Granados et al. 1997 Granados A, Jonsson E, Banta HD, et al. EUR-ASSESS Project Subgroup Report on Dissemination and Impact. Int J Technol Assess Health Care. 1997;13:220-286.

9-117 Excluded-1

12 Gray 1997 Gray JAM. Evidence-based healthcare. London: Churchill Livingstone; 1997.

9-124 Excluded-1

13 van Tulder et al. 1997 van Tulder MW, Koes BW, Bouter LM. Conservative treatment of acute and chronic nonspecific low back pain. A systematic review of randomized controlled trials of the most common interventions. Spine. 1997;22:2128-2156.

9-39 Excluded-1

14 Bartlett et al. 1998 Bartlett JG, Breiman RF, Mandell LA, File TMJ. Community-acquired pneumonia in adults: guidelines for management. The Infectious Diseases Society of America. Clin Infect Dis. 1998;26:811-838.

9-118 Excluded-1

15 Djulbegovic and Hadley 1998 Djulbegovic B, Hadley T. Evaluating the quality of clinical guidelines. Linking decisions to medical evidence. Oncology. 1998 Nov;12:310-314.

9-125 Excluded-1

16 Edwards et al. 1998 Edwards AG, Russell IT, Stott NC. Signal versus noise in the evidence base for medicine: an alternative to hierarchies of evidence? Fam Pract. 1998;15:319-322.

9-126 Excluded-1

17 Bril et al. 1999 Bril V, Allenby K, Midroni G, O'Connor PW, Vajsar J. IGIV in neurology: evidence and recommendations. Can J Neurol Sci. 1999;26:139-152.

9-119 Excluded-1


18 Chesson et al. 1999 Chesson ALJ, Wise M, Davila D, et al. Practice parameters for the treatment of restless legs syndrome and periodic limb movement disorder. An American Academy of Sleep Medicine Report. Standards of Practice Committee of the American Academy of Sleep Medicine. Sleep. 1999;22:961-968.

9-127 Excluded-1

19 Clarke and Oxman 1999 Clarke M, Oxman AD. Cochrane Reviewer's Handbook 4.0. The Cochrane Collaboration; 1999.

9-11 Recommended by the AHRQ report

20 Hoogendoorn et al. 1999 Hoogendoorn WE, van Poppel MN, Bongers PM, Koes BW, Bouter LM. Physical load during work and leisure time as risk factors for back pain. Scand J Work Environ Health. 1999;25:387-403.

9-90,5-83 Excluded-1

21 Working Party 1999 Working Party for Guidelines for the Management of Heavy Menstrual Bleeding. An evidence-based guideline for the management of heavy menstrual bleeding. N Z Med J. 1999;112:174-177.

9-120 Excluded-1

22 Shekelle et al. 1999 Shekelle PG, Woolf SH, Eccles M, Grimshaw J. Clinical guidelines: developing guidelines. BMJ. 1999;318:593-596.

9-121 Excluded-1

23 Wilkinson 1999 Wilkinson CP. Evidence-based medicine regarding the prevention of retinal detachment. Transactions Am Ophthalmol Society. 1999;97:397-406.

9-128 Excluded-1

24 Ariens et al. 2000 Ariens GA, van Mechelen W, Bongers PM, Bouter LM, van der Wal G. Physical risk factors for neck pain. Scand J Work Environ Health. 2000;26:7-19.

9-70 Excluded-1

25 Briss et al. 2000 Briss PA, Zaza S, Pappaioanou M, et al. Developing an evidence-based guide to community preventive services: methods. The Task Force on Community Preventive Services. Am J Prev Med. 2000;18:35-43.

9-82,1-25 Recommended by the AHRQ report

26 Greer et al. 2000 Greer N, Mosser G, Logan G, Halaas GW. A practical approach to evidence grading. Joint Commission J Qual Improv. 2000;26:700-712.

9-83,5-98 Recommended by the AHRQ report

27 Guyatt et al. 2000 Guyatt GH, Haynes RB, Jaeschke RZ, et al. Users' guides to the medical literature: XXV. Evidence-based medicine: principles for applying the users' guides to patient care. Evidence-Based Medicine Working Group. JAMA. 2000;284:1290-1296.

9-84 Recommended by the AHRQ report


28 Khan et al. 2000 Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness. CRD's guidance for those carrying out or commissioning reviews. York (UK): University of York, NHS Centre for Reviews and Dissemination; 2000.

9-12 Excluded-1

29 NHMRC 2000 National Health and Medical Research Council (NHMRC). How to review the evidence: systematic identification and review of the scientific literature. Canberra (Australia): NHMRC; 2000.

9-49 Excluded-1

30 NHS 2001 NHS Research and Development Centre for Evidence-Based Medicine. Levels of evidence. London: The Centre; 2001.

9-85 Recommended by the AHRQ report

31 New Zealand Guidelines Group 2000 New Zealand Guidelines Group. Tools for guideline development &amp; evaluation [Internet]. Wellington: NZGG; 2000. [cited 2000 Jul 10]. Available from: http://www.nzgg.org.nz

9-13 Excluded-1

32 Sackett et al. 2000 Sackett DL, Straus SE, Richardson WS, et al. Evidence-based medicine: how to practice and teach EBM. London: Churchill Livingstone; 2000.

9-91 Excluded-1

33 Harbour and Miller 2001 Harbour R, Miller J. A new system [Scottish Intercollegiate Guidelines Network (SIGN)] for grading recommendations in evidence based guidelines. BMJ. 2001;323:334-336.

9-14,1-18 Excluded-1

34 Harris et al. 2001 Harris RP, Helfand M, Woolf SH, et al. Current methods of the U.S. Preventive Services Task Force: A review of the process. Am J Prev Med. 2001;20:21-35.

9-86 Recommended by the AHRQ report

35 Chestnut et al. 1999 Chestnut RM, Carney N, Maynard H, Patterson P, Mann NC, Helfand M. Rehabilitation for traumatic brain injury. Evidence Report/Technology Assessment No 2. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E006; 1999.

9-60 Excluded-1

36 West et al. 1999 West SL, Garbutt JC, Carey TS, et al. Pharmacotherapy for alcohol dependence. Evidence Report/Technology Assessment No 5. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 1999.

9-65 Excluded-1


37 McNamara et al. 2001 McNamara RL, Miller MR, Segal JB, et al. Management of new onset atrial fibrillation. Evidence Report/Technology Assessment No 12. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 01-E026; 2001.

9-66 Excluded-1

38 Ross et al. 2001 Ross S, Eston R, Chopra S, French J. Management of newly diagnosed patients with epilepsy: a systematic review of the literature. Evidence Report/Technology Assessment No 39. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 01-E029; 2001.

9-67 Excluded-1

39 Levine et al. 2000 Levine C, Armstrong K, Chopra S, Estok R, Zhang S, Ross S. Diagnosis and management of breast disease: A systematic review of the literature. Rockville (MD): Agency for Healthcare Research and Quality; 2000.

9-147 Excluded-1

40 Goudas et al. 2000 Goudas L, Carr DB, Bloch R, et al. Management of cancer pain. Evidence Report/Technology Assessment No 35. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 99-E004; 2000.

9-68 Excluded-1

Lau et al. 2000 Lau J, Ioannidis J, Balk E, et al. Evaluating technologies for identifying acute cardiac ischemia in emergency departments. Evidence Report/Technology Assessment No 26. Rockville (MD): Agency for Health Care Policy and Research. AHCPR Publication No. 01-E006; 2000.

9-59

41 Guyatt et al. 2001 Guyatt GH, Schunemann H, Cook D, Pauker S, Sinclair J, Bucher H, Jaeschke R. Grades of recommendations for antithrombotic agents. Chest. 2001;119:3S-7S.

1-21 Evaluated Identified by the QAT working group for further evaluation

42 AHCPR 1992 Acute Pain Management. Rockville, MD: US Dept of Health and Human Services, Public Health Services, Agency for Health Care Policy and Research; 1992. AHCPR publication 92-0038.

6-11 Excluded-1

43 Ball et al. 1998 Ball C, Sackett D, Phillip B, Straus S, Haynes B. Levels of evidence and grades of recommendations. Oxford (UK): Centre for Evidence-Based Medicine; 1998.

6-14,1-16 Evaluated Identified by the QAT working group for further evaluation


44 Eccles et al. 1996 Eccles M, Clapp Z, Grimshaw J, Adams PC, Higgins B, Purves I, Russell I. North of England evidence based guidelines development project: methods of guideline development. BMJ. 1996;312:760-762.

6-10 Excluded-1

45 Hadorn et al. 1996 Hadorn DC, Baker D, Hodges JS, Hicks N. Rating the quality of evidence for clinical practice guidelines. J Clin Epidemiol. 1996;49:749-754.

6-13 Excluded-1

46 Liddle et al. 1997 Liddle J, Williamson M, Irwig L. Method for evaluating research and guideline evidence. Sydney, Australia: NSW Health Department; 1997.

6-15 Evaluated Identified by the QAT working group for further evaluation

47 Jovell and Navarro-Rubio 1995 Jovell AL, Navarro-Rubio MD. Evaluación de la evidencia científica [Evaluation of scientific evidence]. Med Clin (Barc). 1995;105:740-743.

6-16 Evaluated Identified by the QAT working group for further evaluation

48 NHMRC 2000-a National Health and Medical Research Council (NHMRC). How to review the evidence: assessment and application of scientific evidence. Canberra (Australia): NHMRC; 2000.

1-17 Evaluated Identified by the QAT working group for further evaluation

49 Woolf 1990 Woolf SH, Battista R, Anderson GM, Logan AG, Wang E, Canadian Task Force on the Periodic Health Examination. Assessing the clinical effectiveness of preventive maneuvers: analytic principles and systematic methods in reviewing evidence and developing clinical practice recommendations. A report by CTFPHE. J Clin Epidemiol. 1990;43:891-905.

6-3 Excluded-1

50$ Carruthers et al. 1993 Carruthers SG, Larochelle P, Haynes RB, Petrasovits A, Schiffrin EL. Report of the Canadian Hypertension Society Consensus Conference: 1. Introduction. Can Med Assoc J. 1993;149:289-293.

5-19 Evaluated Identified by the QAT working group for further evaluation

51$ Joanna Briggs 1999 RAPid: Rapid appraisal protocol internet database. Adelaide: The Joanna Briggs Institute; 1999.

5-3 Evaluated Identified by the QAT working group for further evaluation


52 GRADE Working Group 2004 GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490-1494.

Expert input Evaluated Identified by the QAT working group for further evaluation

53 NHMRC (in progress) 2005 Coleman K, Norris S, Weston A, Grimmer K, Hillier S, Merlin T, et al. NHMRC additional levels of evidence and grades for recommendations for developers of guidelines. Pilot program 2005-2007. Canberra: National Health and Medical Research Council; 2005.

Expert input Evaluated Identified by the QAT working group for further evaluation

54 Guyatt et al. 2004 Guyatt G, Schünemann HJ, Cook D, Jaeschke R, Pauker S. Applying the Grades of Recommendation for Antithrombotic and Thrombolytic Therapy. Chest. 2004;126:179S-187S.

Expert input Evaluated Identified by the QAT working group for further evaluation

55 SIGN 50 2004 Scottish Intercollegiate Guidelines Network. Forming guideline recommendations. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Chapter 6.

Expert input Evaluated Identified by the QAT working group for further evaluation

56 NHMRC 1999-c National Health and Medical Research Council. A guide to the development, implementation and evaluation of clinical practice guidelines. Canberra (Australia): NHMRC; 1999.

Expert input Evaluated Identified by the QAT working group for further evaluation

57 Eccles 2001 Eccles M, Mason J. How to develop cost-conscious guidelines. Health Technology Assessment. 2001;5(16).

Expert input Evaluated Identified by the QAT working group for further evaluation

58 Soldani 2005 Soldani F, Ghaemi SN, Baldessarini RJ. Research reports on treatments for bipolar disorder: preliminary assessment of methodological quality. Acta Psychiatr Scand. 2005;112:72-4.

Expert input Evaluated Identified by the QAT working group for further evaluation

59 Ellis 2000 Ellis J. Sharing the evidence: clinical practice benchmarking to improve continuously the quality of care. J Adv Nurs. 2000;32(1):215-225.

Additional search Evaluated Identified by the QAT working group for further evaluation


60 CTFPHC 2003 Canadian Task Force methodology. Ottawa: Canadian Task Force on Preventive Health Care; 2003.

QAT working group Evaluated Identified by the QAT working group for further evaluation

*Source(s): the first number is the order number of the review article among the nine review articles in the inclusion list of QAT 1 (see Appendix D-2); the second is the number originally assigned to the reference in that review article. ^One of the following exclusion reasons (identified by its order number, e.g., Excluded-1) was provided for each excluded instrument, even though some instruments had multiple reasons: it was not recommended by the review articles; it was a specific EGS (not generic); it was determined not to be an EGS; it was a duplicate; or it was unavailable. $Identified by the second-round combined analysis; i.e., when selecting the potential tools for further evaluation.


APPENDIX F: EVALUATION RESULTS

Appendix F-1: QAIs for SRs

Evaluation of Quality Assessment Instruments for Systematic Reviews*

Instruments | Study Question | Search Strategy† | In/Exclusion Criteria | Data Extraction | Study Quality/Validity† | Data Synthesis† | Funding†
Goldschmidt 1986 (85) | ● | ● | o | ◐ | ● | ◐ | o
Sacks et al. 1987 (88) | ● | ● | ● | ◐ | ◐ | ● | o
Wilson 1992 (89) | ● | ◐ | ● | ◐ | ● | ● | o
Irwig 1994 (50) | ● | ● | ● | ● | ● | ● | o
Nony 1995 (86) | ● | ● | ● | ● | ● | ● | o
Crombie 1996 (83) | ● | ● | ◐ | o | ● | ● | o
Sacks et al. 1996 (51)‡ | ● | ● | ● | ● | ● | ● | ●
Auperin 1997 (52) | ◐ | ● | ● | ● | ◐ | ● | ●
Barnes and Bero 1998 (53) | ● | ◐ | ● | o | ● | ● | ●
Joanna Briggs 1999 (73) | ● | ● | ● | ◐ | ● | ● | o
Khan 2000 (54) | ● | ● | ● | ● | ● | ● | o
FOCUS 2001 (84) | ● | ● | ● | o | ◐ | ◐ | o
NZGG 2001 (34) | o | ● | ● | o | ● | ◐ | o
CASP (PHRU) 2002 (75) | ● | ● | o | o | ● | ● | o

DERP 2002 (36) | ● | ● | ● | ● | ● | ● | o
Goodwin et al. 2002 (43) | ● | ◐ | ◐ | ◐ | ◐ | ● | o
Oxman 2001 (87) | ● | ● | ◐ | o | ● | ● | o
Glenny et al. 2003 (42) | ● | ● | ◐ | o | ● | ● | o
SIGN 50 2004 (94) | ● | ● | ● | o | ● | o | o
AMSTAR 2005‡ (unpublished) (2)§ | ● | ● | ● | ● | ● | ● | ●

*Criteria: a clear study question; a comprehensive and rigorous search strategy that addresses publication bias; a priori inclusion/exclusion criteria; clearly defined intervention groups; consideration of all potentially important harms and benefits for the associated outcomes; a detailed data extraction process; a clearly defined study quality and validity strategy; appropriate data synthesis that considers the robustness of results and heterogeneity; results presented as a narrative summary and/or a quantitative summary statistic with a measure of precision; discussion of conclusions; and identification of the source of funding or sponsorship.1 †Domains with at least one element with an empirically demonstrated basis. ‡Tools with the highest scores. §AMSTAR was unpublished during the evaluation in 2005; the published version2 was later provided by the author without major changes.


Appendix F-2: QAIs for RCTs

Evaluation of Quality Assessment Instruments for Randomized Controlled Trials*

Instruments | Study Population | Randomization† | Blinding† | Interventions | Outcomes | Statistical Analysis† | Funding†
Chalmers et al. 1981 (55)‡ | ● | ● | ● | ● | ● | ● | ●
Chalmers 1985 (47) | o | ● | ◐ | o | o | ◐ | o
Liberati et al. 1986 (60) | ● | ● | ● | ● | ● | ● | o
Reisch et al. 1989 (56)‡ | ● | ● | ● | ● | ● | ● | ●
Gyorkos 1994 (66) | ◐ | ● | ● | ◐ | ● | ● | o
Kleijnen 1994 (48) | ◐ | o | ● | ● | ● | ● | o
Crombie 1996 (83) | ◐ | ● | ◐ | ● | ● | ● | o
van der Heijden and van der Windt 1996 (61) | ● | ● | ● | ● | ● | ● | o
de Vet et al. 1997 (62) | ● | ● | ● | ● | ● | ● | o
Sindhu et al. 1997 (57) | ● | ● | ● | ● | ● | ● | o
Downs and Black 1998 (58) | ● | ● | ● | ● | ● | ● | o
Joanna Briggs 1999 (73) | ◐ | ● | ● | o | ◐ | ◐ | o
Greenhalgh and Donald 2000 (49) | ● | ◐ | ● | ◐ | ◐ | ● | o

MacLehose 2000 (30) | ● | ● | ● | ● | ● | ● | o
NHMRC 2000-a (82) | o | ● | ● | o | o | ◐ | o
van Tulder 2000 (95) | o | ● | ● | o | o | ● | o
Braunschweig et al. 2001 (44) | ◐ | ● | ◐ | ◐ | ◐ | ◐ | o
Eccles 2001 (27) | ◐ | ● | ● | o | ◐ | o | o
FOCUS 2001 (84) | ◐ | ● | ● | o | ◐ | ◐ | o
Harbour and Miller 2001 (59) | ● | ● | ● | ● | ● | ● | o
NZGG 2001 (34) | ● | ● | ● | ● | ● | ● | o
Oremus 2001 (79) | ● | ● | ● | o | ◐ | ● | o
Yang et al. 2001 (45) | o | ● | o | ◐ | ◐ | ● | o
CASP (PHRU) 2002 (76) | ◐ | ● | ● | o | ◐ | ● | o
DERP 2002 (36) | ● | ● | ● | o | o | ● | ●
EPOC 2002 (33) | ◐ | ● | ◐ | o | ● | ◐ | o

A-74

Evaluation of Quality Assessment Instruments for Randomized Controlled Trials*

Instruments Study Population Randomization† Blinding† Interventions Outcomes

Statistical Analysis† Funding†

Moseley 200281 ● ● ● o ◐ ● o Bandolier (OPVS) 200338 o o ● o ● ● o Turlik et al. 200341 o ● ● o ● ● o van Tulder 200331 ◐ ● ● o o ◐ o SIGN 50, 20043‡ ● ● ● ● ● ● ● Thomas 200496 o ● ◐ o ◐ ◐ o

*A clear study question, describe study population using explicit inclusion/exclusion criteria, adequate approach to randomization of study groups (e.g., sequence generation, concealment), address treatment allocation (e.g., double-blinding), sufficient detail surrounding intervention groups, outcomes and analytic techniques, appropriate measure of precision, support for conclusion, and identify source of funding or sponsorship. 1 †Domains with at least one element with an empirically demonstrated basis ‡Tools with highest scores.


Appendix F-3: QAIs for OBSs

Evaluation of Quality Assessment Instruments for Observational Studies*

Instruments | Comparability of Subjects† | Exposure/Intervention | Outcome Measure | Statistical Analysis | Funding†
Reisch et al. 198956‡ | ● | ● | ● | ● | ●
Spitzer et al. 199063 | ● | ● | ● | ● | o
DuRant 199491 | ◐ | o | o | ◐ | o
Goodman et al. 199464 | ● | ● | ● | ● | o
Gyorkos 199466 | ◐ | ● | ● | ● | o
Cowley 199590 | ◐ | o | ◐ | o | o
Vickers 199593 | ◐ | ◐ | ● | ● | o
Crombie 199683 | ◐ | ● | o | o | o
Hadorn 199692 | ◐ | o | ◐ | ◐ | o
Downs and Black 199858 | ● | ● | ● | ● | o
Joanna Briggs 199973 | ◐ | o | o | o | o
MacLehose 200030 | ● | ● | ● | ● | o
NHMRC 200082 | ◐ | ● | ◐ | ● | o
Zaza et al. 200065 | ● | ● | ● | ● | o
Harbour and Miller 200159 | ● | ● | ● | ● | o
NZGG 200134 | ● | ● | ● | o | o
CASP (PHRU) 200277,78 | ● | ● | ● | ● | o
DERP 200236 | ◐ | ● | o | ● | ●
EPOC 200233 | ◐ | o | ● | o | o
Bandolier 200338 | ◐ | o | o | ● | o
Slim 200328 | ● | ● | ● | ◐ | o
SIGN 50 20044,5‡ | ● | ● | ● | ● | ●
Reeves and Deeks, 2005 (Dr. Barnaby Reeves, University of Bristol, Bristol, UK: personal communication, 2005 Jul 22) | o | o | ◐ | ● | o

*A clear study question; a study population and the comparability of subjects described using explicit inclusion/exclusion criteria; a clear definition and the validity/reliability of the selection of exposure or intervention; clearly defined outcomes; confounding variables addressed at either the design or analysis level; appropriate statistical analysis (e.g., a measure of precision); support for conclusions; and identification of the funding source or sponsorship.1
†Domains with at least one element with an empirically demonstrated basis.
‡Tools with the highest scores.


Appendix F-4: EGSs

Evaluation of Evidence Grading Systems

EGSs | Guideline related?^ | Quality* | Quantity* | Consistency*
Caruthers 199380 | | ◐ | ○ | ○
**Gyorkos et al. 199466 | Guideline system | ● | ● | ●
Jovell 199597 | | ● | ○ | ○
Liddle 199698 | | ● | ○ | ○
Clarke and Oxman 199970 | Non-guideline system | ● | ● | ●
Joanna Briggs 199974 | | ◐ | ○ | ○
NHMRC 1999-c39 | | ● | ○ | ○
**Briss et al. 200067 | Guideline system | ● | ● | ●
Ellis 200046 | | ● | ○ | ○
**Greer et al. 200068 | Guideline system | ● | ● | ●
Guyatt et al. 200071 | Non-guideline system | ● | ● | ●
NHMRC 2000-a82 | | ● | ◐ | ○
ACCP-Guyatt 200199 | | ● | ◐ | ○
Eccles 200127 | | ◐ | ○ | ○
**Harris et al. 200169 | Guideline system | ● | ● | ●
NHS 200172 | Non-guideline system | ● | ● | ●
OCEBM 2001100 | | ● | ○ | ●
CTFPHE 200340 | | ● | ◐ | ○
**GRADE working group 20046 | | ● | ● | ●
Guyatt et al. 200426 | | ● | ○ | ○
**SIGN 50 2004101 | | ● | ● | ●
NHMRC 2005 (in progress)35 | | ● | ● | ●
Soldani 200529 | | ◐ | ◐ | ○

*Quality: the extent to which a study's design, conduct, and analysis have minimized selection, measurement, and confounding biases. Quantity: the magnitude of the treatment effect, the number of studies that have evaluated the given topic, and the overall sample size across all included studies. Consistency: the extent to which similar findings are reported from work using similar and different study designs.1
**Tools with the highest scores.
^The guideline-related information is extracted from the AHRQ report.1


APPENDIX G: QAIs AND EGSs SELECTED

Appendix G-1: AMSTAR for SRs

A Measurement Tool to Assess Reviews (AMSTAR), 2007 2

Source: Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol [Internet]. 2007 [cited 2007 Nov 22];7:10. Available from: http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1810543&blobtype=pdf


Appendix G-2: Modified SIGN 50 Checklist for RCTs3

Section 1: Internal validity
In a well-conducted RCT… In this study this criterion is:

1.1 The study addresses an appropriate and clearly focused question.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.2 The assignment of subjects to treatment groups is randomised.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.3 An adequate concealment method is used.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.4 Subjects and investigators are kept “blind” about treatment allocation.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.5 The treatment and control groups are similar at the start of the trial.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.6 The only difference between groups is the treatment under investigation.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.7 All relevant outcomes are measured in a standard, valid, and reliable way.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.8 What percentage of the individuals or clusters recruited into each treatment arm of the study dropped out before the study was completed?

1.9 All the subjects are analysed in the groups to which they were randomly allocated (often referred to as intention-to-treat analysis).
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.10 Where the study is carried out at more than one site, results are comparable for all sites.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Section 2: Overall Assessment of the Study
2.1 How well was the study done to minimise bias? Code ++, +, or −

Section 3: Others
3.1 How was this study funded? List all sources of funding quoted in the article, whether Government, voluntary sector, or industry.

Adapted from: Scottish Intercollegiate Guidelines Network. SIGN 50: a guideline developers' handbook [Internet]. Edinburgh: The Network; 2008. Annex C, Methodology checklist 2: randomised controlled trials; p. 52 [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/pdf/sign50.pdf


Appendix G-3: Modified SIGN 50 Checklist for Cohort Studies4

SIGN Methodology Checklist 3: Cohort Studies

Section 1: Internal Validity
In a well-conducted cohort study: In this study the criterion is:

1.1 The study addresses an appropriate and clearly focused question.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Selection of Subjects
1.2 The two groups being studied are selected from source populations that are comparable in all respects other than the factor under investigation.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.3 The study indicates how many of the people asked to take part did so, in each of the groups being studied.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.4 The likelihood that some eligible subjects might have the outcome at the time of enrolment is assessed and taken into account in the analysis.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.5 What percentage of individuals or clusters recruited into each arm of the study dropped out before the study was completed?

1.6 Comparison is made between full participants and those lost to follow-up, by exposure status.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Assessment
1.7 The outcomes are clearly defined.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.8 The assessment of outcome is made blind to exposure status.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.9 Where blinding was not possible, there is some recognition that knowledge of exposure status could have influenced the assessment of outcome.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.10 The measure of assessment of exposure is reliable.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.11 Evidence from other sources is used to demonstrate that the method of outcome assessment is valid and reliable.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.12 Exposure level or prognostic factor is assessed more than once.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Confounding
1.13 The main potential confounders are identified and taken into account in the design and analysis.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Statistical Analysis
1.14 Have confidence intervals been provided?

Section 2: Overall Assessment of the Study
2.1 How well was the study done to minimise the risk of bias or confounding, and to establish a causal relationship between exposure and effect? Code ++, +, or −

Section 3: Others
3.1 How was this study funded? List all sources of funding quoted in the article, whether Government, voluntary sector, or industry.

Adapted from: Scottish Intercollegiate Guidelines Network. Methodology checklist 3: cohort studies. In: SIGN 50: a guideline developers' handbook [Internet]. Edinburgh: The Network; 2004. Annex C [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist3.html


Appendix G-4: Modified SIGN 50 Checklist for Case-Control Studies5

SIGN Methodology Checklist 4: Case-Control Studies

Section 1: Internal Validity
In a well-conducted case-control study: In this study the criterion is:

1.1 The study addresses an appropriate and clearly focused question.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Selection of Subjects
1.2 The cases and controls are taken from comparable populations.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.3 The same exclusion criteria are used for both cases and controls.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.4 What percentage of each group (cases and controls) participated in the study? Cases: Controls:

1.5 Comparison is made between participants and non-participants to establish their similarities or differences.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.6 Cases are clearly defined and differentiated from controls.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.7 It is clearly established that controls are non-cases.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Assessment
1.8 Measures will have been taken to prevent knowledge of primary exposure influencing case ascertainment.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

1.9 Exposure status is measured in a standard, valid, and reliable way.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Confounding
1.10 The main potential confounders are identified and taken into account in the design and analysis.
    [Well covered / Adequately addressed / Poorly addressed / Not addressed / Not reported / Not applicable]

Statistical Analysis
1.11 Confidence intervals are provided.

Section 2: Overall Assessment of the Study
2.1 How well was the study done to minimise the risk of bias or confounding? Code ++, +, or −

Section 3: Others
3.1 How was this study funded? List all sources of funding quoted in the article, whether Government, voluntary sector, or industry.

Adapted from: Scottish Intercollegiate Guidelines Network. Methodology checklist 4: case-control studies [Internet]. In: SIGN 50: a guideline developers' handbook. Edinburgh: The Network; 2004. Annex C [cited 2008 Jun 6]. Available from: http://www.sign.ac.uk/guidelines/fulltext/50/checklist4.html


Appendix G-5: GRADE 2004

Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Working Group. Grading quality of evidence and strength of recommendations: abridged version. BMJ [Internet]. 2004 Jun 19 [cited 2005 Nov 17];328(7454):1490-4. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC428525/?tool=pubmed