Papageorgiou SN, Xavier GM, Cobourne MT. Basic study design influences treatment effect when...



REVIEW ARTICLE

Basic study design influences the results of orthodontic clinical investigations

Spyridon N. Papageorgiou (a,b,c,*), Guilherme M. Xavier (d), Martyn T. Cobourne (d)

(a) Department of Orthodontics, School of Dentistry, University of Bonn, Welschnonnenstr. 17, Bonn 53111, Germany

(b) Department of Oral Technology, School of Dentistry, University of Bonn, Welschnonnenstr. 17, Bonn 53111, Germany

(c) Clinical Research Unit 208, University of Bonn, Welschnonnenstr. 17, Bonn 53111, Germany

(d) Department of Orthodontics, King’s College London Dental Institute, Floor 27, Guy’s Hospital, London SE1 9RT, UK

Accepted 18 March 2015; Published online xxxx

Abstract

Objectives: Meta-analysis is the gold standard for synthesizing evidence on the effectiveness of health care interventions. However, its validity depends on the quality of the included studies. Here, we investigated whether basic study design (i.e., randomization and timing of data collection) in orthodontic research influences the results of clinical trials.

Study Design and Setting: This meta-epidemiologic study used unrestricted electronic and manual searching for meta-analyses in orthodontics. Differences in standardized mean differences (DSMD) between interventions and their 95% confidence intervals (CIs) were calculated according to study design through random-effects meta-regression. Effects were then pooled with random-effects meta-analyses.

Results: No difference was found between randomized and nonrandomized trials (25 meta-analyses; DSMD = 0.07; 95% CI = −0.21, 0.34; P = 0.630). However, retrospective nonrandomized trials reported inflated treatment effects compared with prospective nonrandomized trials (40 meta-analyses; DSMD = −0.30; 95% CI = −0.53, −0.06; P = 0.018). No difference was found between randomized trials with adequate random sequence generation and those with unclear/inadequate random sequence generation (25 meta-analyses; DSMD = 0.01; 95% CI = −0.25, 0.26; P = 0.957). Finally, subgroup analyses indicated that the difference between randomized and nonrandomized trials varied significantly according to the scope of the trial (effectiveness or adverse effects; P = 0.005).

Conclusion: Caution is warranted when interpreting systematic reviews of clinical orthodontic interventions whose meta-analyses include nonrandomized, and especially retrospective nonrandomized, studies. © 2015 Elsevier Inc. All rights reserved.

Keywords: Orthodontics; Meta-analysis; Systematic review; Randomized controlled trial; Prospective clinical study; Retrospective clinical study

1. Introduction

1.1. Background

Meta-analysis of clinical trials provides the best evidence for evaluating orthodontic interventions because of its increased statistical power and precision [1]. However, if the methodological quality of the included studies is suboptimal, the results will be biased, even if the meta-analysis is conducted to the highest standards (the so-called "garbage in, garbage out" concept) [2].

Ideally, meta-analyses assessing the efficacy of orthodontic interventions would include only well-conducted and appropriately reported randomized controlled trials (RCTs), which are seen as the epitome of clinical research [2]. Their principal advantage lies in the random allocation of patients to different interventions, which minimizes selection bias [3]. However, high-quality clinical trials do not always exist, and nonrandomized controlled trials of interventions (non-RCTs) are often included in systematic reviews and meta-analyses in orthodontics [4,5], which can potentially affect their conclusions. Indeed, many widely used interventions in orthodontics are not adequately supported by clinical evidence, possibly because orthodontics is a field where patients' lives are rarely put at risk [6].

Funding: S.N.P. received funds from Clinical Research Unit 208 "Etiology and Sequelae of Periodontal Diseases - Genetic, Cell Biological and Biomechanical Aspects" (University of Bonn, Bonn, Germany), which has no relevance to the topic of this study. G.M.X. and M.T.C. are supported by the Academy of Medical Sciences (The Wellcome Trust, British Heart Foundation, Arthritis Research UK).

* Corresponding author. Tel.: +49-(0)228-287-22449; fax: +49-(0)228-287-22588.

E-mail address: [email protected] (S.N. Papageorgiou).

http://dx.doi.org/10.1016/j.jclinepi.2015.03.008

0895-4356/© 2015 Elsevier Inc. All rights reserved.

Journal of Clinical Epidemiology - (2015) -

What is new?

Key findings
• Evidence of bias was found between randomized and nonrandomized orthodontic studies, although the direction of bias varied between trials investigating the effectiveness or the adverse effects of orthodontic interventions.
• Evidence of bias was found between retrospective nonrandomized and prospective nonrandomized studies of orthodontic interventions.

What this study adds to what was known?
• The influence of basic study design on the results of clinical research, which has already been identified in various biomedical fields, also exists in orthodontics.

What is the implication and what should change now?
• Reporting of the study design in orthodontics is suboptimal and should be improved, as it can influence study results.
• If the inclusion of studies with different designs is judged appropriate in a meta-analysis, a sensitivity analysis based on study design is warranted.

Currently, only a modest proportion of the clinical trials reported in the orthodontic literature are RCTs; the rest are either prospective or retrospective non-RCTs [7–9]. Most of these are retrospective clinical trials, in which data on the performance of an intervention are extracted from archived patient files. Traditionally, such trials have provided much of the evidence for clinical decision making in orthodontics [10]. This can be attributed to (1) the relative time lag between the introduction of orthodontics as a specialty and the establishment of evidence-based research methodologies; (2) the fact that, for some orthodontic questions, RCTs can be unethical; and (3) personal preferences of researchers or research groups. Indeed, criticisms have been leveled at the use of RCTs in orthodontics, characterizing them as unethical and inappropriate in some cases [11]. The relative merits of RCTs in orthodontics have been discussed in several contexts [12,13]; however, if orthodontics is to be considered a clinical discipline with a sound scientific basis, its conclusions should be based on appropriate scientific methodology.

Empirical evidence on the effect of study design characteristics on treatment effects can be derived from meta-epidemiologic studies that integrate data from a collection of meta-analyses [14]. Within such a collection, all component trials are classified according to a specific study-level characteristic, and their results are then synthesized.

For example, it has been shown that inadequacies in the generation of a randomization sequence, allocation concealment, or blinding in RCTs can lead to biased estimates [15–17]. Empirical evidence is needed for interventional research in orthodontics because, for questions that can be answered through both RCTs and non-RCTs, it is important to know the extent of bias associated with each study design. Also, because the nature of many orthodontic interventions means they will only ever be assessed in non-RCTs, it is important to know how biased these studies might be. To date, there is no empirical evidence on the impact of study design on treatment effects in orthodontics.

1.2. Aim

The aim of this meta-epidemiologic study was to identify the extent of bias that might be introduced into the results of clinical trials in orthodontics according to their basic study design.

2. Methods

2.1. Terminology and protocol

In this report, we define a "systematic review" as a structured review with comprehensively planned procedures for study identification, study selection, data extraction, and methodological assessment of the included studies. We define "meta-analysis" as the statistical synthesis of the results of two or more studies. As such, a meta-analysis is ideally conducted within the framework of a systematic review, and a systematic review can include multiple meta-analyses. Primary studies (here, clinical trials) included in a systematic review or a meta-analysis are termed "component trials." We define non-RCTs as "any quantitative trial estimating the effects of an intervention that does not use randomization to allocate units to comparison groups" [18]. Finally, the pooling of multiple meta-analyses according to a specific factor (e.g., study design) is termed "meta-epidemiologic synthesis."

To make the nature of the component trials as transparent as possible, their designs were categorized as follows: (1) tRCT: "true" RCT (random sequence generation method clearly described and adequate according to Cochrane Collaboration criteria); (2) uRCT: "unclear" RCT (unclear random sequence generation method); (3) qRCT: quasi-RCT (inadequate random sequence generation method); (4) pCCT: prospective nonrandomized controlled clinical trial; (5) rCCT: retrospective nonrandomized controlled clinical trial; and (6) unclear: nonrandomized trial with unclear design (probably retrospective).
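The six categories amount to a small decision rule. The Python sketch below is a hypothetical illustration, not the authors' extraction form. The inputs `randomized`, `sequence_generation`, and `timing` are our own assumed encoding of what a reviewer would record from a trial's full text, and the judgment of whether a method is "adequate" under Cochrane criteria is left to the reviewer.

```python
from typing import Optional


def classify_design(randomized: bool,
                    sequence_generation: Optional[str] = None,
                    timing: Optional[str] = None) -> str:
    """Map (hypothetical) extracted trial features to one of the six design codes.

    sequence_generation: "adequate", "inadequate", or "unclear"/None (randomized trials)
    timing: "prospective", "retrospective", or None if not reported (nonrandomized trials)
    """
    if randomized:
        if sequence_generation == "adequate":
            return "tRCT"      # method clearly described and adequate
        if sequence_generation == "inadequate":
            return "qRCT"      # quasi-randomized
        return "uRCT"          # method unclear or not described
    if timing == "prospective":
        return "pCCT"
    if timing == "retrospective":
        return "rCCT"
    return "unclear"           # nonrandomized, timing not reported (probably retrospective)
```

For example, `classify_design(False)` yields `"unclear"` because neither randomization nor the timing of data collection was reported.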

The protocol for this study was registered prospectively in the PROSPERO International prospective register of systematic reviews (CRD42014013767) before any data extraction or analysis.


2.2. Search and selection procedures

Study selection was based on the previously published CADMOS meta-epidemiologic database [19]. To construct this database, seven general open-access, regional, or grey literature databases were searched from inception to September 2012 for systematic reviews in the field of oral medicine, including orthodontics, with no restriction on language, publication year, or publication status. The existing database was updated in MEDLINE through PubMed (http://www.ncbi.nlm.nih.gov/pubmed) on July 25, 2014, with the strategy (orthodon* OR malocclusion) AND ("systematic review" OR "meta-analysis" OR "random-effects" OR "fixed-effect" OR "meta-regression"), limited to reviews, systematic reviews, and meta-analyses. Manual updates were conducted up to September 2014 using the same search strategy (Appendix at www.jclinepi.com).
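If the MEDLINE update were rerun programmatically, the query above could be sent to the NCBI E-utilities `esearch` endpoint. The sketch below only builds that request URL; it does not send it. It is an illustration under assumptions: the authors do not report using E-utilities, the `retmax` value is arbitrary, and the publication-type limits (reviews, systematic reviews, meta-analyses) are not encoded in the query string.

```python
from urllib.parse import urlencode

# Search strategy as reported in the text.
QUERY = ('(orthodon* OR malocclusion) AND ("systematic review" OR "meta-analysis" '
         'OR "random-effects" OR "fixed-effect" OR "meta-regression")')

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 500) -> str:
    """Build a PubMed esearch URL for the given query (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)
```

Calling `esearch_url(QUERY)` gives a URL whose JSON response would list matching PMIDs. Applying the review-type limits and screening the results would remain separate steps.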

2.3. Inclusion criteria

Systematic reviews in orthodontics were eligible if they contained at least one meta-analysis of interventional studies with different designs. Systematic reviews that included studies of a single design (e.g., only tRCTs) were excluded, as they could not contribute to the comparative analyses among study designs. Either the raw data or the calculated standardized mean difference (SMD) had to be reported in the published report. All obtained reports were screened by one author (S.N.P.). MEDLINE was searched through PubMed to assign a unique identifier to each component trial and systematic review; references not indexed in MEDLINE were manually assigned a unique identifier. Using these identifiers, overlaps were checked and duplicates were removed until there was no overlap between component trials.
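The identifier-based overlap check can be sketched as follows. The sketch assumes, hypothetically, that each meta-analysis is stored as a list of trial identifiers (PMIDs or manually assigned IDs). The function only reports which trials occur in more than one meta-analysis. The authors' rule for deciding which duplicate to remove is not described here and is not reproduced.

```python
from collections import defaultdict


def find_overlaps(meta_analyses: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return trial IDs found in more than one meta-analysis,
    mapped to the sorted labels of the meta-analyses containing them."""
    seen: dict[str, set[str]] = defaultdict(set)
    for ma_label, trial_ids in meta_analyses.items():
        for trial_id in trial_ids:
            seen[trial_id].add(ma_label)
    return {tid: sorted(mas) for tid, mas in seen.items() if len(mas) > 1}
```

An empty result corresponds to the stated end point: no remaining overlap between component trials.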

2.4. Data extraction

A predefined form was used to extract the characteristics of the included systematic reviews and component trials, including the design of each component trial. Multiple meta-analyses were extracted from a systematic review only when their component trials or outcomes differed. The design of component trials was categorized as tRCT, uRCT, qRCT, pCCT, rCCT, or unclear according to the full text. Data extraction and characterization of study design were performed independently by two authors (S.N.P. and G.M.X.) based on the full text of each review and component trial, because misclassification of study designs has often been reported in the orthodontic literature [20]. In a handful of instances where the full text of a single component trial could not be obtained, even after communication with the authors, that component trial was excluded. In one instance where no judgment about trial design could be made, the third author (M.T.C.) was consulted. Subgroup analyses were ignored if an overall pooled estimate across subgroups was given; when the subgroups were not pooled together, data were extracted from the largest subgroup. Before the actual extraction, a preliminary calibration was conducted between the two authors responsible for extraction (S.N.P. and G.M.X.) until consensus was reached.

2.5. Analysis

2.5.1. Calculating effects within each meta-analysis

For both continuous and binary outcomes, raw data or calculated effect sizes were extracted for each component trial (i.e., each trial included in the original meta-analysis), converted into SMDs, and recoded so that a negative SMD indicated a beneficial effect (Appendix at www.jclinepi.com).
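The exact conversion formulas are given only in the authors' appendix. The Python sketch below therefore uses two common choices as stated assumptions:

- Hedges' g for continuous outcomes.
- The logistic approximation SMD = ln(OR) × √3/π for binary outcomes, with a 0.5 continuity correction for zero cells.

The last function mirrors the recoding described above: it flips the sign whenever a positive value would indicate benefit.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (small-sample corrected SMD) of group 1 vs. group 2, and its variance."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))  # pooled SD
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * (n1 + n2) - 9)                     # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def smd_from_2x2(a, b, c, d):
    """SMD from a 2x2 table via ln(OR) * sqrt(3) / pi (logistic approximation).

    a/b: events/non-events in group 1; c/d: events/non-events in group 2.
    A 0.5 continuity correction is added to every cell if any cell is zero (assumption).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or * math.sqrt(3) / math.pi, var_log_or * 3 / math.pi**2


def recode(smd, positive_is_beneficial):
    """Flip the sign so that a negative SMD always indicates benefit."""
    return -smd if positive_is_beneficial else smd
```

For instance, `hedges_g(10, 2, 20, 8, 2, 20)` returns g = 148/151 ≈ 0.98. If higher scores are beneficial for that outcome, `recode` turns this into about −0.98.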

Random-effects meta-regression, fully incorporating heterogeneity between trials, was performed to derive a "difference in SMDs" (DSMD) and its standard error for each meta-analysis according to the component trial design. An iterative residual maximum likelihood algorithm was used to estimate the between-study variance because of its performance [21]. The Knapp–Hartung modification [22], which accounts for the uncertainty in the heterogeneity estimate [23], was used to calculate the DSMDs. For the DSMD of characteristic [A] vs. characteristic [B], a DSMD < 0 indicated that studies with characteristic [A] showed larger treatment effects than those with characteristic [B]. The magnitude of SMDs and DSMDs was assessed with the following guidelines: SMD of 0.2 = small effect; SMD of 0.5 = medium effect; SMD of 0.8 = large effect [24].
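The per-meta-analysis DSMD is the slope of a meta-regression of the component-trial SMDs on a binary design indicator. The sketch below is a minimal pure-Python illustration under stated assumptions:

- It maximizes the restricted likelihood by golden-section search over τ² in [0, 10], instead of the iterative algorithm used by the authors' software.
- It assumes the likelihood is unimodal on that interval.
- It applies the Knapp–Hartung scaling without truncation.

Coding characteristic [A] as x = 1 and [B] as x = 0 makes a negative slope mean larger treatment effects in [A], matching the convention above.

```python
import math


def _wls(y, v, x, tau2):
    """Weighted least squares fit of y = b0 + b1*x with weights 1/(v_i + tau2)."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx                  # determinant of X'WX
    inv = [[swxx / det, -swx / det],             # (X'WX)^-1
           [-swx / det, sw / det]]
    b0 = inv[0][0] * swy + inv[0][1] * swxy
    b1 = inv[1][0] * swy + inv[1][1] * swxy
    return b0, b1, inv, w, det


def _neg_restricted_loglik(tau2, y, v, x):
    """Negative restricted (REML) log-likelihood, additive constants dropped."""
    b0, b1, _, w, det = _wls(y, v, x, tau2)
    rss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, yi, xi in zip(w, y, x))
    return 0.5 * (sum(math.log(vi + tau2) for vi in v) + math.log(det) + rss)


def reml_tau2(y, v, x, upper=10.0, tol=1e-8):
    """Golden-section search for the tau2 in [0, upper] that maximizes the REML likelihood."""
    f = lambda t: _neg_restricted_loglik(t, y, v, x)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, upper
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                  # minimum of f lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                        # minimum of f lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def dsmd_knapp_hartung(y, v, x):
    """DSMD = SMD(x=1) - SMD(x=0) with a Knapp-Hartung standard error.

    y: SMD per component trial (negative = beneficial); v: its variance;
    x: 1 for characteristic [A], 0 for characteristic [B].
    Returns (dsmd, se, df, tau2); CIs use a t distribution with df.
    """
    k = len(y)
    if k < 3 or len(set(x)) != 2:
        raise ValueError("need at least 3 trials with both design groups present")
    tau2 = reml_tau2(y, v, x)
    b0, b1, inv, w, _ = _wls(y, v, x, tau2)
    q = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, yi, xi in zip(w, y, x))
    s2 = q / (k - 2)                 # Knapp-Hartung scaling factor
    return b1, math.sqrt(s2 * inv[1][1]), k - 2, tau2
```

With equal variances, the slope reduces to the difference in group means. For example, SMDs of −0.6 and −0.4 (x = 0) against −0.3 and −0.1 (x = 1) give a DSMD of 0.3.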

Three statistical comparisons were conducted: (1) RCTs vs. non-RCTs; (2) prospective non-RCTs vs. retrospective non-RCTs; and (3) RCTs with an adequate random sequence generation method vs. RCTs with an inadequate or unclear random sequence generation method.
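Table 2 lists which design codes form the reference (Ref) and experimental (Exp) arm of each comparison. The sketch below encodes that mapping; the dictionary keys are hypothetical names. A meta-analysis can only yield a DSMD if both arms contribute trials, so the eligibility check requires both. With Exp coded as 1 and Ref as 0, the indicator can serve as the binary design covariate of the meta-regression described above.

```python
from typing import Optional

# Ref and Exp arms of each comparison, as listed in Table 2.
COMPARISONS = {
    "randomized_vs_nonrandomized": ({"pCCT", "rCCT", "unclear"}, {"uRCT", "qRCT", "tRCT"}),
    "prospective_vs_retrospective": ({"pCCT"}, {"rCCT", "unclear"}),
    "adequate_vs_inadequate_unclear_sequence": ({"tRCT"}, {"qRCT", "uRCT"}),
}


def indicator(design: str, comparison: str) -> Optional[int]:
    """1 if the design is in the Exp arm, 0 if in the Ref arm, None if not part of the comparison."""
    ref, exp = COMPARISONS[comparison]
    if design in exp:
        return 1
    if design in ref:
        return 0
    return None


def eligible(designs: list[str], comparison: str) -> bool:
    """A meta-analysis contributes to a comparison only if both arms are represented."""
    flags = {indicator(d, comparison) for d in designs}
    return 0 in flags and 1 in flags
```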

2.5.2. Meta-epidemiologic synthesis among meta-analyses

The DSMDs of the individual meta-analyses were pooled with the metan macro (random-effects model based on the DerSimonian and Laird method), as no guidelines for meta-epidemiologic synthesis exist. Between-meta-analysis heterogeneity was assessed with the heterogeneity parameter τ², whereas between-meta-analysis inconsistency was quantified with the I² statistic, defined as the proportion of total variability in the results explained by heterogeneity [25,26]. The 95% uncertainty intervals (analogous to confidence intervals) around I² were calculated [27] using the noncentral χ² approximation of Q [28]. For the DSMD, 95% predictive intervals (PIs) were calculated; these incorporate existing heterogeneity and provide a range of possible effects for a future meta-analysis [29]. All analyses were run in Stata SE 10.0 (StataCorp, College Station, TX, USA) (Appendix at www.jclinepi.com). A two-tailed P-value below 0.05 was considered significant for hypothesis testing, except for the tests of heterogeneity and reporting biases, for which a threshold of 0.10 was used [30].
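The sketch below illustrates the DerSimonian–Laird pooling step, including τ², I², the 95% CI, and the 95% predictive interval. It is not the metan macro itself. As an assumption, the caller passes in the critical value of the t distribution with k − 2 degrees of freedom used for the predictive interval, which keeps the code free of external dependencies. The noncentral χ² uncertainty interval around I² is not reproduced.

```python
import math


def dersimonian_laird(est, se, t_crit):
    """Random-effects (DerSimonian-Laird) pooling of per-meta-analysis DSMDs.

    t_crit: 97.5th percentile of the t distribution with k - 2 df, supplied by
    the caller (about 2.069 for k = 25); used only for the predictive interval.
    """
    k = len(est)
    w = [1.0 / s**2 for s in se]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, est)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, est))   # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                             # moment estimator
    w_re = [1.0 / (s**2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, est)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    z = 1.959964                                                   # 97.5th normal percentile
    half_pi = t_crit * math.sqrt(tau2 + se_mu**2)
    return {
        "dsmd": mu, "se": se_mu, "tau2": tau2, "i2": i2, "q": q,
        "ci": (mu - z * se_mu, mu + z * se_mu),
        "pi": (mu - half_pi, mu + half_pi),
    }
```

As a rough check of the predictive-interval formula, take the rounded values reported for comparison 1 in Table 2: a DSMD of 0.07, a standard error of about 0.14 back-calculated from the 95% CI, and τ² = 0.116. With t ≈ 2.069 for 23 degrees of freedom, the formula gives roughly −0.69 to 0.83. This is close to the reported −0.70 to 0.83.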

2.5.3. Additional analyses

Mixed-effects subgroup analyses were performed to identify possible differences in the role of study design according to (1) the field of orthodontics; (2) outcome type (binary or continuous); (3) research scope (effectiveness or adverse effects); and (4) nature of the outcome (subjective or objective). Subjective outcomes included self-reported pain intensity and eating or speaking difficulty. Indications of reporting bias were assessed with Egger's linear regression test [31] if 10 or more meta-analyses were included in a meta-epidemiologic synthesis.
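Egger's test regresses each standardized estimate (the estimate divided by its standard error) on its precision (one over the standard error) and asks whether the intercept departs from zero. The sketch below is meant to be applied to per-meta-analysis DSMDs and their standard errors. It returns the intercept, its standard error, and the t statistic with k − 2 degrees of freedom. Converting that statistic into a P-value, for comparison against the 0.10 threshold above, is left to a statistics library and is not shown.

```python
import math


def egger_test(est, se):
    """Egger's regression test for small-study effects / reporting bias.

    Regresses est/se on 1/se by ordinary least squares and returns
    (intercept, SE of intercept, t statistic, degrees of freedom).
    """
    k = len(est)
    if k < 3:
        raise ValueError("need at least 3 estimates")
    y = [e / s for e, s in zip(est, se)]     # standardized estimates
    x = [1.0 / s for s in se]                # precisions
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)                       # residual variance
    se_int = math.sqrt(s2 * (1.0 / k + mx**2 / sxx))
    return intercept, se_int, intercept / se_int, k - 2
```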

2.5.4. Sensitivity analyses

Sensitivity analyses were performed by (1) comparing the results of fixed-effect and random-effects models; (2) excluding trials with an unclear description of methodology; (3) including only the largest meta-analysis from each systematic review; and (4) including, for each comparison, only the most precise 50% of eligible meta-analyses (i.e., those with the lowest standard errors).
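Two of these sensitivity analyses are simple to sketch. The first is an inverse-variance fixed-effect pooled estimate, for comparison with the random-effects result. The second selects the most precise half of the eligible meta-analyses. How an odd number of meta-analyses was halved is not reported, so the code rounds up as an assumption. The tuple layout `(label, dsmd, se)` is also hypothetical.

```python
import math


def fixed_effect(est, se):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    w = [1.0 / s**2 for s in se]
    mu = sum(wi * yi for wi, yi in zip(w, est)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w))


def most_precise_half(meta_analyses):
    """Keep the 50% of (label, dsmd, se) tuples with the smallest SE (count rounded up)."""
    keep = math.ceil(len(meta_analyses) / 2)
    return sorted(meta_analyses, key=lambda ma: ma[2])[:keep]
```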

3. Results

3.1. Study characteristics

3.1.1. Study selection

Following an initial screening of the pre-existing database and a manual literature update, a total of 171 relevant systematic reviews were identified (Fig. 1). A total of 148 systematic reviews were excluded after consideration, leaving 23 relevant reviews with 147 meta-analyses for inclusion. After the addition of manually identified reviews, 77 meta-analyses with 333 component trials were included (Table 1 and Appendix at www.jclinepi.com).

Fig. 1. Flow diagram for the identification and selection of eligible systematic reviews.

3.2. Main analyses

3.2.1. RCTs vs. non-RCTs

A total of 25 meta-analyses included both RCTs and non-RCTs and could be pooled. On average, RCTs showed slightly, but not significantly, smaller treatment effects than non-RCTs (DSMD = 0.07; 95% CI = −0.21, 0.34; P = 0.630) (Table 2 and Fig. 2).

3.2.2. Prospective non-RCTs vs. retrospective non-RCTs

A total of 40 meta-analyses included both prospective and retrospective non-RCTs and could be pooled. On average, retrospective non-RCTs showed inflated treatment effects compared with prospective non-RCTs (DSMD = −0.30; 95% CI = −0.53, −0.06; P = 0.018) (Table 2 and Fig. 3). The magnitude of the effect was small to medium, and moderate heterogeneity was found among meta-analyses.

3.2.3. Adequate vs. inadequate/unclear random sequence generation

A total of 25 meta-analyses included both RCTs with adequate random sequence generation and RCTs with inadequate/unclear random sequence generation and could be pooled. On average, RCTs with adequate random sequence generation showed almost the same treatment effects as RCTs with inadequate/unclear random sequence generation (DSMD = 0.01; 95% CI = −0.25, 0.26; P = 0.957) (Table 2 and Appendix at www.jclinepi.com).

3.3. Additional analyses

According to the subgroup analyses (Table 3), DSMDs varied considerably among the various orthodontic fields, with statistically significant differences (P between subgroups = 0.027). In addition, the scope of the meta-analysis had a significant modifying effect on the results (P between subgroups = 0.005). When the effectiveness of an intervention was studied, RCTs tended to show smaller treatment effects than non-RCTs (DSMD = 0.32; 95% CI = 0.06, 0.58; P = 0.015). However, when adverse effects of interventions were studied, RCTs tended to show greater treatment effects than non-RCTs (DSMD = −0.57; 95% CI = −1.05, −0.10; P = 0.018) (Appendix at www.jclinepi.com). Finally, Egger's test found no significant indication of reporting bias among meta-analyses for any of the three comparisons (Appendix at www.jclinepi.com).

3.4. Sensitivity analyses

For all three meta-epidemiologic comparisons, the results of the sensitivity analyses were similar to those of the original analyses (Appendix at www.jclinepi.com), apart from some minor points. The difference between prospective and retrospective non-RCTs was no longer significant in any of the sensitivity analyses, but the direction of the effect was consistent. This was therefore attributed to the loss of precision that followed the reduction in sample size. The comparison of adequate vs. inadequate/unclear random sequence generation seemed to be the least robust of the analyses, with considerable variation in effect sizes between the original analysis (DSMD = 0.01) and the sensitivity analyses (DSMDs of −0.03 to 0.13).

4. Discussion

4.1. Evidence and comparison to literature

As far as we are aware, this is the first empirical assessment of the influence of study design on treatment effects in orthodontic clinical interventions. Despite the relatively restricted sample of included meta-analyses, study design significantly influenced observed effects.

Table 1. Summary of the included meta-analyses' characteristics

Category                                          n (%)
Systematic reviews
  Total (n)                                       17
  Publication years (range)                       2009–2014
Meta-analyses
  Total, n (%)                                    77 (100)
  Field, n (%)
    Class II treatment (functional appliances)    16 (21)
    Class III treatment (skeletal anchorage)      3 (4)
    Cleft lip palate                              10 (13)
    Lingual appliances                            3 (4)
    Maxillary expansion                           10 (13)
    Orthodontic education                         2 (3)
    Self-ligating appliances                      13 (17)
    Skeletal anchorage (effects)                  5 (6)
    Skeletal anchorage (failure)                  11 (14)
    Tooth extraction                              4 (5)
  Outcome type, n (%)
    Binary                                        17 (22)
    Continuous                                    60 (78)
  Scope, n (%)
    Effectiveness                                 50 (65)
    Adverse effects                               27 (35)
  Outcome nature, n (%)
    Objective                                     73 (95)
    Subjective                                    4 (5)
Component trials
  Total, n (%)                                    333 (100)
  Trials per meta-analysis (range)                2–16
  Trials per meta-analysis (median)               4
  Trial design, n (%)
    tRCT                                          49 (15)
    uRCT                                          38 (11)
    qRCT                                          18 (5)
    pCCT                                          89 (27)
    rCCT                                          109 (33)
    Unclear                                       30 (9)

Abbreviations: tRCT, randomized controlled trial (random sequence generation clearly described and adequate); uRCT, randomized controlled trial (unclear random sequence generation); qRCT, quasi-randomized controlled trial (random sequence generation inadequate); pCCT, prospective nonrandomized controlled clinical trial; rCCT, retrospective nonrandomized controlled clinical trial; Unclear, nonrandomized controlled clinical trial with unclear design.

Based on the empirical evidence, the results from RCTs did not seem to differ significantly from those of non-RCTs when all eligible meta-analyses were pooled together (P = 0.630). Previous empirical studies of binary outcomes either confirm [32] or reject [33–36] a significant difference between RCTs and non-RCTs. This difference is usually interpreted as an overestimation in non-RCTs rather than an underestimation in RCTs [37]. Almost all empirical studies have concluded that RCTs and non-RCTs can sometimes differ substantially and that systematic review authors should try to include RCTs whenever possible.

Table 2. Results of the meta-epidemiologic analyses (random-effects model)

1. Randomized vs. nonrandomized
   Ref: pCCT, rCCT, unclear; Exp: uRCT, qRCT, tRCT
   MAs: 25; trials (Ref/Exp): 116 (44/72)
   DSMD (95% CI): 0.07 (−0.21, 0.34); P = 0.630; 95% PI: −0.70, 0.83
   Heterogeneity: I² = 26% (95% CI: 0%, 54%); τ² = 0.116

2. Prospective vs. retrospective
   Ref: pCCT; Exp: rCCT, unclear
   MAs: 40; trials (Ref/Exp): 190 (70/119)
   DSMD (95% CI): −0.30 (−0.53, −0.06)*; P = 0.018; 95% PI: −1.38, 0.79
   Heterogeneity: I² = 59% (95% CI: 35%, 69%); τ² = 0.274

3. Adequate vs. inadequate/unclear random sequence generation
   Ref: tRCT; Exp: qRCT, uRCT
   MAs: 25; trials (Ref/Exp): 75 (37/38)
   DSMD (95% CI): 0.01 (−0.25, 0.26); P = 0.957; 95% PI: −0.98, 0.99
   Heterogeneity: I² = 60% (95% CI: 32%, 73%); τ² = 0.211

Abbreviations: Ref, reference; Exp, experimental; MA, meta-analysis; DSMD, difference in standardized mean differences; CI, confidence interval; PI, predictive interval; pCCT, prospective nonrandomized controlled clinical trial; rCCT, retrospective nonrandomized controlled clinical trial; unclear, nonrandomized controlled clinical trial with unclear design; uRCT, randomized controlled trial (unclear random sequence generation); qRCT, quasi-randomized controlled trial (inadequate random sequence generation); tRCT, randomized controlled trial (random sequence generation clearly described and adequate).

*P-value < 0.05.

[Figure: forest plot of DSMD (95% CI) for each of the 25 meta-analyses; x-axis "Difference in estimates between RCTs and non-RCTs" (left: inflated treatment effects in RCTs; right: inflated treatment effects in non-RCTs); overall DSMD 0.07 (95% CI −0.21, 0.34; 95% PI −0.70, 0.83; I² = 26%; P = 0.630).]

Fig. 2. Forest plot for the comparison of RCTs vs. non-RCTs. Because of data recoding, estimates on the right side of the forest plot indicate that RCTs show smaller treatment effects than non-RCTs. DSMD, difference in standardized mean differences; RCT, randomized controlled trial; non-RCT, nonrandomized controlled trial; CI, confidence interval.

Retrospective non-RCTs showed significantly inflatedtreatment effects compared with prospective non-RCTs.Similarly, a tendency for RCTs to agree more with prospec-tive compared with retrospective non-RCTs has beendescribed [34]. This can be explained by the fact that retro-spective non-RCTs might be more prone to selection and

observation bias than prospective non-RCTs. Also, in retro-spective non-RCTs, there is a higher risk for confoundingby indication, meaning that the choice of intervention isdetermined by the severity of the disease [38]. In orthodon-tics, this would translate to one treatment being allocated toa more compliant patient or to a patient with a less severemalocclusion, where the prognosis is more favorable. Incontrast, it is possible that for a more difficult case, earlycessation of treatment or a change of treatment plan mightalso play a role. Furthermore, it is important to discriminatebetween ‘‘consecutively enlisted’’ and ‘‘consecutivelyfinished’’ patients, as this also entails the risk of overestimat-ing the effect of an intervention [39,40]. It is also possible thatmany retrospective trials are conducted using data that have

[Figure: forest plot of ΔSMD (95% CI) for each included meta-analysis. Horizontal axis: “Difference in estimates between retrospective and prospective non-RCTs”, with contours marking trivial, small, medium, and large effects; left side: “Inflated treatment effects in retrospective non-RCTs”; right side: “Inflated treatment effects in prospective non-RCTs”. Overall: ΔSMD −0.30 (95% CI −0.53, −0.06); I² = 59%; P = 0.018; estimated predictive interval −1.38 to 0.79.]

Fig. 3. Forest plot for the comparison of retrospective vs. prospective non-RCTs. Because of data recoding, estimates on the left side of the forest plot indicate that retrospective non-RCTs show larger treatment effects than prospective non-RCTs. ΔSMD, difference in standardized mean differences; non-RCT, nonrandomized controlled trial; CI, confidence interval.

7S.N. Papageorgiou et al. / Journal of Clinical Epidemiology - (2015) -

been collected for other purposes and therefore may not be as complete or unbiased as one would wish [39].

In the present study, the difference between RCTs and non-RCTs differed between outcomes on the effectiveness and outcomes on the adverse effects of interventions. It is accepted that RCTs cannot answer all questions: they are not always feasible, and they are mainly designed and powered to assess the efficacy of interventions [18,40]. Estimates of a treatment's adverse effects may be prone to different biases than estimates of its efficacy [32]. RCTs may not be large enough or may not have a sufficiently long follow-up to identify some long-term harms [33,41–44]. Moreover, the generalizability of RCT results may be limited for various reasons [45], for example, because high-risk patients are often excluded from trials [32,33,46]. Our results might be considered supportive of this notion, as RCTs were more conservative than non-RCTs only regarding the effectiveness of interventions. When adverse effects of interventions were considered, the results of RCTs seemed to be inflated compared with those of non-RCTs.

Conversely, it has been acknowledged for some time that non-RCTs are more prone to bias than RCTs and therefore provide weaker evidence [47,48]. Risk of selection bias (baseline differences between the patients in the two groups) is widely regarded as the principal difference between RCTs and non-RCTs. Incorporating results of non-RCTs at high risk of bias in a meta-analysis runs the danger of producing a biased estimate with unwarranted precision [18,49]. In addition, there are indications that publication bias might be more severe for non-RCTs than for RCTs [50].
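This study assessed reporting bias with Egger's regression test (command B3 in Appendix B, `metabias ..., egger`). As an illustration only, the Python sketch below implements the classic form of that test: the standardized effect (effect/SE) is regressed on precision (1/SE) by ordinary least squares, and an intercept far from zero suggests small-study effects, of which publication bias is one possible cause. The function name and return layout are hypothetical, and a P-value would additionally require the t distribution with k − 2 degrees of freedom, which is left out to keep the sketch dependency-free.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry (illustrative sketch).

    Regresses the standard normal deviate (effect / SE) on precision (1 / SE)
    by ordinary least squares. Returns the intercept, its standard error,
    the t statistic, and the residual degrees of freedom (k - 2).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    y = [e / s for e, s in zip(effects, std_errors)]  # standard normal deviates
    x = [1 / s for s in std_errors]                    # precisions
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are equal; the test is undefined")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat, k - 2
```

The t statistic is compared with a t distribution on k − 2 degrees of freedom; with few studies the test has low power, so a non-significant intercept does not rule out publication bias.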

However, recent initiatives from the field of comparative research have highlighted the need for trials that are more easily applicable to real-world settings [51] and for the incorporation of non-RCTs in reviews assessing harmful effects [18]. Various statistical techniques have been introduced to deal with selection bias in non-RCTs, although not always successfully [52,53]. Moreover, organizations such as the Cochrane Collaboration and the Campbell Collaboration have decided to incorporate non-RCTs for assessing the harmful effects of interventions (Cochrane since 2012 and Campbell since 2000).

Non-RCTs are invaluable for investigating interventions that cannot ethically be randomized and for assessing long-term outcomes, rare events, or adverse effects of interventions. However, when RCTs are not feasible or are inappropriate, retrospective non-RCTs are not the only alternative. Well-conducted prospective non-RCTs, which guard against sources of bias, can provide complementary evidence and are less complex and expensive than RCTs. When including non-RCTs in systematic reviews, it should be remembered that assessing the risk of bias is more complicated for these types of trial than for RCTs. It is advisable that systematic review teams include a person with methodological expertise in assessing

Table 3. Results of the conducted subgroup analyses

| | 1. Randomized vs. nonrandomized | | | | 2. Prospective vs. retrospective | | | | 3. Adequate vs. inadequate/unclear random sequence generation | | | |
| Subgroup | MAs | ΔSMD (95% CI) | P-value | PSG | MAs | ΔSMD (95% CI) | P-value | PSG | MAs | ΔSMD (95% CI) | P-value | PSG |
| Class II treatment (functional appliances) | 10 | 0.29 (−0.05, 0.64) | | * | 14 | −0.36 (−0.94, 0.22) | | | – | – | – | |
| Class III treatment (skeletal anchorage) | – | – | – | | 2 | 0.40 (−0.53, 1.33) | | | – | – | – | |
| Cleft lip/palate | – | – | – | | – | – | – | | 10 | 0.13 (−0.42, 0.68) | | |
| Lingual appliances | 3 | −0.07 (−1.30, 1.16) | | | – | – | – | | – | – | – | |
| Maxillary expansion | 3 | −0.11 (−1.09, 0.86) | | | 7 | 0.17 (−0.19, 0.54) | | | – | – | – | |
| Orthodontic education | 1 | 1.09 (0.41, 1.77) | ** | | – | – | – | | – | – | – | |
| Orthodontic–periodontic interactions | – | – | – | | – | – | – | | – | – | – | |
| Self-ligating appliances | 4 | −0.18 (−0.75, 0.39) | | | 1 | −0.36 (−0.79, 0.08) | | | 11 | −0.04 (−0.31, 0.24) | | |
| Skeletal anchorage effects | 3 | −0.88 (−1.54, −0.22) | ** | | 1 | 1.66 (0.13, 3.20) | * | | 4 | −0.18 (−0.99, 0.63) | | |
| Skeletal anchorage failure | 1 | −0.06 (−1.64, 1.52) | | | 11 | −0.49 (−0.75, −0.24) | *** | | – | – | – | |
| Teeth extraction | – | – | – | | 4 | −0.20 (−0.79, 0.38) | | | – | – | – | |
| Binary | 4 | −0.05 (−0.94, 0.85) | | | 11 | −0.49 (−0.75, −0.24) | *** | | – | – | – | – |
| Continuous | 21 | 0.07 (−0.23, 0.37) | | | 29 | −0.18 (−0.50, 0.14) | | | – | – | – | |
| Effectiveness | 17 | 0.32 (0.06, 0.58) | * | ** | 24 | −0.22 (−0.57, 0.14) | | | 19 | 0.06 (−0.26, 0.37) | | |
| Adverse effects | 8 | −0.57 (−1.05, −0.10) | * | | 16 | −0.40 (−0.63, −0.17) | ** | | 6 | −0.11 (−0.43, 0.22) | | |
| Subjective | 3 | −0.07 (−1.30, 1.16) | | | – | – | – | | 1 | −0.02 (−0.52, 0.48) | | |
| Objective | 22 | 0.07 (−0.22, 0.36) | | | – | – | – | | 24 | 0.01 (−0.26, 0.28) | | |

Abbreviations: MA, meta-analysis; ΔSMD, difference in standardized mean differences; CI, confidence interval; PSG, P-value for difference between subgroups.

*P-value < 0.05; **P-value < 0.01; ***P-value < 0.001.


risk of bias and the correctness of the analyses used, as well as knowledge of the topic area [54,55]. Finally, when incorporating non-RCTs in systematic reviews, it always makes sense to explore potential sources of heterogeneity and to adopt a random-effects approach to acknowledge the unexplained heterogeneity [56,57]. Various methods have also been suggested to adjust meta-analysis results for the extent of bias, either through empirically based priors [58] or directly through mixed treatment comparison meta-analysis [59], but these are not widely used and may require specialized statistical expertise and software [60].
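The random-effects approach mentioned above can be illustrated with the DerSimonian–Laird moment estimator, the estimator behind the `random` option of Stata's `metan` used in Appendix B. The Python sketch below (function name hypothetical) pools effect sizes, estimates the between-study variance τ², and reports Cochran's Q and I² [25,26]; it is a teaching aid under these assumptions, not the code used for this study.

```python
import math


def dersimonian_laird(effects, std_errors):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2.

    Returns (pooled estimate, its SE, tau^2, Cochran's Q, I^2 in percent).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se_pooled, tau2, q, i2
```

When τ² is estimated as zero the random-effects weights reduce to the fixed-effect weights, so the two models agree; any heterogeneity widens the confidence interval of the pooled estimate.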

4.2. Strengths and limitations

The strengths of this study include the extensive literature search, which was not restricted to orthodontic journals [61]. Also, misclassification of component trials was minimized, as the full text of every component trial in the meta-analyses was acquired and assessed firsthand. Furthermore, random sequence generation was assessed based on the Cochrane risk-of-bias tool [62]. No overlap existed between the included studies, and the ΔSMD calculations took into account the between-study heterogeneity [63]. Almost all meta-analyses agreed on the direction of the effect, the magnitude of which was medium to large, meaning that the observed differences could have a bearing on clinical decisions about the treatments in question. Finally, the results of the meta-epidemiologic analyses were relatively robust.
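The magnitude labels used here and in the effect-magnitude contours of the forest plots (trivial, small, medium, large) correspond to Cohen's conventional cut-offs of 0.2, 0.5, and 0.8 SMD units [24]. A minimal Python sketch of that classification follows; the function name is hypothetical, and it assumes that a value lying exactly on a cut-off belongs to the larger category, a convention the paper does not state.

```python
def magnitude(effect):
    """Label an SMD or a difference in SMDs by Cohen's conventional cut-offs."""
    size = abs(effect)  # direction does not affect magnitude
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"  # values at or above 0.8
```

For example, the ΔSMD of 1.09 reported in Table 3 for orthodontic education (randomized vs. nonrandomized) is labeled "large", and −0.57 for adverse effects is labeled "medium".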

There are also some limitations to this study. Because of the inclusion criteria of this empirical assessment, only a subsample of existing meta-analyses could be included. For example, meta-analyses including studies of only one design were excluded, limiting the final sample of eligible meta-analyses. In addition, labeling trials as “prospective” or “retrospective” is ambiguous, and both the degree of prospectiveness of a study and its risk of bias can vary [64]. Furthermore, most results were only marginally significant (at the 5% level), and the 95% predictive intervals (PIs) did not consistently exclude the null value, which may weaken confidence in these estimates.
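The 95% predictive intervals mentioned above describe where the effect in a new, comparable setting is expected to lie once between-study heterogeneity is taken into account. Following Higgins et al. [29], the interval is commonly computed as

$$\hat{\mu} \pm t_{k-2}\sqrt{\hat{\tau}^2 + \mathrm{SE}(\hat{\mu})^2}$$

where $t_{k-2}$ is the 97.5th percentile of the t distribution with k − 2 degrees of freedom. The Python sketch below (function name hypothetical) takes that critical value as an argument instead of computing it, to stay within the standard library; for k = 40 meta-analyses it is about 2.02.

```python
import math


def predictive_interval(pooled, se_pooled, tau2, t_crit):
    """Approximate 95% predictive interval for a random-effects meta-analysis.

    pooled, se_pooled: random-effects estimate and its standard error
    tau2: estimated between-study variance
    t_crit: 97.5th percentile of the t distribution with k - 2 df
    """
    if tau2 < 0 or se_pooled < 0:
        raise ValueError("variance and standard error must be non-negative")
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width
```

Because τ² enters the half-width directly, a pooled estimate whose confidence interval excludes zero can still have a predictive interval that includes it, as in Fig. 3.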

4.3. Conclusions

Existing evidence from orthodontic meta-analyses indicates that study design of interventional clinical trials might have a bearing on estimated treatment effects. Based on existing empirical evidence, intervention effects in orthodontic research seem to be inflated in non-RCTs compared with RCTs and in retrospective non-RCTs compared with prospective non-RCTs.

4.4. Recommendations for future research

Clear reporting of study design, preferably in the title and/or abstract, stating whether randomization took place and whether the study was prospective or retrospective, is essential. For clinical questions where randomization is feasible, systematic reviews should preferably include RCTs. If no RCTs are available or if authors decide to also include non-RCTs, a sensitivity analysis based on study design is warranted. For clinical questions where randomization is unethical or not feasible, systematic reviews should preferably include prospective non-RCTs over retrospective non-RCTs. If both study designs are included, a sensitivity analysis is warranted. Conclusions from systematic reviews based solely on non-RCTs, and especially retrospective non-RCTs, should be viewed with caution.
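The sensitivity analysis recommended above can be as simple as re-pooling after restricting the evidence to the preferred design and comparing the two estimates. The Python sketch below assumes each study is given as a hypothetical (effect, standard error, design label) tuple and uses DerSimonian–Laird random-effects pooling; the design labels and function names are illustrative, not part of the paper.

```python
import math


def pool_random(effects, std_errors):
    """DerSimonian-Laird random-effects estimate and its standard error."""
    w = [1 / s ** 2 for s in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # With a single study there is no heterogeneity to estimate.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if len(effects) > 1 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in std_errors]
    est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return est, math.sqrt(1 / sum(w_re))


def sensitivity_by_design(studies, keep=("RCT",)):
    """Pool all studies, then only those whose design label is in `keep`.

    `studies` holds (effect, standard_error, design_label) tuples; the labels
    are hypothetical and must match whatever coding the review uses.
    """
    all_est = pool_random([s[0] for s in studies], [s[1] for s in studies])
    subset = [s for s in studies if s[2] in keep]
    if not subset:
        raise ValueError("no studies of the requested design(s)")
    sub_est = pool_random([s[0] for s in subset], [s[1] for s in subset])
    return {"all": all_est, "restricted": sub_est}
```

For a review of non-RCTs only, `sensitivity_by_design(studies, keep=("prospective",))` would compare the full pooled estimate with the estimate from prospective studies alone.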

Acknowledgments

The authors thank Greg J. Huang and Anne-Marie Bollen (both from University of Washington, Seattle, WA) for providing component studies included in their meta-analyses and Carlos Flores-Mir (University of Alberta, Edmonton, Canada) for providing raw study data from their included meta-analysis.

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2015.03.008.

References

[1] Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. New York, NY: Churchill Livingstone; 1997.

[2] Egger M, Smith GD, Altman DG, editors. Systematic reviews in health care: meta-analysis in context. 2nd edition. London, UK: BMJ Books; 2001.

[3] Kunz R, Vist G, Oxman AD. Randomisation to protect against selection bias in healthcare trials. Cochrane Database Syst Rev 2007;MR000012.

[4] Papageorgiou SN, Papadopoulos MA, Athanasiou AE. Evaluation of methodology and quality characteristics of systematic reviews in orthodontics. Orthod Craniofac Res 2011;14:116–37.

[5] Papageorgiou SN, Papadopoulos MA, Athanasiou AE. Reporting characteristics of meta-analyses in orthodontics: methodological assessment and statistical recommendations. Eur J Orthod 2014;36:74–85.

[6] Amat P. What would you choose: evidence-based treatment or an exciting, risky alternative? Am J Orthod Dentofacial Orthop 2007;132:724–5.

[7] Harrison JE. Clinical trials in orthodontics I: demographic details of clinical trials published in three orthodontic journals between 1989 and 1998. J Orthod 2003;30:25–30.

[8] Gibson R, Harrison J. What are we reading? An analysis of the orthodontic literature 1999 to 2008. Am J Orthod Dentofacial Orthop 2011;139:e471–84.

[9] Pandis N, Polychronopoulou A, Madianos P, Makou M, Eliades T. Reporting of research quality characteristics of studies published in 6 major clinical dental specialty journals. J Evid Based Dent Pract 2011;11:75–83.

[10] Proffit WR. Evidence and clinical decisions: asking the right questions to obtain clinically useful answers. Semin Orthod 2013;19:130–6.


[11] Zuccati G, Clauser C, Giorgetti R. Randomized clinical trials in orthodontics: reality, dream or nightmare? Am J Orthod Dentofacial Orthop 2009;136:634–7.

[12] Meikle MC. Guest editorial: what do prospective randomized clinical trials tell us about the treatment of class II malocclusions? A personal viewpoint. Eur J Orthod 2005;27:105–14.

[13] Bondemark L, Ruff S. EJO open session 2013, a debate: randomized controlled trial (RCT): the gold standard or unobtainable fallacy. Available at http://www.oxfordjournals.org/our_journals/eortho/ejovideo.html. Accessed April 14, 2014.

[14] Naylor CD. Meta-analysis and the meta-epidemiology of clinical research. BMJ 1997;315:617–9.

[15] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12.

[16] Wood L, Egger M, Gluud LL, Schulz KF, Jüni P, Altman DG, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601–5.

[17] Savović J, Jones H, Altman D, Harris R, Jüni P, Pildal J, et al. Influence of reported study design characteristics on intervention effect estimates from randomised controlled trials: combined analysis of meta-epidemiological studies. Health Technol Assess 2012;16:1–82.

[18] Reeves BC, Higgins JPT, Ramsay C, Shea B, Tugwell P, Wells GA. An introduction to methodological issues when including non-randomised studies in systematic reviews on the effects of interventions. Res Synth Meth 2013;4:1–11.

[19] Papageorgiou SN, Antonoglou G, Tsiranidou E, Jepsen S, Jäger A. Bias and small-study effects influence treatment effect estimates: a meta-epidemiological study in oral medicine. J Clin Epidemiol 2014;67:984–92.

[20] Koletsi D, Pandis N, Polychronopoulou A, Eliades T. What's in a title? An assessment of whether randomized controlled trial in a title means that it is one. Am J Orthod Dentofacial Orthop 2012;141:679–85.

[21] Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 1999;18:2693–708.

[22] Knapp G, Hartung J. Improved tests for a random-effects meta-regression with a single covariate. Stat Med 2003;22:2693–710.

[23] Higgins JP, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med 2004;23:1663–82.

[24] Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Academic Press; 1988.

[25] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.

[26] Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60.

[27] Ioannidis JP, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ 2007;335:914–6.

[28] Orsini N, Bottai M, Higgins J, Buchan I. Heterogi: Stata module to quantify heterogeneity in a meta-analysis. Available at http://econpapers.repec.org/RePEc:boc:bocode:s449201. Accessed April 14, 2014.

[29] Higgins JP, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A Stat Soc 2009;172:137–59.

[30] Ioannidis JP. Interpretation of tests of heterogeneity and bias in meta-analysis. J Eval Clin Pract 2008;14:951–7.

[31] Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629–34.

[32] Golder S, Loke YK, Bland M. Meta-analyses of adverse effects data derived from randomised controlled trials as compared to observational studies: methodological overview. PLoS Med 2011;8:e1001026.

[33] Chou R, Helfand M. Challenges in systematic reviews that assess treatment harms. Ann Intern Med 2005;142:1090–9.

[34] Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA 2001;286:821–30.

[35] Tzoulaki I, Siontis KC, Ioannidis JP. Prognostic effect size of cardiovascular biomarkers in datasets from observational studies versus randomised trials: meta-epidemiology study. BMJ 2011;343:d6829.

[36] Jacobs WC, Kruyt MC, Moojen WA, Verbout AJ, Oner FC. No evidence for intervention-dependent influence of methodological features on treatment effect. J Clin Epidemiol 2013;66:1347–1355.e3.

[37] Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7:iii–x, 1–173.

[38] Psaty BM, Koepsell TD, Lin D, Weiss NS, Siscovick DS, Rosendaal FR, et al. Assessment and control for confounding by indication in observational studies. J Am Geriatr Soc 1999;47:749–54.

[39] Johnston LE Jr. Moving forward by looking back: 'retrospective' clinical studies. J Orthod 2002;29:221–6.

[40] Flores-Mir C. Can we extract useful and scientifically sound information from retrospective nonrandomized trials to be applied in orthodontic evidence-based practice treatments? Am J Orthod Dentofacial Orthop 2007;131:707–8.

[41] Papanikolaou PN, Ioannidis JP. Availability of large-scale evidence on specific harms from systematic reviews of randomized trials. Am J Med 2004;117:582–9.

[42] Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet 2004;363:1728–31.

[43] Papanikolaou PN, Christidi GD, Ioannidis JPA. Comparison of evidence on harms of medical interventions in randomized and nonrandomized studies. CMAJ 2006;174:635–41.

[44] Vandenbroucke JP. What is the best evidence for determining harms of medical treatment? CMAJ 2006;174:645–6.

[45] Zwarenstein M, Oxman A. Why are so few randomized trials useful, and what can we do about it? J Clin Epidemiol 2006;59:1125–6.

[46] Hordijk-Trion M, Lenzen M, Wijns W, de Jaegere P, Simoons ML, Scholte op Reimer WJ, et al. Patients enrolled in coronary intervention trials are not representative of patients in clinical practice: results from the Euro Heart Survey on Coronary Revascularization. Eur Heart J 2006;27:671–8.

[47] O'Connor D, Green S, Higgins JPT. Chapter 5: defining the review question and developing criteria for including studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011. Available at www.cochranehandbook.org. Accessed April 15, 2014.

[48] Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence: study limitations (risk of bias). J Clin Epidemiol 2011;64:407–15.

[49] Egger M, Schneider M, Davey Smith G. Spurious precision? Meta-analysis of observational studies. BMJ 1998;316:140–4.

[50] Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867–72.

[51] Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci 2011;13:217–24.

[52] Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003;158:280–7.

[53] Lonjon G, Boutron I, Trinquart L, Ahmad N, Aim F, Nizard R, et al. Comparison of treatment effect estimates from prospective nonrandomized studies with propensity score analysis and randomized controlled trials of surgical procedures. Ann Surg 2014;259:18–25.

[54] Higgins JP, Ramsay C, Reeves B, Deeks JJ, Shea B, Valentine JC, et al. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods 2013;4:12–25.


[55] Papageorgiou SN. Meta-analysis for orthodontists: Part II: is all that glitters gold? J Orthod 2014;41:327–36.

[56] Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Meth 2013;4:26–35.

[57] Papageorgiou SN. Meta-analysis for orthodontists: Part I: how to choose effect measure and statistical model. J Orthod 2014;41:317–26.

[58] Welton NJ, Ades AE, Carlin JB, Altman DG, Sterne JAC. Models for potentially biased evidence in meta-analysis using empirically based priors. J R Stat Soc Ser A Stat Soc 2009;172:119–36.

[59] Dias S, Welton NJ, Marinho VCC, Salanti G, Higgins JPT, Ades AE. Estimation and adjustment of bias in randomized evidence by using mixed treatment comparison meta-analysis. J R Stat Soc Ser A Stat Soc 2010;173:613–29.

[60] Hopewell S, Boutron I, Altman DG, Ravaud P. Incorporation of assessments of risk of bias of primary studies in systematic reviews of randomised trials: a cross-sectional study. BMJ Open 2013;3:e003342.

[61] Mavropoulos A, Kiliaridis S. Orthodontic literature: an overview of the last 2 decades. Am J Orthod Dentofacial Orthop 2003;124:30–40.

[62] Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011;343:d5928.

[63] Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research. Stat Med 2002;21:1513–24.

[64] Feinstein AR. Clinical epidemiology. The architecture of clinical research. Philadelphia, PA: W. B. Saunders Company; 1985.


Basic study design influences the results of orthodontic clinical investigations

Appendices

Appendix A. Electronic databases and search strategy used in the original meta-epidemiological study (up to September 23rd, 2012) and in its update for this study (up to July 25th, 2014).

| Electronic database | Search strategy | Hits |
| MEDLINE searched via PubMed (1950 – 23.09.2012), www.ncbi.nlm.nih.gov/sites/entrez/ | #1 ("systematic review" OR "meta-analysis") | 80798 |
| | #2 tooth OR teeth OR dentist* OR dental OR endodont* OR orthodont* pedodont* OR paedodont* OR periodont* OR prosthodont* | 92435 |
| | #3 (oral OR maxillofacial) AND (implant OR surg*) | 125845 |
| | #4 (tooth OR teeth OR implant*) AND (prosth* OR restor* OR bridge* OR crown* OR denture*) | 118504 |
| | #5 #2 OR #3 OR #4 | 300055 |
| | #6 #1 AND #5 | 1918 |
| | Article Type: Systematic Reviews, Meta-Analysis | 1702 |
| Scopus (1966 – 23.09.2012), www.scopus.com | "systematic review" OR "meta-analysis"; Limit to Dentistry | 1741 |
| Cochrane Database of Systematic Reviews searched via The Cochrane Library on 23.09.2012, www.thecochranelibrary.com | tooth OR teeth OR dentist* OR dental OR endodont* OR orthodont* pedodont* OR paedodont* OR periodont* OR prosthodont*; Limit to Cochrane Reviews & Other Reviews, Exclude Protocols | 476 |
| | "Oral Health Group"; Limit to Cochrane Reviews & Other Reviews, Exclude Protocols | 160 |
| Thomson Reuters Web of Knowledge (1945 – 23.09.2012), http://apps.webofknowledge.com | "systematic review" OR "meta-analysis" In Topic; Limit to Dentistry Oral Surgery Medicine; Limit to Review | 566 |
| Bibliografia Brasileira de Odontologia (including LILACS) searched on 23.09.2012, http://regional.bvsalud.org/php/index.php?lang=en | ("systematic review" OR "meta-analysis") AND (tooth OR teeth OR dentist* OR dental OR endodont* OR orthodont* pedodont* OR paedodont* OR periodont* OR prosthodont* OR ((oral OR maxillofacial) AND (implant OR surg*)) OR ((tooth OR teeth OR implant*) AND prosth*)); Type of Study: Systematic Reviews | 584 |
| ADA Center for Evidence-Based Dentistry, http://ebd.ada.org/ | Manually | – |
| PROSPERO, http://www.crd.york.ac.uk/prospero/index.asp | Manually | – |
| Digital Dissertations searched via UMI/ProQuest on 23.09.2012, http://proquest.umi.com/login | ("systematic review" OR "meta-analysis") AND (tooth OR teeth OR dentist* OR dental OR endodont* OR orthodont* pedodont* OR paedodont* OR periodont* OR prosthodont* OR ((oral OR maxillofacial) AND (implant OR surg*)) OR ((tooth OR teeth OR implant*) AND prosth*)); Source: Dissertations & Theses; Subject: Dentistry | 439 |
| Sum with overlap from original search | | 5668 |
| MEDLINE searched via PubMed (2012 – 25.07.2014), www.ncbi.nlm.nih.gov/sites/entrez/ | (orthodon* OR malocclusion) AND ("systematic review" OR "meta-analysis" OR "random-effects" OR "fixed-effect" OR "meta-regression"); Limit to reviews, systematic reviews and meta-analyses | 150 |
| Sum with overlap from update only | | 150 |


Appendix B. Details and statistical code used for the analysis in Stata SE 10.0 (StataCorp, College Station, TX)

A. Calculating effects within each meta-analysis

Firstly, the influence of each chosen basic study design was assessed within each eligible meta-analysis. For a meta-analysis to be eligible, one of the following data formats had to be provided: (A1) raw continuous data (sample size, mean, and standard deviation), (A2) already-calculated standardized mean differences (SMDs) for continuous data (SMD and the corresponding standard error), (A3) raw binary data (2 × 2 contingency table), or (A4) already-calculated odds ratios for binary data (together with their 95% confidence intervals).

SMD was chosen as the effect measure because it standardizes estimates by their variability and enables an overall synthesis [1]. For meta-analyses of continuous outcomes, raw data or SMDs were extracted for each component trial (i.e., each trial included in the original meta-analysis) and recoded so that a negative SMD was beneficial. For binary outcomes, raw data or odds ratios were likewise extracted, recoded, and then transformed to SMDs [2], so that continuous and binary outcomes could be synthesized together.
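Steps A3 and A4 below multiply log odds ratios and their standard errors by 0.5513, which is √3/π from Chinn's conversion of a log odds ratio to an SMD [2]. A minimal Python sketch of both conversions follows; it assumes no zero cells in the 2 × 2 table (no continuity correction is applied) and, for A4, a symmetric Wald-type confidence interval on the log scale. The function names are hypothetical.

```python
import math

# Chinn (2000): SMD is approximately ln(OR) * sqrt(3) / pi, i.e. ln(OR) * 0.5513
CHINN_FACTOR = math.sqrt(3) / math.pi


def or_to_smd(odds_ratio, upper_ci):
    """A4-style conversion: odds ratio and upper 95% CI limit -> (SMD, SE).

    The SE of ln(OR) is recovered as (ln(upper) - ln(OR)) / 1.96, which
    assumes a symmetric Wald interval on the log scale.
    """
    log_or = math.log(odds_ratio)
    se_log_or = (math.log(upper_ci) - log_or) / 1.96
    return CHINN_FACTOR * log_or, CHINN_FACTOR * se_log_or


def table_to_smd(ev_exp, nev_exp, ev_ctr, nev_ctr):
    """A3-style conversion from a 2 x 2 table; zero cells are not handled."""
    log_or = math.log((ev_exp * nev_ctr) / (nev_exp * ev_ctr))
    se_log_or = math.sqrt(1 / ev_exp + 1 / nev_exp + 1 / ev_ctr + 1 / nev_ctr)
    return CHINN_FACTOR * log_or, CHINN_FACTOR * se_log_or
```

Because the conversion is a single multiplicative constant, it preserves the sign of ln(OR), so recoding the direction of benefit before or after conversion gives the same result.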

The SMD chosen for the analysis was Cohen's d, defined as the difference between two means divided by their pooled standard deviation:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

where s is the pooled standard deviation, defined as:

$$s = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
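A Python sketch of the two formulas above, with hypothetical function names. The standard-error expression SE(d) = √(N/(n₁n₂) + d²/(2(N − 2))), with N = n₁ + n₂, is a common large-sample approximation assumed here; this appendix does not state which variance formula was used. Multiplying d by −1 where needed implements the recoding so that a negative SMD is beneficial.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation s of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Cohen's d = (mean1 - mean2) / s, with an approximate standard error."""
    d = (mean1 - mean2) / pooled_sd(n1, sd1, n2, sd2)
    n = n1 + n2
    # Large-sample approximation of SE(d); an assumption, not taken from the paper.
    se = math.sqrt(n / (n1 * n2) + d ** 2 / (2 * (n - 2)))
    return d, se
```

For example, two groups of 10 patients with means 5 and 3 and a common SD of 2 give d = 1.0 with an approximate SE of about 0.48.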

To calculate the difference in standardized mean differences (ΔSMD) within each included meta-analysis, the following Stata commands were used, depending on the data provided:

A1. Raw continuous data:

    metan exn exm exsd ctrn ctrm ctrsd, random by(design) nograph
    metareg _ES design, wsse(_seES)

A2. Provided SMDs:

    metan smd sesmd, random by(design) nograph
    metareg smd design, wsse(sesmd)

A3. Raw binary data:

    metan EVex NEVex EVctr NEVctr, random or log
    gen smd = 0.5513*_ES
    gen sesmd = 0.5513*_selogES
    metan smd sesmd, random nograph by(design)
    metareg smd design, wsse(sesmd)

A4. Provided ORs:

    gen logOR = ln(OR)
    gen logORuci = ln(ORuci)
    gen selogOR = (logORuci - logOR)/1.96
    gen smd = 0.5513*logOR
    gen sesmd = 0.5513*selogOR
    metan smd sesmd, random nograph by(design)
    metareg smd design, wsse(sesmd)

B. Meta-epidemiological synthesis among meta-analyses

For the pooling of ΔSMDs across meta-analyses, the following Stata commands were used:

B1. Fixed-effect meta-analysis:

    metan dsmd sedsmd

B2. Random-effects meta-analysis:

    metan dsmd sedsmd, random rfdist
    heterogi [Q] [df], nc

B3. Reporting bias assessment:

    metabias dsmd sedsmd, egger

B4. Subgroup analyses:

    metan dsmd sedsmd, random rfdist by(factor)
    metareg dsmd factor, wsse(sedsmd)

Figures 2 and 3 and Appendix E were the standard output of command line B2. The values on the horizontal axis were defined in the Stata graph editor, and the contours for effect magnitude were later overlaid in Microsoft PowerPoint (the image output itself was not processed). Appendix F was extracted from the output of command line B4 and adapted accordingly.

Abbreviations:
exn = sample size (experimental group; continuous outcome)
exm = mean (experimental group; continuous outcome)
exsd = standard deviation (experimental group; continuous outcome)
ctrn = sample size (control group; continuous outcome)
ctrm = mean (control group; continuous outcome)
ctrsd = standard deviation (control group; continuous outcome)
design = variable containing the basic design of the component trials included in the meta-analyses
smd = standardized mean difference
sesmd = standard error of the smd
EVex = events in the experimental group (binary outcome)
NEVex = non-events in the experimental group (binary outcome)
EVctr = events in the control group (binary outcome)
NEVctr = non-events in the control group (binary outcome)
selogOR = standard error of the log odds ratio
logORuci = upper limit of the 95% CI for the log odds ratio
logOR = log odds ratio
dsmd = difference in standardized mean differences
sedsmd = standard error of the dsmd
factor = subgrouping variable (field, adverse effects, binary, subjective, etc.)
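In steps A1 to A4, ΔSMD is the design coefficient of a random-effects meta-regression (`metareg`). The Python sketch below swaps in a simpler technique for illustration: it pools each design group separately with DerSimonian–Laird weights and takes the difference, with SE √(se₁² + se₀²) and a two-sided z-test. This generally gives somewhat different numbers from `metareg`, which fits one model with a single between-study variance for both designs. The 0/1 design coding and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def pool_dl(effects, std_errors):
    """DerSimonian-Laird random-effects estimate and SE for one design group."""
    w = [1 / s ** 2 for s in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if len(effects) > 1 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in std_errors]
    est = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return est, math.sqrt(1 / sum(w_re))


def delta_smd(smd, sesmd, design):
    """Difference in pooled SMDs, design == 1 minus design == 0.

    Returns (dsmd, sedsmd, two-sided p) with a z-test on the difference;
    the two design groups are treated as independent.
    """
    groups = {0: ([], []), 1: ([], [])}
    for y, s, d in zip(smd, sesmd, design):
        groups[d][0].append(y)  # raises KeyError if design is not coded 0/1
        groups[d][1].append(s)
    if not groups[0][0] or not groups[1][0]:
        raise ValueError("both designs must be represented")
    est1, se1 = pool_dl(*groups[1])
    est0, se0 = pool_dl(*groups[0])
    dsmd = est1 - est0
    sedsmd = math.sqrt(se1 ** 2 + se0 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(dsmd / sedsmd)))
    return dsmd, sedsmd, p
```

With design coded 1 for retrospective and 0 for prospective trials, and SMDs recoded so that negative values are beneficial, a negative dsmd means larger apparent benefit in retrospective trials, matching the reading of Fig. 3.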

References to Appendix B

1. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Academic Press; 1988.

2. Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat Med 2000;19:3127–31.


Appendix C. Characteristics of the included meta-analyses (regarding all three meta-epidemiological comparisons)

A/A Meta-analysis Field Experimental group Control group Outcome Binary Adverse effects

Subjective Studies* Patients$

1 Al-Jewair 2009 1rst Orthodontic education

Computer-aided orthodontic learning

Conventional orthodontic learning

Knowledge gain (pre/post tests) No No No 3 90

2 Al-Jewair 2009 2nd

Orthodontic education Computer-aided orthodontic learning

Conventional orthodontic learning

Knowledge gain (post tests)

No No No 6 296

3 Al-Jewair 2014 1st

Class II malocclusion-functional appliances

MARA appliance Control (untreated)

Total mandibular unit length

No No No 6 266

4 Chen 2010 2nd Self-ligating brackets Self-ligating brackets Conventional brackets

Mandibular incisor alignment

No No No 2 114

5 Chen 2010 5th Self-ligating brackets Self-ligating brackets Conventional brackets

Time to remove ligation module

No No No 2 462

6 Chen 2010 6th Self-ligating brackets Self-ligating brackets Conventional brackets

Time to engage ligation module

No No No 2 462

7 Chen 2010 9th Self-ligating brackets Self-ligating brackets Conventional brackets

Intercanine width

No No No 3 140

8 Chen 2010 10th

Self-ligating brackets Self-ligating brackets Conventional brackets

Intermolar width No No No 3 140

9 Chen 2010 11th

Self-ligating brackets Self-ligating brackets Conventional brackets

Incisor inclination

No Yes No 3 140

10 Dalessandri 2014 1rst

Skeletal anchorage-failure rates

Male patients Female patients Implant failure Yes Yes No 14 [4361]

11 Dalessandri 2014 3rd

Skeletal anchorage-failure rates Site inflammation

No site inflammation Implant failure Yes Yes No 5 [992]

12 Dalessandri 2014 4th

Skeletal anchorage-failure rates

Maxilla Mandible Implant failure Yes Yes No 14 [4136]

13 Dalessandri 2014 6th

Skeletal anchorage-failure rates

Keratinized gingiva Non-keratinized gingiva

Implant failure Yes Yes No 5 [1110]

14 Dalessandri 2014 9th

Skeletal anchorage-failure rates

Thin implants Thick implants Implant failure Yes Yes No 8 [1969]

15 Ehsani 2014 1st

Class II malocclusion-functional appliances

Twin Block Control (untreated)

SNA angle No No No 5 303

16 Ehsani 2014 2nd

Class II malocclusion-functional appliances

Twin Block Control (untreated)

SNB angle No No No 5 303

17 Ehsani 2014 3rd

Class II malocclusion-functional appliances

Twin Block Control (untreated)

Co-Gn distance No No No 3 148

18 Ehsani 2014 4th

Class II malocclusion-functional appliances

Twin Block Control (untreated)

Lower anterior facial height

No No No 3 184

6

19 Ehsani 2014 5th

Class II malocclusion-functional appliances

Twin Block Control (untreated)

U1-NL No No No 3 184

20 Ehsani 2014 6th

Class II malocclusion-functional appliances

Twin Block Control (untreated)

LI-ML No No No 3 211

21 Feng 2012 2nd Class III-skeletal anchorage

Skeletal-anchored maxillary protraction

Tooth-anchored maxillary protraction

Maxillary advancement

No No No 3 135

22 Feng 2012 3rd Class III-skeletal anchorage

Skeletal-anchored maxillary protraction

Tooth-anchored maxillary protraction

Mandibular plane angle

No No No 3 135

23 Feng 2012 5th Class III-skeletal anchorage

Skeletal-anchored maxillary protraction

Tooth-anchored maxillary protraction

Maxillary molar extrusion

No No No 2 66

24 | Jambi 2013 1st | Distalization of first molars | Intraoral appliance | Headgear | Molar distalization | No | No | No | 4 | 150

25 | Jambi 2013 2nd | Distalization of first molars | Intraoral appliance | Headgear | Movement of upper incisors | No | Yes | No | 4 | 150

26 | Li 2011 4th | Skeletal anchorage-effects | Skeletal anchorage | Headgear | Maxillary incisor inclination | No | Yes | No | 2 | 77

27 | Liu 2010 1st | Tooth extraction | Tooth extraction | No tooth extraction | Lower lip prominence | No | Yes | No | 5 | 356

28 | Liu 2010 2nd | Tooth extraction | Tooth extraction | No tooth extraction | Upper lip prominence | No | Yes | No | 3 | 136

29 | Liu 2010 3rd | Tooth extraction | Tooth extraction | No tooth extraction | Lower lip thickness | No | Yes | No | 3 | 136

30 | Liu 2010 4th | Tooth extraction | Tooth extraction | No tooth extraction | Upper lip thickness | No | Yes | No | 3 | 136

31 | Long 2013 1st | Lingual appliances | Lingual appliances | Labial appliances | Pain intensity | Yes | Yes | Yes | 4 | 201

32 | Long 2013 5th | Lingual appliances | Lingual appliances | Labial appliances | Eating difficulty | Yes | Yes | Yes | 4 | 201

33 | Long 2013 6th | Lingual appliances | Lingual appliances | Labial appliances | Speaking difficulty | Yes | Yes | Yes | 3 | 154

34 | Pandis 2014 1st | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Teeth alignment | No | No | No | 5 | 277

35 | Pandis 2014 2nd | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Teeth alignment | No | No | No | 2 | 123

36 | Papadopoulos 2011 1st | Skeletal anchorage-effects | Skeletal anchorage | Conventional anchorage | Anchorage loss | No | Yes | No | 10 | 206

37 | Papadopoulos 2011 2nd | Skeletal anchorage-effects | Skeletal anchorage | Conventional anchorage | Anchorage loss rate | No | Yes | No | 6 | 153

38 | Papadopoulos 2012 5th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Speech development | No | No | No | 2 | 74

39 | Papadopoulos 2012 6th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | SNA angle | No | No | No | 2 | 329

40 | Papadopoulos 2012 7th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | SNB angle | No | No | No | 2 | 299

41 | Papadopoulos 2012 8th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | SN-MP angle | No | No | No | 2 | 151

42 | Papadopoulos 2012 12th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch depth | No | No | No | 2 | 60

43 | Papadopoulos 2012 15th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch width I | No | No | No | 2 | 52

44 | Papadopoulos 2012 17th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch width II | No | No | No | 2 | 65

45 | Papadopoulos 2012 18th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch width III | No | No | No | 2 | 65

46 | Papadopoulos 2012 19th | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch form I | No | No | No | 2 | 65

47 | Papadopoulos 2012 22nd | Cleft lip with/without palate | Presurgical infant orthopedics | Control (untreated) | Maxillary arch form II | No | No | No | 2 | 62

48 | Papageorgiou 2012 1st | Skeletal anchorage-failure rates | Female patients | Male patients | Implant failure | Yes | Yes | No | 9 | [1724]

49 | Papageorgiou 2012 2nd | Skeletal anchorage-failure rates | Adolescent patients | Adult patients | Implant failure | Yes | Yes | No | 5 | [1124]

50 | Papageorgiou 2012 3rd | Skeletal anchorage-failure rates | Right mouth side | Left mouth side | Implant failure | Yes | Yes | No | 8 | [1295]

51 | Papageorgiou 2012 4th | Skeletal anchorage-failure rates | Maxilla | Mandible | Implant failure | Yes | Yes | No | 16 | [2315]

52 | Papageorgiou 2012 5th | Skeletal anchorage-failure rates | Cortical bone thickness <1 mm | Cortical bone thickness >1 mm | Implant failure | Yes | Yes | No | 2 | [296]

53 | Papageorgiou 2012 7th | Skeletal anchorage-failure rates | No root contact | Root contact | Implant failure | Yes | Yes | No | 4 | [604]

54 | Papageorgiou 2014 2nd | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Tooth alignment | No | No | No | 5 | 281

55 | Papageorgiou 2014 3rd | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Intercanine width | No | No | No | 5 | 284

56 | Papageorgiou 2014 4th | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Intermolar width | No | No | No | 5 | 284

57 | Papageorgiou 2014 8th | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Mandibular incisor inclination | No | Yes | No | 3 | 174

58 | Papageorgiou 2014 9th | Self-ligating brackets | Self-ligating brackets | Conventional brackets | Pain intensity | No | Yes | Yes | 5 | 266

59 | Perinetti 2014 1st | Class II malocclusion-functional appliances | Fixed functional appliances | Control (untreated) | Total mandibular length | No | No | No | 5 | 325

60 | Perinetti 2014 2nd | Class II malocclusion-functional appliances | Fixed functional appliances | Control (untreated) | Composite mandibular length | No | No | No | 4 | 228

61 | Perinetti 2014 3rd | Class II malocclusion-functional appliances | Fixed functional appliances with fixed appliances | Control (untreated) | Total mandibular length (pubertal patients) | No | No | No | 6 | 352

62 | Perinetti 2014 5th | Class II malocclusion-functional appliances | Fixed functional appliances with fixed appliances | Control (untreated) | Composite mandibular length | No | No | No | 2 | 89

63 | Yang 2014 1st | Class II malocclusion-functional appliances | Fränkel appliance | Control (untreated) | SNA angle | No | No | No | 5 | 234

64 | Yang 2014 2nd | Class II malocclusion-functional appliances | Fränkel appliance | Control (untreated) | SNB angle | No | No | No | 5 | 234

65 | Yang 2014 3rd | Class II malocclusion-functional appliances | Fränkel appliance | Control (untreated) | MPA angle | No | No | No | 4 | 136

66 | Yang 2014 4th | Class II malocclusion-functional appliances | Fränkel appliance | Control (untreated) | ANB angle | No | No | No | 5 | 234

67 | Yang 2014 5th | Class II malocclusion-functional appliances | Fränkel appliance | Control (untreated) | Overjet | No | No | No | 4 | 163

68 | Zhou 2014 1st | Maxillary expansion | Slow maxillary expansion | Control (untreated) | Maxillary intermolar width | No | No | No | 4 | 220

69 | Zhou 2014 3rd | Maxillary expansion | Rapid maxillary expansion | Control (untreated) | Maxillary intermolar width | No | No | No | 6 | 479

70 | Zhou 2014 6th | Maxillary expansion | Rapid maxillary expansion | Slow maxillary expansion | Maxillary intermolar width | No | No | No | 3 | 104

71 | Zhou 2014 9th | Maxillary expansion | Slow maxillary expansion | Control (untreated) | Maxillary intercanine width | No | No | No | 3 | 112

72 | Zhou 2014 11th | Maxillary expansion | Slow maxillary expansion | Control (untreated) | Mandibular intermolar width | No | No | No | 4 | 220

73 | Zhou 2014 13th | Maxillary expansion | Rapid maxillary expansion | Control (untreated) | Maxillary intercanine width | No | No | No | 6 | 479

74 | Zhou 2014 16th | Maxillary expansion | Rapid maxillary expansion | Control (untreated) | Maxillary interpremolar width | No | No | No | 6 | 479

75 | Zhou 2014 19th | Maxillary expansion | Rapid maxillary expansion | Control (untreated) | Mandibular intermolar width | No | No | No | 5 | 429

76 | Zhou 2014 22nd | Maxillary expansion | Rapid maxillary expansion | Slow maxillary expansion | Maxillary intercanine width | No | No | No | 3 | 104

77 | Zhou 2014 28th | Maxillary expansion | Rapid maxillary expansion | Slow maxillary expansion | Mandibular intermolar width | No | No | No | 2 | 84

*Number refers to the trials/trial arms included in the original meta-analysis; trials might have been omitted from the meta-epidemiological comparison for overlap or eligibility reasons. $Numbers in square brackets indicate the number of implants, not the number of patients; multiple implants might have been inserted per patient.


Appendix D. Included trials in the selected meta-analyses with the judgments about their design (citations available upon request).

TrialID* First author/Year Design Description (quotation from published report)

9699403 Akkaya 1998 pCCT Subjects with a maxillary bilateral crossbite were selected and two treatment groups each with 12 patients were constructed.

19524823 Baccetti 2009 pCCT The conditions for patient enrollment, based on personal choice, could be assimilated to a random allocation of patients. Each practitioner enrolled 28 patients consecutively for treatment with his respective specific 1-phase orthodontic protocol. One practitioner performed HG+FA therapy with Class II elastics, and the other treated 28 patients with the BH+FA protocol.

9674676 Bondemark 1998 pCCT Two groups of 20 adolescents (5 boys and 15 girls in each group), one treated and one untreated, participated in the study and were followed longtudinally for 5 years.

9088600 Bravo 1997 pCCT None of the patients presented with severe craniofacial anomalies and all were to be treated with Edgewise appliances.

15747820 Caniklioglu 2005 pCCT After three months of treatment, each patient completed a seven-part survey with 12 questions…

20578848 Cevidanes 2010 pCCT Success of therapy at the end of the observation period was not a determinant factor for selection of patients because this sample was collected prospectively.

14982362 Cheng 2004 pCCT The aim of this prospective study was to investigate the complications and failures of orthodontic mini-implants in a series of consecutive patients.

23876947 Chiqueto 2013 pCCT ...all patients signed an informed consent form before participating in the study. Inclusion criteria were as follows:...no previous orthodontic treatment.

19892288 El-Beialy 2009 pCCT The experiment ended 6 months after loading the miniscrew.

16679203 Geran 2006 pCCT The patients examined were part of a prospective clinical investigation, the Michigan Expansion Study, of mixed dentition patients who underwent RME in a private faculty practice.

22423185 Ghislanzoni 2013 pCCT The purpose of this prospective clinical trial, therefore, was to investigate the role of timing in the treatment of Class II malocclusion with MARA and fixed appliances with respect to Class II untreated control data.

17524619 Hedayati 2007 pCCT In this prospective study nine patients with a mean age of 17.4 years (range 15.5–19 years) were randomly selected from patients referred to the Orthodontic Department

18384412 Justens 2008 pCCT Patients received a questionnaire to assess their opinion regarding the treatment…

20816295 Kim 2010b pCCT They volunteered to receive orthodontic treatment with maxillary mini-implants and agreed to have a CBCT scan after mini-implant placement. They also consented to have the removal torque measured with a digital device after treatment.

12142899 Kocadereli 2002 pCCT None of them had a severe craniofacial anomaly, and all were to be treated with edgewise appliances.

9457025 Lund 1998 pCCT This prospective controlled study investigated the net effects of the Twin Block functional appliance taking into account the effects of normal growth in an untreated control group.

17580417 Luzi 2007 pCCT In this prospective clinical study we investigated the factors of importance for the failure rate of immediately loaded orthodontic mini-implants.

12940553 McNamara Jr 2003 pCCT Treated group. The treated sample analyzed in this study (112 subjects, 61 females and 51 males) was part of a long-term prospective study on consecutively treated patients who had undergone Haas-type RME and nonextraction edgewise appliance therapy in a single orthodontic practice.

20413451 Miyazawa 2010 pCCT Eighteen patients (5 males and 13 females; mean age 23.8; minimum 10.7 and maximum 45.5 years) who required skeletal anchorage for orthodontic therapy were included in this prospective study.

16441792 Motoyoshi 2006 pCCT The data presented here are from subjects who gave their consent to be part of the study before surgical placement of the mini-implant.

17974113 Motoyoshi 2007a pCCT This study used only the diagnostic materials of patients who consented to the placement of mini-implants in cooperation with this study.

17521887 Motoyoshi 2007b pCCT The data presented here are from patients who gave their consent to be part of the study before placement of the mini-implant.

16905065 O'Grady 2006 pCCT The patients examined were part of the MES, a prospective clinical investigation of mixed-dentition patients who had undergone RME.

6961781 Pancherz 1982 pCCT The control subjects were followed on a parallel basis with the treated subjects during a time period of 6 months…

22640678 Phelan 2012 pCCT This prospective clinical study was based on the records of 34 consecutively treated patients from the private practice of the third author (R.H.).

R_987654321 Rosenberg 2008 pCCT …Hence students would not be penalized for agreeing to use an educational tool that they might not benefit from. …A detailed answer sheet was developed prior to administration of the test so as to standardize grading.

9082855 Sandikcioglu 1997 pCCT Patients with unilateral or bilateral posterior crossbites in the mixed dentition were studied. They were divided into three groups of 10 patients in each group.

21536207 Sar 2011 pCCT The aim of this prospective clinical study was to evaluate the skeletal, dentoalveolar, and soft tissue effects of maxillary protraction with miniplates compared with conventional facemask therapy and an untreated Class III control group.

21750242 Shalish 2012 pCCT Consecutive patients were recruited prospectively from the orthodontic clinic in the Hebrew University-Hadassah School of Dental Medicine and from two private clinics.

21536211 Suzuki 2011 pCCT In this study, direct analysis was performed of MIT and MRT values of predrilling and self-drilling miniscrew implants that were used as skeletal anchorage in orthodontic patients.

12401054 Thomson 2002 pCCT This investigation used epidemiological data from a longstanding prospective observational study to systematically evaluate the equity, efficacy, effectiveness, and safety of orthodontic treatment

17346597 Turnbull 2007 pCCT Introduction: In this prospective clinical study, we assessed the relative speed of archwire changes, comparing self-ligating brackets with conventional elastomeric ligation methods, and further assessed this in relation to the stage of orthodontic treatment represented by different wire sizes and types.

19577145 Viwattanatipa 2009 pCCT No patients withdrew from our study.

R_987654309 Waheed-Ul-Hameed 2002 pCCT A total of 20 functional class-3 cases were chosen from a general clinic intake. 10 of the 20 cases formed the treatment group while 10 untreated patients were taken as a control group....The patients were observed over a period of one year.

17348892 Wiechmann 2007 pCCT The aim of this prospective study was therefore to investigate the clinical outcome of orthodontic micro-implants by a prospective study in a series of consecutive patients.

20018798 Wu 2010 pCCT Sixty adult patients treated in the Orthodontic Department, Prince Philip Dental Hospital, Hong Kong, over a 3 month period were included in this age-matched case–control prospective longitudinal study.

14717690 Aly 2004 qRCT Students were alphabetically assigned to each group. They could not themselves select in what way to study the subject matter.

18193961 Chaddad 2008 qRCT Ten healthy patients, ages 13 to 65 years, whose treatment plan included the use of temporary anchorage devices (TADs), were included in the study...The two mini-implant systems were alternately placed until a minimum of 15 mini-implants were placed for both systems

15259572 Lohmander 2004 qRCT The first ten children (five girls and five boys) were treated with infant orthopaedics (IO-group), whereas the following ten children (two girls and eight boys) were not treated with infant ortopaedics (no-IO-group).

11709597 Lowe 2001 qRCT Eighty-five 3rd year undergraduate dental students allocated by pseudo-randomisation to two groups.

16429868 Miles 2005 qRCT Consecutive eligible patients were assigned alternately to one of two groups.

17693371 Pandis 2007 qRCT Personal communication with author.

19959610 Pandis 2010 qRCT Personal communication with author.

6951655 Peat 1982 qRCT The decision to use pre-surgical oral orthopedics depended mainly on traveling distance. If traveling involved several hundred miles then no early treatment was carried out, although other cases simply slipped through the early screening.

22214390 Al-Jewair 2012 rCCT A retrospective study was conducted using lateral cephalograms in habitual occlusion of adolescent patients who received treatment for their Class II skeletal malocclusions using the MARA or the AdvanSync functional appliances.

12938839 Allais 2003 rCCT A retrospective case–control study based on the analysis of study-casts and intra-oral slides of 300 adult patients was carried out

19651342 Antoszewska 2009 rCCT Our aims in this retrospective study were to investigate the success rates of MIs,…

15014405 Baik 2004 rCCT The criteria for sample selection were as follows: … (5) no appliances used other than the FR III, and (6) good cooperation during the treatment period (the patients wore the appliance for at least 14 hours per day).

10730669 Bowman 2000 rCCT A sample of 120 Caucasian orthodontic patients (70 extraction and 50 nonextraction, Table 1) was randomly selected from the senior author’s treatment files.

17868386 Chen 2007 rCCT Objectives: The aim of this retrospective study was to assess systematically the case distribution among three types of mini-implants and …

18983323 Chen 2008 rCCT Objectives: The aim of this retrospective study was to evaluate systematically the potential factors that influence failure rates of temporary anchorage devices (TADs) used for orthodontic anchorage.

10474101 Erdinc 1999 rCCT In this retrospective study, altogether 37 cases with posterior crossbites forming two treatment groups and one control group were treated at the Department of Orthodontics, Istanbul University, Faculty of Dental Medicine.

21299410 Franchi 2011 rCCT A sample of 32 subjects with Class II division 1 malocclusion ... was treated consecutively at a single private practice by one of the authors. A sample of 27 subjects was selected from the files of the University of Michigan Growth Study (12 subjects) and of the Denver Child Growth Study (15 subjects).

21299408 Ghislanzoni 2011 rCCT From a parent sample of 62 Class II division 1 subjects treated consecutively with the MARA appliance, 23 subjects were selected according to the following inclusion criteria

10833001 Handelman 2000 rCCT Inclusion in the study required that the subject be 18 years or older at the time of the pretreatment records and that records from before and after appliance therapy were available.

11683811 Harradine 2001 rCCT Design – A retrospective study of two groups.

16257988 Isik 2005 rCCT Pre- and post-treatment orthodontic models of 84 patients comprised the subject matter of this retrospective study (Table 1).

12923511 Janson 2003 rCCT Because this was a retrospective study, the information regarding hygiene status during treatment was obtained from the clinical charts

15827701 Kalavritinos 2005 rCCT The aim of this retrospective clinical study was to evaluate dental arch, skeletal, dentoalveolar, and soft tissue profile changes following treatment of Class III malocclusion by means of the Function Regulator (FR-3) appliance.

17465652 Kucukkeles 2007 rCCT The sample consisted of the records from 45 growing patients (22 boys, 23 girls) exhibiting skeletal Class II malocclusion characterized by mandibular retrognathism.

17208101 Kuroda 2007 rCCT Seventy-five patients, 116 titanium screws of 2 types, and 38 miniplates were retrospectively examined.

17448389 Kuroda 2007 rCCT Seventy-five patients, 116 titanium screws of 2 types, and 38 miniplates were retrospectively examined.


7625394 Ladner 1995 rCCT A retrospective study of dental and maxillary skeletal changes occurring during a period of orthodontic treatment was made from pretreatment and posttreatment dental casts.

20152674 Lee 2010 rCCT One hundred forty-one orthodontic patients (treated from October 1, 2000, to November 29, 2007) were included in this survival study. Oral hygiene status was determined by reviewing orthodontic records and intraoral photos subjectively.

18929269 Levin 2008 rCCT The aim of this retrospective controlled investigation was to analyze the short-term and long-term skeletal and dentoalveolar treatment outcomes of Function Regulator 3 (FR-3) therapy.

18405816 Lim 2008 rCCT Fifty premolar extraction and 50 nonextraction Korean patients were selected from the files of Chonnam National University Hospital in Gwangju, Korea.

19651354 Lim 2009 rCCT One hundred fifty-four consecutive patients (47 male, 107 female; mean age, 21.9 years; SD, 8.3 years) who had miniscrews placed as orthodontic anchorage were included in this retrospective study

20926556 Manni 2011 rCCT In this retrospective study, conducted in a private practice, the records of 300 miniscrews inserted in 132 consecutive patients (80 females, 60.6 per cent) by the same surgeon were evaluated.

9674675 Mills 1998 rCCT Pretreatment and posttreatment cephalometric records of 28 consecutively treated patients with Class II malocclusions were evaluated and compared with an age- and sex-matched sample of untreated Class II control subjects.

14560266 Miyawaki 2003 rCCT The clinical features and treatment progress for 1 year were retrospectively examined for the 51 subjects.

18193973 Moon 2008 rCCT Materials and Methods: Four hundred eighty OMI placed in 209 orthodontic patients were examined retroactively.

20620833 Moon 2010 rCCT The samples in this retrospective study consisted of 778 OMIs

3181296 Ogaard 1988 rCCT Ninety-eight individuals were examined of whom 51 (28 girls and 23 boys), had received orthodontic treatment. They had been treated for different forms of malocclusion in private practices.

20691348 Ong 2010 rCCT Patient records were included if they satisfied the following inclusion criteria:...(3) intraoral photos and study models were available at pretreatment (T0), 10 weeks (T1), and 20 weeks (T2) postbonding; …

22432591 Pangrazio 2012 rCCT This retrospective cephalometric study examined 30 consecutively treated patients involving 12 boys with a mean age of 11.9 years

12637901 Pangrazio-Kulbersh 2003 rCCT The following criteria were established for the sample:...(6) the appliance was not removed prematurely due to breakage.

18405824 Park 2008 rCCT Sixteen nongrowing patients (14 women, 2 men; ages, 22.5 ± 4.8 years) who had been treated orthodontically for bialveolar protrusion were selected.

18617111 Phatouros 2008 rCCT The purpose of this retrospective study was to estimate the area change of the palate after rapid maxillary expansion (RME) in the early mixed dentition stage by using a 3-dimensional (3D) helical computed tomography (CT) scanning technique.

W_000080316600375 Ribeiral 1999 rCCT This study aimed at analyzing the periodontal state of 53 orthodontic treated patients (finished cases for at least two years), whose ages ranged from 17 to 26 years.

22084789 Sharma 2011 rCCT The aim of this retrospective study was to find factors related to the clinical success of micro-implants in Asian patients.

21674183 Shundo 2012 rCCT The subjects in the treatment and control groups were selected retrospectively

19852635 Siara-Olds 2010 rCCT In this retrospective long-term investigation, the treatment groups were chosen strictly based upon the appliance used for the correction of the Class II malocclusion and not upon their treatment responses.

15947523 Sidlauskas 2005 rCCT Cephalometric analysis of skeletal and dentoalveolar facial structures of 34 Class II Divison 1 patients treated with Twin-block appliance was performed using the same reference system before and after treatment.

10587592 Toth 1999 rCCT This retrospective cephalometric study compares the treatment effects produced in 40 patients treated…


19615569 Wu 2009 rCCT From January 2001 to December 2006, 166 patients (35 male patients and 131 female patients; mean age, 26.5 ± 8.9 years) who had received mini-implants (total number, 414) for orthodontic anchorage at the Section of Orthodontics and Pediatric Dentistry, Taipei Veterans General Hospital (Taipei, Taiwan) were enrolled in this study

16679208 Xu 2006 rCCT Records of 39 borderline patients treated at the Faculty Clinic of the Peking University Orthodontic Department were evaluated retrospectively by 5 associate professors.

18984393 Yao 2008 rCCT The subjects in this retrospective study included 47 adults (4 men, 43 women).

21357655 Baysal 2013 tRCT Randomization was made at the start of the study with preprepared random number tables with block stratification on gender.

16279817 Bondemark 2005 tRCT A restricted randomization method was used in blocks of 10 to ensure that equal numbers of patients were allocated to each of the two treatment groups.

19123718 Fleming 2009 tRCT An unstratified subject allocation sequence was generated by a computer program; random numbers were generated and assignment was concealed from the clinician until the time of the appointment at which the appliance was to be placed.

19409342 Fleming 2009a tRCT An unstratified subject allocation sequence was made by using a computer-generated randomization program.

19732667 Fleming 2009b tRCT An unstratified subject allocation sequence was generated using an electronic randomization program.

21195256 Godoy 2011 tRCT For randomization, numbers were randomly drawn from a plastic bag.

23075062 Khattab 2012 tRCT They were divided into two groups; patient assignment was based on computer-generated random numbers.

12498603 Konst 2003 tRCT Babies with complete UCLP without soft tissue bands were recruited within 2 weeks after birth and randomly assigned to one of two groups by means of computerized balancing with regard to alveolar cleft width and birth weight.

19602104 Liu 2009 tRCT After informed consent, they were randomly assigned to two groups with the aid of a table of random numbers:

18540016 Ma 2008 tRCT The subjects were randomly divided (RandA1.0 Software, Planta Medical Technology and Development Co. Ltd, Beijing, China) into two equal groups, one anchored by intraoral micro-implants and one by extraoral headgear

21889063 Pandis 2011 tRCT Fifty patients were randomized to either a conventional or a self-ligating appliance. The statistical software package was used by the first author, and the user-written ralloc command was implemented to generate the random allocation sequence.

18538237 Petren 2008 tRCT 4 opaque envelopes were prepared with 20 sealed notes in each (5 notes for each group).

11695749 Prahl 2001 tRCT A computerised balanced allocation method was used in order to reduce imbalance on relevant prognostic factors between groups.

19651344 Pringle 2009 tRCT Simple randomization was done with a computer-generated list of random numbers.

18929262 Scott 2008a tRCT The subjects were randomly allocated for treatment with either Damon3 self-ligating brackets or Synthesis (Ormco) preadjusted edgewise brackets by using a restricted random number table to ensure equivalence of numbers in each group.

18339656 Scott 2008b tRCT Randomization was carried out using a table of random numbers.

18617099 Upadhyay 2008a tRCT A restricted randomization method was used in blocks of 10 to ensure that equal numbers of patients were allocated to each treatment group.

18298220 Baek 2008 Unclear No description

17124564 Berens 2006 Unclear No description

R_987654320 Cha 2011 Unclear No description

10194289 Franchi 1999 Unclear No description

19087583 Jiang 2008 Unclear No description

20122433 Kim 2010a Unclear No description

18963818 Motoyoshi 2009 Unclear This study used only the data for patients who consented to placement of the miniimplants and agreed to participate in the study.

16849067 Park 2006 Unclear No description

19472896 Wang 2009 Unclear No description

19649577 Wilmes 2009 Unclear No description

20231213 Acar 2010 uRCT The patients were randomly divided into two groups: group 1 (seven females and eight males; mean age 15.0 ± 3.4 years), were treated with a pendulum appliance supported with a K-loop buccally, while subjects in group 2 (10 females and 5 males; mean age 14.2 ± 2.9 years) were treated with CHG.

20386216 Basha 2010 uRCT A comparative study consisting of 14 patients (all females) randomized into 2 groups.

9459032 Clark 1997 uRCT Students were randomly allocated either to the lecture group or the computer group.

17628251 De Oliveira 2007 uRCT This was a controlled clinical trial on the effects of 2 approaches for Class II correction, with randomization in the assignment of the 2 treatment regimens.

9825553 Illing 1998 uRCT The first 58 patients who had been on the waiting list for the longest period were randomly allocated to one of three groups involving treatment with either a Bass appliance, a Bionator, or a Twin Block appliance.

3519716 Irvine 1986 uRCT All 56 sophomore dental students volunteered to participate and were randomly assigned to instruction either by means of the computer or via the lecture method

R_987654310 Jiang 2009 uRCT [patients were randomly allocated to one of two groups]

12056770 Komolpis 2002 uRCT The 103 members of the second-year dental class were randomly assigned into two study groups, conventional and web-based.

6388628 Luffingham 1984a uRCT Students were allocated randomly into ten groups

20575195 Miles 2010 uRCT The subjects were randomly allocated to one of two groups.

8734726 Mishima 1996 uRCT Eight other children [Hotz(-) group] were selected at random who did not receive the appliance.

R_987654312 Shi 2008 uRCT The anterior teeth of 10 randomly selected cases were retraced by anchorage of miniscrews (Xi’an Zhongbang) with diameter of 1.5 mm and a length of 8 mm.

21696108 Toy 2011 uRCT The subjects were randomly allocated to either a group treated with an intra-oral Pendulum appliance with a midline expansion screw (PEN) or a group treated with a Ricketts-type cervical headgear (CHG).

8198080 Ulgen 1994 uRCT The patients were divided randomly into treatment and control groups.

19061808 Upadhyay 2008b uRCT Then the subjects were randomly divided into 2 groups before treatment

R_987654311 Uzdil 2008 uRCT [randomly allocated in the four groups]

21478298 Wahab 2012 uRCT Twenty-nine patients (10 males and 19 females), between 14 and 30 years of age (mean 20.7 years) who met the inclusion criteria, were invited to participate and were randomly allocated to be treated using either SLB or CLB.

*TrialIDs without a letter prefix are PubMed identifiers; TrialIDs starting with W_ are Web of Knowledge unique identifiers; TrialIDs starting with R_ are random unique IDs generated for non-indexed component trials. pCCT, prospective clinical controlled trial; qRCT, quasi-randomized controlled trial; rCCT, retrospective clinical controlled trial; tRCT, randomized controlled trial (clear and adequate description of the random sequence generation method); Unclear, clinical trial with unclear design; uRCT, randomized controlled trial (unclear random sequence generation method).

[Forest plot: ΔSMD (95% CI) for each of 25 meta-analyses (from Chen 2010, Jambi 2013, Pandis 2014, Papadopoulos 2011, Papadopoulos 2012 and Papageorgiou 2014). Overall ΔSMD 0.01 (95% CI -0.25, 0.26); I² = 60%; P = 0.957; estimated predictive interval -0.98 to 0.99. X axis: difference in estimates between RCTs with adequate and RCTs with inadequate/unclear sequence generation, with thresholds of 0.2, 0.5 and 0.8 separating trivial, small, medium and large differences.]

Appendix E. Forest plot for the comparison of RCTs with adequate random sequence generation and RCTs with

inadequate/unclear random sequence generation. Due to data recoding, estimates on the left side of the forest plot

indicate that RCTs with inadequate/unclear random sequence generation show larger treatment effects than RCTs with

adequate random sequence generation. ΔSMD = difference in standardized mean differences; RCT = randomized

controlled trial.
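The overall row of this forest plot pools the per-meta-analysis ΔSMDs with a random-effects model and adds an estimated predictive interval. A minimal Python sketch of such pooling, assuming the DerSimonian-Laird estimator for the between-meta-analysis variance τ² and a predictive interval based on a t distribution with k - 2 degrees of freedom; the text states only that random-effects meta-analyses were used, so the estimator and the helper names (`t_quantile`, `se_from_ci`, `random_effects`) are assumptions. Because forest plots report 95% CIs rather than standard errors, `se_from_ci` back-calculates each SE assuming a symmetric normal-based interval.

```python
import math

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def t_quantile(p, df, n=2000):
    """Quantile of Student's t for 0.5 < p < 1, via Simpson's rule and bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x):  # for x >= 0: 0.5 plus the integral of the density from 0 to x
        h = x / n
        s = pdf(0.0) + pdf(x)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 + s * h / 3

    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * Z975)


def random_effects(estimates, ses):
    """DerSimonian-Laird random-effects pooling with I^2 and a predictive interval."""
    k = len(estimates)
    if k < 3:
        raise ValueError("a predictive interval needs at least 3 estimates")
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-meta-analysis variance
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    half_pi = t_quantile(0.975, k - 2) * math.sqrt(tau2 + se ** 2)
    return {
        "pooled": pooled,
        "ci": (pooled - Z975 * se, pooled + Z975 * se),
        "tau2": tau2,
        "i2": i2,
        "pi": (pooled - half_pi, pooled + half_pi),
    }


# Hypothetical input (not read from the plot): (ΔSMD, lower 95% CI, upper 95% CI)
rows = [(0.10, -0.30, 0.50), (0.40, 0.00, 0.80), (-0.20, -0.70, 0.30)]
res = random_effects([d for d, _, _ in rows], [se_from_ci(lo, hi) for _, lo, hi in rows])
print(res)
```

The predictive interval is always at least as wide as the CI of the pooled estimate, since it adds τ² to the sampling variance and uses a t rather than a normal quantile.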

Appendix F. Forest plot for the comparison of RCTs versus non-RCTs with subgroups according to the scope of component trials (subgroups: effectiveness; adverse effects; 25 meta-analyses; x-axis: ΔSMD (95% CI), with ticks at ±0.2, ±0.5 and ±0.8 marking trivial, small, medium and large differences). Overall: ΔSMD 0.07 (95% CI -0.21, 0.34); I² = 26%; P = 0.630; estimated predictive interval -0.70 to 0.83. P for the difference between subgroups = 0.005. Due to data recoding, estimates on the right side of the forest plot indicate that RCTs show smaller treatment effects than non-RCTs. ΔSMD = difference in standardized mean differences; RCT = randomized controlled trial; non-RCT = non-randomized controlled trial.
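The between-subgroup P value in Appendix F asks whether the RCT versus non-RCT difference varies with research scope. The text reports that comparisons were made with random-effects meta-regression, so the following is only a simpler stand-in: the standard Q-between test for two subgroup summaries, sketched in Python under the assumption that each subgroup is summarized by a ΔSMD and a symmetric 95% CI. It need not reproduce the reported P value, and the function name is hypothetical.

```python
import math

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def subgroup_difference(est1, ci1, est2, ci2):
    """Test whether two subgroup estimates differ (Q_between with 1 df).

    For two subgroups, Q_between = (b1 - b2)^2 / (se1^2 + se2^2), the square
    of a z statistic; its chi-square(1) tail probability is erfc(sqrt(Q / 2)).
    """
    se1 = (ci1[1] - ci1[0]) / (2 * Z975)  # SEs back-calculated from 95% CIs
    se2 = (ci2[1] - ci2[0]) / (2 * Z975)
    q_between = (est1 - est2) ** 2 / (se1 ** 2 + se2 ** 2)
    p_value = math.erfc(math.sqrt(q_between / 2))
    return q_between, p_value


# Hypothetical subgroup summaries: (ΔSMD, 95% CI); not taken from the plot
q, p = subgroup_difference(0.30, (0.00, 0.60), -0.40, (-0.90, 0.10))
print(f"Q_between = {q:.2f}, P = {p:.4f}")
```

A meta-regression with scope as a covariate additionally models residual heterogeneity (τ²), which is one reason its P value can differ from this fixed-weight test.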


Appendix G. Results of Egger's test for small-study effects (reporting bias).

Comparison | Intercept (95% CI)

1. Randomized vs. non-randomized | -0.91 (-2.16, 0.35)

2. Prospective vs. retrospective | -0.78 (-1.78, 0.23)

3. Adequate vs. inadequate/unclear random sequence generation | 0.17 (-1.48, 1.82)

CI, confidence interval. All P-values > 0.05.
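Egger's test regresses each estimate's standard normal deviate (ΔSMD / SE) on its precision (1 / SE); an intercept away from zero suggests funnel-plot asymmetry, and an intercept CI that includes zero, as for all three comparisons in Appendix G, matches a non-significant test. A minimal Python sketch of the unweighted ordinary-least-squares form with a t-based 95% CI on k - 2 degrees of freedom; the exact software and options used in the paper are not stated here, so this form and the helper names are assumptions.

```python
import math


def t_quantile(p, df, n=2000):
    """Quantile of Student's t for 0.5 < p < 1, via Simpson's rule and bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def cdf(x):  # for x >= 0: 0.5 plus the integral of the density from 0 to x
        h = x / n
        s = pdf(0.0) + pdf(x)
        for i in range(1, n):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 + s * h / 3

    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def egger_test(estimates, ses):
    """Egger's regression test: intercept of (estimate / SE) on (1 / SE), with 95% CI."""
    k = len(estimates)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 estimates")
    x = [1 / s for s in ses]                         # precision
    y = [e / s for e, s in zip(estimates, ses)]      # standard normal deviate
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (k - 2) * (1 / k + mx ** 2 / sxx))  # OLS intercept SE
    half = t_quantile(0.975, k - 2) * se_int
    return intercept, (intercept - half, intercept + half)


# Hypothetical ΔSMDs and SEs (illustrative only)
b, ci = egger_test([0.10, -0.35, 0.42, -0.05, 0.60], [0.15, 0.30, 0.45, 0.20, 0.55])
print(f"intercept = {b:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```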


Appendix H. Results of the sensitivity analyses

Comparison | Original analysis | Fixed-effect model | Study design definition | Largest MA from each SR | Most precise 50%

1. Randomized vs. non-randomized (n=25) | 0.07 (-0.21, 0.34) | 0.12 (-0.10, 0.35) | - | Dropped 14 MAs: 0.19 (-0.16, 0.53) | Dropped 12 MAs: 0.12 (-0.22, 0.45)

2. Prospective vs. retrospective (n=40) | -0.30 (-0.53, -0.06) * | -0.19 (-0.33, -0.05) ** | Excluded Unclear from retrospective; dropped 5 MAs: -0.28 (-0.63, 0.08) | Dropped 29 MAs: -0.11 (-0.45, 0.24) | Dropped 20 MAs: -0.18 (-0.40, 0.05)

3. Adequate vs. inadequate/unclear random sequence generation (n=25) | 0.01 (-0.25, 0.26) | 0.00 (-0.15, 0.15) | Excluded uRCTs; dropped 16 MAs: 0.13 (-0.11, 0.36) | Dropped 19 MAs: 0.01 (-0.27, 0.30) | Dropped 12 MAs: -0.03 (-0.24, 0.17)

Values are ∆SMD (95% CI); for the last three analyses, the change relative to the original analysis precedes the estimate. * P-value < 0.05; ** P-value < 0.01. ∆SMD, difference in standardized mean differences; CI, confidence interval; MA, meta-analysis; SR, systematic review; Unclear, non-randomized controlled clinical trial with unclear study design; uRCT, randomized controlled trial (unclear random sequence generation).
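Appendix H repeats each comparison with a fixed-effect model, with only the largest meta-analysis from each systematic review, and with only the most precise 50% of meta-analyses. A minimal Python sketch of these three operations, assuming inverse-variance weighting for the fixed-effect model, "largest" meaning the most component trials, and precision measured by the SE of each ΔSMD; the record fields (`review`, `n_trials`, `est`, `se`) and function names are hypothetical. Keeping ceil(n/2) meta-analyses drops 12 of 25 and 20 of 40, consistent with the "Dropped 12 MAs" and "Dropped 20 MAs" entries, although the exact selection rule used in the paper is not stated here.

```python
import math


def fixed_effect(estimates, ses):
    """Inverse-variance fixed-effect pooling; returns (pooled estimate, its SE)."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return pooled, math.sqrt(1 / sum(w))


def most_precise_half(records):
    """Keep the ceil(n/2) meta-analyses with the smallest SE of their ΔSMD."""
    keep = math.ceil(len(records) / 2)
    return sorted(records, key=lambda r: r["se"])[:keep]


def largest_per_review(records):
    """Keep, for each systematic review, the meta-analysis with the most trials.

    Ties keep the first record encountered.
    """
    best = {}
    for r in records:
        current = best.get(r["review"])
        if current is None or r["n_trials"] > current["n_trials"]:
            best[r["review"]] = r
    return list(best.values())


# Hypothetical records; values are illustrative only
records = [
    {"review": "SR1", "n_trials": 6, "est": 0.20, "se": 0.15},
    {"review": "SR1", "n_trials": 4, "est": -0.10, "se": 0.30},
    {"review": "SR2", "n_trials": 9, "est": 0.05, "se": 0.10},
    {"review": "SR3", "n_trials": 3, "est": 0.40, "se": 0.50},
]
for subset in (records, most_precise_half(records), largest_per_review(records)):
    print(len(subset), fixed_effect([r["est"] for r in subset], [r["se"] for r in subset]))
```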


Appendix I. Changes from the PROSPERO protocol (CRD42014013767)

Minor changes from protocol:

(i) Types of study to be included: meta-analyses of randomized trials only were no longer excluded, as they were included in the comparison of “Adequate vs. inadequate/unclear random sequence generation”.

(ii) Comparator(s)/controls / context: a more detailed characterization of study designs was adopted than the one initially included in the protocol.

(iii) Analyses of subgroups or subsets: the comparison originally planned for the sensitivity analyses (adequate vs. inadequate/unclear random sequence generation) was moved to the main analyses of the paper.