Meta-analysis for orthodontists: Part I – How to choose effect measure and statistical model

Spyridon N. Papageorgiou 1,2

1 Department of Orthodontics; 2 Department of Oral Technology, School of Dentistry, University of Bonn, Bonn, Germany

Systematic reviews ideally provide a comprehensive and unbiased summary of existing evidence from clinical studies, whilst meta-analysis combines the results of these studies to produce an overall estimate. Collectively, this makes them invaluable for clinical decision-making. Although the number of published systematic reviews and meta-analyses in orthodontics has increased, questions are often raised about their methodological soundness. In this primer, the first steps of meta-analysis are discussed, namely the choice of an effect measure to express the results of included studies, and the choice of a statistical model for the meta-analysis. Clinical orthodontic examples are given to explain the various options available, the thought process behind the choice between them and their interpretation.

Key words: Effect measure, orthodontics, meta-analysis, fixed-effect, random-effects

Received 2 March 2014; accepted 5 June 2014

Introduction

Systematic reviews summarise qualitatively the results of multiple studies, providing the highest grade of evidence to develop clinical guidelines. Meta-analysis goes one step further, to the quantitative synthesis of studies in order to provide an overall summary. Although the number of published systematic reviews and meta-analyses in orthodontics has increased, questions about the methodological soundness of some have been raised.

In this paper, the first steps of meta-analysis will be discussed in the context of: (1) the choice of an effect measure to express the results of included studies and (2) selection of an appropriate model for subsequent statistical synthesis. For readers not familiar with meta-analysis, a short description of the forest plot and of between-study heterogeneity will also be given. Meta-analyses will be covered that compare two groups: either an intervention group versus a placebo/control group or an intervention group versus another intervention group. Meta-analyses of data from a single group (i.e. event rates, means of groups, etc.), meta-analyses of diagnostic accuracy or network meta-analyses will not be covered. Furthermore, emphasis will be placed on basic principles, which can be easily applied in the freely-available program Review Manager (RevMan) (Review Manager, 2012) available from the Cochrane Collaboration.

It is beyond the scope of this article to analytically describe the stages of a systematic review and when it is sensible to perform a meta-analysis or not. Interested readers should refer to the Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green, 2011) and to the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) (Liberati et al., 2009), where further details can be found on how to respectively conduct and report systematic reviews and meta-analyses.

Effect measures

The first step in meta-analysis is to identify all relevant studies evaluating the same measure in order to directly compare and eventually synthesise them. There is a wide array of effect measures that can be used, but this paper will focus on five of the most common:

(i) those for continuous outcomes (such as the ANB angle): the mean difference (MD) (also known as difference in means) and the standardised mean difference (SMD);

(ii) those for binary (dichotomous) outcomes (such as bracket failure): the odds ratio (OR), the relative risk (RR) (also known as risk ratio) and the risk difference (RD) (also known as absolute risk reduction).

Continuous outcomes

Effect measures. If all included studies measure the same continuous outcome and use the same measurement scale, then the obvious effect measure to use is the MD, which is the mean of the intervention group minus the mean of the control group in each study (or vice versa):

MD = Mean_intervention − Mean_control,

and has the same units as the outcome measured.

INVITATION TO SUBMIT Journal of Orthodontics, Vol. 41, 2014, 317–326

Address for correspondence: Spyridon N. Papageorgiou, DDS, Department of Orthodontics, School of Dentistry, University of Bonn, Welschnonnenstr. 17, 53111, Bonn, Germany. Email: [email protected] © 2014 British Orthodontic Society DOI 10.1179/1465313314Y.0000000111

If all included studies measure the same continuous outcome, but do not use the same measurement scale, then there are two options. The first is to convert the data, so that all results are on the same scale. If this is not possible, then the SMD can be used. This is the MD standardised (adjusted) by the units of the standard deviation. There are many standardised effect measures (Cohen's d, Hedges' g, Glass's Δ, etc.) (Borenstein et al., 2008), but this paper will focus on Hedges' g, which includes an adjustment for small-sample bias and is used by default in RevMan. The SMD does not have the same units as any of the original scales, but is measured in standardised measurement units. Both the MD and SMD are absolute comparative measures and, if MD = 0 or SMD = 0, then there is no difference between the two compared groups.

SMD = (Mean_intervention − Mean_control) / Pooled standard deviation of both groups
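As a worked illustration, both measures can be computed from each study's summary statistics. The sketch below uses invented summary data for a fictional bite-wafer trial; the small-sample correction shown is the common approximation to Hedges' g, which may differ very slightly from the exact correction RevMan applies.

```python
import math

def mean_difference(m_int, m_ctrl):
    """MD: mean of the intervention group minus mean of the control group."""
    return m_int - m_ctrl

def hedges_g(m_int, sd_int, n_int, m_ctrl, sd_ctrl, n_ctrl):
    """SMD as Hedges' g: the MD divided by the pooled SD of both groups,
    multiplied by a small-sample bias correction."""
    pooled_sd = math.sqrt(((n_int - 1) * sd_int**2 + (n_ctrl - 1) * sd_ctrl**2)
                          / (n_int + n_ctrl - 2))
    d = (m_int - m_ctrl) / pooled_sd            # Cohen's d
    j = 1 - 3 / (4 * (n_int + n_ctrl) - 9)      # small-sample correction factor
    return d * j

# Invented VAS pain scores (mm): wafer group vs control group
print(mean_difference(35.0, 45.0))                           # -10.0 mm on the VAS
print(round(hedges_g(35.0, 18.0, 30, 45.0, 22.0, 30), 2))    # -0.49 standardised units
```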

Interpretation. In order to illustrate the differences between these two measures, suppose we have a meta-analysis of four randomised controlled trials, where patients are allocated either to a bite-wafer group or to a non-wafer group during debonding and perception of pain is measured by means of a questionnaire. Let us say that all the trials measured pain with a Visual Analogue Scale (VAS) (Huskisson, 1974), where a horizontal 100 mm line is plotted with 'no pain' on the left and 'worst pain imaginable' on the right and the patients quantify their pain experience on this line. If we find an MD_wafer–control of −10 mm, this means that 'on average, patients allocated to the wafer group experienced less pain during debonding by 10 mm on the VAS scale compared to patients allocated to the control group' or alternatively 'on average, the bite-wafer reduced pain during debonding by 10 mm on the VAS scale'. You will notice the words 'on average'. This means that we are talking about differences in means (Figure 1) and that not all the wafer patients experienced less pain than the control patients (hence the overlap between the curves).

However, if two out of the four trials measured pain with a VAS scale (in mm) and the other two trials measured pain with a Likert scale (Likert, 1932) (where pain is measured on a 5-level scale with a score up to 5), then the MD cannot be used. In this case, the SMD can be used for the meta-analysis. If by combining all four trials using the SMD as a measure, we find an SMD_wafer–control of −0.5, this means that 'on average, patients allocated to the wafer group experienced less pain during debonding by 0.5 standardised measurement units compared to the patients allocated to the control group'.

One will notice that the interpretation of the effect measure will depend upon several factors, including the direction of the scale (higher VAS = more pain), the length of the scale (for VAS, 0 to 100 mm), the minimally important difference (clinical judgement) and whether the outcome is beneficial (for example, mandibular growth) or harmful (for example, pain). The interpretation of the SMD is not straightforward and it often makes sense to back-transform the resulting SMD into one of the original scales to communicate the results (here, either the VAS or the Likert scale).

Data needed. For any continuous outcome, independently of whether the MD or the SMD is to be used, the following data are needed: the sample size, the mean and a measure of variation (standard deviation or standard error) for both the intervention and the control group. Many other data formats can be used for data input in meta-analysis (P value, t value, median, interquartile range, MD, ratio of means, ratio of geometric means), but these go beyond the scope of this article. It is also important to know that the SMD does not correct for different directions among scales. If the scales used run in different directions, you have to align the direction of the results by multiplying by −1. Finally, conventional meta-analysis methods rely on the broad assumption that the results are normally distributed. If concrete indications of skewness are found, it is advisable to get formal statistical advice before proceeding.

Figure 1 Illustration of the mean difference based on the described fictional bite-wafer trial. The left distribution depicts the pain experienced during debonding by the patients in the wafer group and the right distribution depicts the pain experienced during debonding by the patients in the control group

Binary outcomes

Effect measures. For binary outcomes, the three most widely-used measures are the OR, the RR, and the RD. In order to understand these three measures, we need to introduce the concepts of odds and risks. Suppose we have a fictional randomised controlled trial, where 100 patients with a class II malocclusion and increased overjet are allocated either to an early-phase Twin Block treatment or to an untreated control group (to then be treated comprehensively later). The measured outcome is the incidence of new incisal trauma, measured as trauma or no trauma. We can see the results of this trial in the form of a 2×2 contingency table, which provides the standard data needed to input binary outcomes for meta-analysis (Table 1) and includes the events and sample size of each group. According to the results, 12/50 (24%) of the treated patients and 24/50 (48%) of the control patients experienced some kind of incisal trauma during the follow-up.

If we express the trauma experience in both groups with odds, then:

• The odds of trauma in the Twin Block group are:
  Odds_TwinBlock = Events_TwinBlock / No-events_TwinBlock = 12/38 = 0.316 or 31.6%

• The odds of trauma in the control group are:
  Odds_control = Events_control / No-events_control = 24/26 = 0.923 or 92.3%

• And the OR is simply the ratio of the odds for the two groups:
  OR = Odds_TwinBlock / Odds_control = 0.316/0.923 = 0.342

Expressed as risks (or probabilities), the same results will be:

• The risk of trauma in the Twin Block group is:
  Risk_TwinBlock = Events_TwinBlock / Sample_TwinBlock = 12/50 = 0.240 or 24%

• The risk of trauma in the control group is:
  Risk_control = Events_control / Sample_control = 24/50 = 0.480 or 48%

• And the RR is simply the ratio of the risks for the two groups:
  RR = Risk_TwinBlock / Risk_control = 0.240/0.480 = 0.500

• Finally, the RD is the difference of the risks for the two groups:
  RD = Risk_TwinBlock − Risk_control = 0.240 − 0.480 = −0.240 or −24.0%
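The arithmetic above is easy to script. The following minimal sketch derives all three binary effect measures from the events and sample sizes of the two arms, using the Table 1 figures:

```python
def binary_effect_measures(events_t, n_t, events_c, n_c):
    """OR, RR and RD from the 2x2 table of a two-arm trial."""
    odds_t = events_t / (n_t - events_t)   # odds = events / no-events
    odds_c = events_c / (n_c - events_c)
    risk_t = events_t / n_t                # risk = events / sample size
    risk_c = events_c / n_c
    return {"OR": odds_t / odds_c, "RR": risk_t / risk_c, "RD": risk_t - risk_c}

# Fictional Twin Block trial from Table 1: 12/50 vs 24/50 traumas
m = binary_effect_measures(12, 50, 24, 50)
print(round(m["OR"], 3), round(m["RR"], 3), round(m["RD"], 3))  # 0.342 0.5 -0.24
```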

Another illustrative measure for binary outcomes is the Number Needed to Treat (NNT); however, this cannot be used directly in the meta-analysis. Rather, the results from the other measures have to be converted into the NNT. The NNT gives the number of patients you would have to treat with the intervention (instead of leaving them untreated) to prevent one additional event and is, therefore, very helpful for the clinical translation of results.

• The NNT is traditionally calculated as the reciprocal of the RD, so in our example:
  NNT = 1/RD = 1/0.240 = 4.17,
  which is always rounded up to the next whole number, so 5.

However, this assumes a constant RR (meaning that the benefits/harms of a treatment are the same among both low-risk and high-risk patients); but this is a big assumption (Sackett et al., 1997). The aim of evidence-based medicine is to individualise existing clinical knowledge in order to satisfy the needs and preferences of each individual patient (Glasziou et al., 1998).

Table 1 Fictional data for a randomised controlled trial, where 100 patients are allocated to either a Twin Block treatment or to an untreated control group in order to avoid incisal trauma

                            Event (trauma)   No event (no trauma)   Total
Intervention (Twin Block)         12                  38              50
Control (untreated)               24                  26              50
Total                             36                  64             100


• Therefore, Furukawa et al. (2002) proposed the calculation of an individualised NNT based on the Patient's Expected Event Rate (PEER) and either the RR or the OR:

NNT = 1 / (PEER × (1 − RR))    or
NNT = (1 − PEER + OR × PEER) / (PEER × (1 − OR) × (1 − PEER))

Therefore, suppose we want to calculate the NNT for two patient examples: (a) for a quiet late-adolescent patient with increased overjet and a PEER for trauma of 0.3 (low-risk patient) and (b) for a rather energetic pre-adolescent patient, who plays rugby and has a PEER for trauma of 0.6 (high-risk patient).

• For the low-risk patient we would have:
  NNT_low-risk = 1 / (0.3 × (1 − 0.5)) = 1 / (0.3 × 0.5) = 1/0.15 = 6.67, rounded up to 7.

• And for the high-risk patient we would have:
  NNT_high-risk = 1 / (0.6 × (1 − 0.5)) = 1 / (0.6 × 0.5) = 1/0.3 = 3.33, rounded up to 4.

Interpretation. The interpretation of the four measures is quite different. First of all, one must bear in mind that the OR and the RR are relative measures, while the RD and NNT are absolute measures. This means that for the interpretation of the OR and the RR, we need to subtract the no-effect value, which is OR = 1 or RR = 1, from the calculated value:

• OR − 1 = 0.342 − 1.000 = −0.658 or −65.8%

This means that 'the odds of trauma among patients treated with Twin Block were reduced by 65.8% compared to the odds among the control patients' or 'Twin Block treatment reduced the odds of trauma by 65.8% of the odds in untreated patients'.

• RR − 1 = 0.500 − 1.000 = −0.500 or −50.0%

This means that 'the trauma risk of patients treated with Twin Block was reduced by 50% compared to the risk of untreated patients' or that 'Twin Block treatment reduced the risk of trauma by 50% of the trauma risk in untreated patients'.

For the interpretation of the RD, no subtraction is needed, so the interpretation would be that:

• 'on average, patients treated with Twin Block had a 24% lower risk of trauma' or 'Twin Block treatment reduced the trauma risk by 24 percentage points'.

Finally, the interpretation of the individualised NNTs is that:

• 'one would have to treat four high-risk patients or seven low-risk patients with increased overjet with a Twin Block, instead of leaving them untreated, in order to prevent one new case of incisal trauma'.

How to choose a summary measure? Ideally, a summary measure should have consistency, desirable mathematical properties and ease of interpretation (Higgins and Green, 2011). A summary of the introduced effect measures is provided in Table 2. The choice of a summary measure for the meta-analysis of continuous outcomes is straightforward. In contrast, the choice of a measure for binary outcomes has been the focal point of much discussion. However, the selection of an effect measure for meta-analysis should be based on empirical evidence (Walter, 2000) and not solely on theoretical or statistical grounds. According to existing empirical studies (Engels et al., 2000; Deeks, 2002; Furukawa et al., 2002), the following recommendations can be made about the choice of effect measure for binary outcomes:

(i) the use of the RD should be avoided;

(ii) the RR for harm (i.e. incidence of incisal trauma) is preferable to the RR for benefit (avoidance of incisal trauma);

(iii) a convenient approach is to use the RR for the statistical pooling and then use the patient's expected event rate to individualise the NNT for application in practice.

Many researchers prefer to use the OR as a measure in meta-analysis on the grounds that it is often reported as an adjusted OR resulting from a logistic regression. However, there also exist several methods to calculate adjusted RRs [Mantel–Haenszel method, log-binomial regression, Poisson regression with robust standard error, etc. (McNutt et al., 2003; Greenland, 2004; Knol, 2012)]. Adding to that, statisticians argue that the OR might not be the most suitable measure to express the results of interventional trials or systematic reviews of such trials (Fleiss, 1981; Sinclair and Bracken, 1994; Feinstein, 1999). For case-control studies and logistic regression, the use of ORs is inevitable, as the OR is the best estimate of the RR that can be obtained (Deeks, 1998). However, the use of the OR should be avoided in cohort or randomised studies (Sackett et al., 1996; Knol, 2012). Grimes and Schulz (2008) advocate the use of RRs for epidemiological studies together with the corresponding confidence intervals (CIs), whenever possible. In both clinical trials and systematic reviews of trials there is no reason for compromising interpretation by reporting results in odds rather than risks (Sinclair and Bracken, 1994; Sackett et al., 1996). Finally, it is crucial for the reader to understand that every estimate (continuous or binary) from any study is uncertain, and should always be presented with a CI.

Reading a forest plot

Before advancing from expressing all studies in the same measure to the statistical pooling of measures, we need to take a look at the forest plot, which is the standard graphical output of a meta-analysis. A forest plot from a Cochrane systematic review (Thiruvenkatachari et al., 2013) is replicated in Figure 2, highlighting its various components. In this meta-analysis, three randomised controlled studies are pooled, which allocated patients to functional appliance treatment or to an untreated control group and measured the patients' ANB angle post-treatment. As can be seen, the raw data from each study (left) are re-expressed with the effect measure (right) and then all effect measures are pooled across studies (middle). This summary estimate is depicted as a diamond at the bottom of the forest plot. In this data synthesis, weights are assigned to the studies according to the model that is used (here a fixed-effect model):

Table 2 Summary of the three assessed effect measures for binary outcomes

Characteristics
• Definition: OR = Odds_a/Odds_b; RR = Risk_a/Risk_b; RD = Risk_a − Risk_b
• Range: OR, 1/∞ to ∞; RR, 0 to ∞; RD, −1 to 1
• No-effect value: OR, 1; RR, 1; RD, 0
• Consistency (similar despite variation in baseline risk): OR, (+) Yes; RR, (+) Yes; RD, (−) No

Mathematical properties
• Bounded: OR, (+) No; RR, (−) Yes (as risks range 0 to 1; switching event and no-event can avoid this problem); RD, (−) Yes (risks range 0 to 1)
• Collapsibility (calculated effect is unaffected by adjustments): OR, (−) No; RR, (+) Yes; RD, (−) No
• Constrained predictions: OR, (−) Yes (absolute benefit when control group event rate is 0 and 100%); RR, –; RD, –
• Relatively homogenous: OR, (+) Yes; RR, (+) Yes; RD, (−) Less homogenous than OR and RR (Engels et al., 2000; Deeks, 2002)
• Simple variance estimator: OR, (+) Yes; RR, (+) Yes; RD, (−) No
• Symmetry (effects for event and for no-event are symmetrical): OR, (+) Yes (reciprocal); RR, (−) No; RD, (+) Yes (change of sign)
• Other: OR, (−) Does not approximate the RR well with event rates greater than 10–15% (Altman et al., 1998) and (−) can behave paradoxically with chained or conditional probabilities, in which case the RR is more transparent (Newcombe, 2006); RR, –; RD, –

Interpretation
• Easily understood: OR, (−) No, often misinterpreted as the RR by authors and readers (Sackett et al., 1996); RR, (+) Yes; RD, (+) Yes


Weight_fixed-effect = 1 / variance

Then the pooled summary estimate is:

Pooled effect = sum of (estimate × weight) / sum of weights
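The fixed-effect weighting and pooling can be sketched as follows; the three mean differences and their variances are invented for illustration, not taken from the Figure 2 meta-analysis.

```python
def fixed_effect_pool(estimates, variances):
    """Inverse-variance fixed-effect pooling: weight = 1/variance,
    pooled effect = sum(estimate * weight) / sum(weights)."""
    weights = [1 / v for v in variances]
    pooled = sum(e * w for e, w in zip(estimates, weights)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5                  # standard error of the pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # 95% confidence interval
    return pooled, ci

# Invented MDs in the ANB angle (degrees) with their variances
pooled, ci = fixed_effect_pool([-0.5, -1.2, -2.0], [0.04, 0.25, 0.49])
print(round(pooled, 2))  # -0.69
```

Note how the first (most precise) study dominates: its weight of 25 dwarfs the weights of 4 and about 2 for the other two studies.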

Heterogeneity between studies

As we can see from the forest plot (Figure 2), even though the studies have similar intervention and control groups, the effect estimates (here the MDs) vary across them, a phenomenon termed between-study heterogeneity. Sources of this variation between the study results can include clinical diversity (regarding patients, interventions or outcomes), methodological differences (variation in study design, conduct, attrition, etc.) or statistical heterogeneity (variability that could be expected entirely from chance). Heterogeneity can be 'eyeballed' from the overlap of the 95% CIs across studies. If the 95% CIs of the studies have poor overlap, this generally indicates the presence of heterogeneity. There are more sophisticated ways to identify and quantify heterogeneity (Higgins and Green, 2011), but they will not be discussed here. However, it is important to know that heterogeneity plays a major role in meta-analysis. As can be logically expected, extreme heterogeneity between studies can make them incompatible with each other. When extreme heterogeneity exists, one can choose to:

1. check the data again (for mistakes in the data extraction or data input);
2. not pool the studies at all;
3. ignore the heterogeneity (not advisable);
4. exclude outlying studies from the analysis (not advisable);
5. change the effect measure (not advisable);
6. encompass heterogeneity with a random-effects model;
7. explore sources of heterogeneity that can explain the variability (subgroup analysis, meta-regression, etc.).

Finally, the role of heterogeneity in the meta-analysis depends heavily on the statistical model that is used.

Figure 2 Example of a meta-analysis forest plot including its various components. CI, confidence interval; N, sample size; SD, standard deviation; UK, United Kingdom; H0, null hypothesis; MD, mean difference; SMD, standardised mean difference; RD, risk difference; RR, risk ratio; OR, odds ratio

Statistical model for the meta-analysis

There are two main models used in meta-analysis: the fixed-effect model and the random-effects model. These two models rely on different assumptions and are fundamentally different, both in the synthesis and in their interpretation.

A fixed-effect model is based on the assumption that every study is evaluating a common ('fixed') treatment effect. This means that the effect of treatment is the same in all studies. To put it simply, the only difference between the identified studies is the random error (sampling variation), and if all studies had an infinitely large sample of patients, they would all yield identical results. The pooled diamond in the fixed-effect meta-analysis represents this one 'fixed' treatment effect and the 95% CI represents how uncertain we are about the estimate. In this model our goal is to compute the common effect measure for the identified population, and not to generalise to other populations. It should be clear, however, that this situation is relatively rare. The vast majority of cases will more closely resemble those discussed immediately below.

A random-effects model does not assume that one 'fixed' treatment effect really exists. Under this model, proposed by DerSimonian and Laird (DerSimonian and Laird, 1986), the true treatment effects in the individual studies may be different from each other. Between-study heterogeneity is incorporated into the weight assigned to each study:

Weight_random-effects = 1 / (variance + heterogeneity parameter)

In this case the meta-analysis does not calculate a single estimate, but rather a distribution of treatment effects (usually assumed to be normally distributed). That means that the diamond in the random-effects meta-analysis represents the mean and the dispersion of the different treatment effects and not a single 'fixed' treatment effect. Additionally, 95% predictive intervals should always be calculated for random-effects meta-analyses (Higgins et al., 2009). These predictive intervals incorporate identified heterogeneity and answer the question 'based on existing evidence, what effect can I expect my treatment to have in a future application?' by providing a range of plausible effects. However, the calculation of predictive intervals for the random-effects model is not currently available in RevMan (possibly to be incorporated in a forthcoming release) and one must use another statistical package or perform it manually.
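For illustration, the sketch below pools three made-up mean differences with the DerSimonian–Laird estimate of the heterogeneity parameter (tau-squared) and adds an approximate 95% prediction interval; for simplicity it uses a normal quantile throughout, whereas Higgins et al. (2009) use a t-quantile for the prediction interval.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird tau-squared estimate.
    Returns the pooled effect, its 95% CI and an approximate 95% prediction interval."""
    k = len(estimates)
    w = [1 / v for v in variances]                       # fixed-effect weights
    fe = sum(e * wi for e, wi in zip(estimates, w)) / sum(w)
    q = sum(wi * (e - fe) ** 2 for e, wi in zip(estimates, w))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(e * wi for e, wi in zip(estimates, w_re)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    half = 1.96 * math.sqrt(tau2 + se ** 2)              # prediction-interval half-width
    return pooled, ci, (pooled - half, pooled + half)

# Invented MDs in the ANB angle (degrees) with their variances
pooled, ci, pi = dersimonian_laird([-0.5, -1.2, -2.0], [0.04, 0.25, 0.49])
print(round(pooled, 2))  # -1.04
```

Note how the prediction interval is always at least as wide as the CI, because it adds the between-study variance to the uncertainty of the pooled mean.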

There are many methods to calculate and pool effect measures in RevMan. For the fixed-effect model, the inverse variance method is a straightforward method that can be used generally in most situations by weighting studies according to their precision (the reciprocal of the variance). The Mantel–Haenszel method (Mantel and Haenszel, 1959; Greenland and Robins, 1985) is a good method for reviews with few events or small studies (and is the default method in RevMan). For ORs, there is also the Peto method (Yusuf et al., 1985), which is a good method for studies with few events, small effects (OR close to 1) and similar numbers in the experimental and control groups. For the random-effects model there are two options (Mantel–Haenszel and inverse variance), but the difference between them is trivial.

Comparing the results of the two models, the fixed-effect model almost always gives narrower CIs for the pooled estimate than the random-effects model. Additionally, as under the fixed-effect assumption the random error is the only source of variability, large and more precise studies are given more weight with this model than with the random-effects model. As stated before, the results of the two models are heavily influenced by the existence of heterogeneity. When no heterogeneity is identified, both models give relatively similar results. However, when heterogeneity is present, a random-effects model is much more conservative than a fixed-effect model (wider CIs and larger P values). The two models also differ when the meta-analysis results are related to study size (Papageorgiou et al., 2014a), as the random-effects model gives relatively more weight to smaller studies.

In practice, people tend to interpret both models simi-

larly, which is wrong. To illustrate the different interpreta-

tion of the two models, the abovementioned meta-analysis

from Figure 2 is replicated in a simplified form in Figure 3,

both with fixed-effect and random-effects models. The

pooled diamonds for the two meta-analyses are graphi-

cally ‘augmented’ to illustrate the thought process behind

them. In the upper meta-analysis, where a fixed-effect

model is used, we would interpret the results as follows:

(i) functional appliance treatment has exactly the

same effect on the ANB angle in every included

clinical setting (one ‘fixed’ distribution),

(ii) our best estimate of this effect is an average

reduction in ANB angle of 0.89u compared to

untreated patients.

In the lower analysis, where a random-effects model is

used, our interpretation would be the following:

JO December 2014 Invitation to Submit Choice of effect and model in meta-analysis 323

(i) the effect of functional appliance treatment varies among the included clinical settings (different distributions),
(ii) the average of the various effects is a reduction in ANB angle of 1.35° compared to untreated patients.

Comparing the two models, we see that with the random-effects model:

(i) less weight is given to the biggest study (Florida 1998 study) than with the fixed-effect model,
(ii) the 95% CI of the overall estimate is wider than with the fixed-effect model,
(iii) the MD moves more to the left compared to the fixed-effect model, as the imprecise UK 2009 study influences the results more and, finally,
(iv) the test for the overall effect is now less significant.
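The four points above can be reproduced in miniature. This sketch pools three hypothetical mean differences (not the trials of Figure 3) under both weighting schemes; for brevity, tau² is plugged in here as a plausible value rather than estimated from the data:

```python
from math import sqrt

# Pooled estimate and 95% CI under both models, on hypothetical data.
md = [-0.5, -1.2, -2.0]   # hypothetical mean differences (degrees)
se = [0.2, 0.5, 0.8]      # hypothetical standard errors
tau2 = 0.274              # assumed between-study variance (illustration only)

def pool(weights):
    """Inverse-variance pooled estimate and its 95% confidence interval."""
    est = sum(w * m for w, m in zip(weights, md)) / sum(weights)
    half = 1.96 * sqrt(1 / sum(weights))
    return est, (est - half, est + half)

w_fixed = [1 / s**2 for s in se]
w_random = [1 / (s**2 + tau2) for s in se]

est_f, ci_f = pool(w_fixed)    # pulled towards the most precise study
est_r, ci_r = pool(w_random)   # moves towards -2.0, with a wider CI
print(round(est_f, 2), round(est_r, 2))   # → -0.67 -0.98
```

The random-effects estimate shifts towards the imprecise study and its confidence interval is wider, mirroring points (ii) and (iii) above.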

The use of a fixed-effect model is not unproblematic, as many researchers argue that heterogeneity is inevitable in a meta-analysis. Especially for orthodontic clinical research, a random-effects model can in most cases be easier to justify than a fixed-effect model, as factors like different populations, different chronological/skeletal age and growth potential, different experimental settings, clinician preferences and experience, appliance design, patient compliance and biological response to treatment might come into play.

Figure 3 Forest plot of the meta-analysis from Figure 2 with either a fixed-effect model (above) or a random-effects model (below). IV, inverse variance; CI, confidence interval

A common mistake to be avoided is to test for heterogeneity and then select a fixed-effect or a random-effects model according to the test results (Borenstein et al., 2008; Higgins and Green, 2011; Papageorgiou, 2013; Papageorgiou et al., 2014b). This is problematic for a number of reasons. The choice of model should be pre-specified a priori (in the protocol) if possible, by contemplating the review question you have asked, the studies you intend to include, and whether you logically expect them to be very diverse. You either ‘believe’ in heterogeneity a priori or not at all. As an alternative, one can apply and present both fixed- and random-effects analyses and compare their results and the role of heterogeneity. This is also often performed to check the robustness of the results (sensitivity analysis). However, the interpretation for each model is different, as discussed earlier, and this could confuse the reader.

The biggest limitation of the fixed-effect model is that existing means for the identification of heterogeneity are very imprecise (Hardy and Thompson, 1998), meaning that heterogeneity is very likely to exist, whether we detect it or not. In this case, ignoring heterogeneity might lead to false-positive results and threatens the validity of the meta-analysis. For example, an empirical study of meta-analyses in psychology found that the 95% CIs calculated from the fixed-effect model were considerably narrower than they should have been (Schmidt et al., 2009). It is generally ‘safer’ to use a random-effects model, as it gives similar results to the fixed-effect model in the case of homogeneity, but deals much better with heterogeneity. Finally, a random-effects model also improves the generalizability of the meta-analysis results, as it covers more clinical scenarios and is, therefore, more useful in evidence-based decision-making in orthodontics.

Conclusions

Regarding the choice of effect measure:

• For continuous outcomes, either the mean difference or the standardised mean difference can be used, according to the measurement scale used.
• For binary outcomes, the use of relative risk seems preferable to the use of OR or RD on empirical and epidemiological grounds. Re-expressing the results as the individualised number needed to treat aids in the clinical translation.
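As a small worked example of the individualised number needed to treat (Furukawa et al., 2002): given a pooled relative risk (RR) and a patient's expected baseline risk, the absolute risk reduction is baseline × (1 − RR) and the NNT is its reciprocal. The RR of 0.60 and the baseline risks below are hypothetical:

```python
# Individualised number needed to treat (NNT) from a pooled relative
# risk (RR < 1 for a beneficial treatment). All values are hypothetical.
def nnt(baseline_risk, rr):
    """NNT = 1 / absolute risk reduction."""
    arr = baseline_risk * (1 - rr)   # absolute risk reduction
    return 1 / arr

# The same pooled RR translates into very different clinical effort
# depending on the patient's expected risk of the event:
print(nnt(0.20, 0.60))   # high-risk patient: about 13 need treatment
print(nnt(0.05, 0.60))   # low-risk patient: about 50 need treatment
```

This is why a relative measure pooled in the meta-analysis still needs to be translated back to the individual patient's baseline risk before it guides a clinical decision.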

Regarding the choice of a statistical model:

• The choice of model should be pre-specified, where possible.
• A fixed-effect model is by definition difficult to apply in orthodontic clinical research, as many factors can result in between-study variability and absolutely controlled experimental conditions are almost impossible.
• A random-effects model is easier to justify clinically in orthodontics and is generally more conservative than a fixed-effect model, reducing the risk of spurious findings.
• The results of a random-effects model must be interpreted accordingly, as the average of the various treatment effects among the included trials and, if possible, be accompanied by 95% predictive intervals.
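A 95% prediction interval — the range within which the true effect in a new, similar setting is expected to lie — can be computed from the random-effects output (Higgins et al., 2009). In this sketch the pooled MD echoes the 1.35° example discussed earlier, but the number of studies, the standard error and tau² are hypothetical:

```python
from math import sqrt

# Sketch of a 95% prediction interval for a random-effects summary
# (Higgins et al., 2009): mu +/- t * sqrt(tau^2 + SE(mu)^2).
k = 7            # hypothetical number of studies
mu = -1.35       # pooled mean difference (degrees)
se_mu = 0.30     # hypothetical standard error of the pooled MD
tau2 = 0.40      # hypothetical between-study variance

t_crit = 2.571   # two-sided 95% t value with k - 2 = 5 degrees of freedom
half = t_crit * sqrt(tau2 + se_mu**2)
lo, hi = mu - half, mu + half
print(round(lo, 2), round(hi, 2))   # → -3.15 0.45
```

Note how much wider the prediction interval is than a 95% CI of the pooled estimate alone (here roughly ±0.6°): in some future settings the treatment effect may even be negligible, which is precisely the extra information a prediction interval conveys.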

Disclaimer Statements

Contributors Mr Papageorgiou wrote and revised the first manuscript and is the guarantor.

Funding None.

Conflicts of interest None.

Ethics approval No ethical approval needed.

Acknowledgements

I would like to thank Martyn Cobourne for his helpful comments on the earlier version of this manuscript.

References

Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common. BMJ 1998; 317: 1318.
Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to Meta-analysis. Chichester: John Wiley & Sons. 2008.
Deeks J. When can odds ratios mislead? Odds ratios should be used only in case-control studies and logistic regression analyses. BMJ 1998; 317: 1155–1156.
Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Stat Med 2002; 21: 1575–1600.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986; 7: 177–188.
Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Stat Med 2000; 19: 1707–1728.
Feinstein AR. Indexes of contrast and quantitative significance for comparisons of two groups. Stat Med 1999; 18: 2557–2581.
Fleiss J. Statistical Methods for Rates and Proportions. 2nd edn. New York: John Wiley and Sons. 1981.
Furukawa TA, Guyatt GH, Griffith LE. Can we individualize the ‘number needed to treat’? An empirical study of summary effect measures in meta-analyses. Int J Epidemiol 2002; 31: 72–76.
Glasziou P, Guyatt GH, Dans AL, Dans LF, Straus S, Sackett DL. Applying the results of trials and systematic reviews to individual patients. ACP J Club 1998; 129: A15–A16.
Greenland S. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. Am J Epidemiol 2004; 160: 301–305.
Greenland S, Robins JM. Estimation of a common effect parameter from sparse follow-up data. Biometrics 1985; 41: 55–68.
Grimes DA, Schulz KF. Making sense of odds and odds ratios. Obstet Gynecol 2008; 111: 423–426.
Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Stat Med 1998; 17: 841–856.
Higgins JP, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A Stat Soc 2009; 172: 137–159.
Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration, 2011. Available from: www.cochrane-handbook.org
Huskisson EC. Measurement of pain. Lancet 1974; 2: 1127–1131.
Knol MJ. Down with odds ratios: risk ratios in cohort studies and randomised clinical trials. Ned Tijdschr Geneeskd 2012; 156: A4775.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol 2009; 62: e1–e34.
Likert R. A technique for the measurement of attitudes. Arch Psychol 1932; 140: 1–55.
Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959; 22: 719–748.
McNutt LA, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003; 157: 940–943.
Newcombe RG. A deficiency of the odds ratio as a measure of effect size. Stat Med 2006; 25: 4235–4240.
Papageorgiou SN. Meta-analysis 101. Am J Orthod Dentofacial Orthop 2013; 144: 497.
Papageorgiou SN, Antonoglou G, Tsiranidou E, Jepsen S, Jager A. Bias and small-study effects influence treatment effect estimates: a meta-epidemiological study in oral medicine. J Clin Epidemiol 2014a. Epub ahead of print: DOI: 10.1016/j.jclinepi.2014.04.002.
Papageorgiou SN, Papadopoulos MA, Athanasiou AE. Reporting characteristics of meta-analyses in orthodontics: methodological assessment and statistical recommendations. Eur J Orthod 2014b; 36: 74–85.
Review Manager (RevMan) [Computer program]. Version 5.2. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2012.
Sackett DL, Deeks JJ, Altman D. Down with odds ratios! Evidence-Based Med 1996; 1: 164–167.
Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-Based Medicine: How to Practice and Teach EBM. New York: Churchill Livingstone. 1997.
Schmidt FL, Oh IS, Hayes TL. Fixed- versus random-effects models in meta-analysis: model properties and an empirical comparison of differences in results. Br J Math Stat Psychol 2009; 62: 97–128.
Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. J Clin Epidemiol 1994; 47: 881–890.
Thiruvenkatachari B, Harrison JE, Worthington HV, O’Brien KD. Orthodontic treatment for prominent upper front teeth (Class II malocclusion) in children. Cochrane Database Syst Rev 2013; 11: CD003452.
Walter SD. Choice of effect measure for epidemiological data. J Clin Epidemiol 2000; 53: 931–939.
Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomised trials. Prog Cardiovasc Dis 1985; 27: 335–371.
