
HEALTH PROMOTION INTERNATIONAL © Oxford University Press 1996

Vol. 11, No. 3. Printed in Great Britain

Developing methodologies for evaluating community-wide health promotion

CART Project Team:

ROBERT SANSON-FISHER, SALLY REDMAN, LYNNE HANCOCK, STEPHEN HALPIN, PHILIP CLARKE, MARGOT SCHOFIELD, ROBERT BURTON, MICHAEL HENSLEY, ROBERT GIBBERD, ALEXANDER REID and RAOUL WALSH
University of Newcastle, Australia

AFAF GIRGIS, LOUISE BURTON, ANN McCLINTOCK
NSW Cancer Council, Australia

ROBERT CARTER
Australian Institute of Health

ALLAN DONNER
University of Western Ontario, Canada

SYLVAN GREEN
National Cancer Institute, United States

SUMMARY

There has been growing recognition that health promotion programs which target whole communities are more likely to be effective in changing health behaviour. However, studies evaluating the impact of community-wide health promotion programs rarely use adequate methodology. Randomised control trials, in which multiple whole communities are randomly assigned to control and intervention groups, are optimal if evaluators hope to validly attribute changes in health behaviour to the intervention. However, such trials present a number of difficulties, including cost and feasibility limitations and the evolving nature of statistical techniques. This paper proposes applying a fairly well-accepted phased evaluation approach to the evaluation of community participation programs, using three defined phases. Phase 1 consists of small-scale studies to develop the measures and assess the acceptability and feasibility of the intervention; Phase 2 consists of studies in a small number of communities designed to trial the intervention in the real world; Phase 3 studies use an appropriate number of entire communities to provide valid evidence of the efficacy of the intervention. It is suggested that criteria be resolved to identify adequate studies at each stage and that the advantages and limitations of Phase 1 and 2 studies be clearly identified. The paper describes the major design, sampling and analysis considerations for a Phase 3 study.

Key words: community; evaluation; health promotion

INTRODUCTION

Community-wide interventions attempt to implement changes which will simultaneously affect many individuals (Dixon, 1989; Edinburgh Research Unit in Health and Behavioural Change, 1989; Tones et al., 1990; Green and Kreuter, 1991). There are several reasons why such programs have potential to change health behaviour: individuals are more likely to adopt a health behaviour if there is social support for change (Pomerleau et al., 1978); action at the community level allows structural changes, such as modifying the cost or availability of resources, services or products (Rothman, 1968); concurrent delivery of the same message from several sources can increase the likelihood of consumer acceptance (Rogers, 1971; Kottke et al., 1988); and empowerment gained through the opportunity for active community engagement in the intervention process can increase the probability that change will occur (Pomerleau et al., 1978; WHO, 1986).

Despite the potential advantages of community-wide programs, debate continues about their effectiveness in changing health behaviour (Farquhar, 1978; Dixon, 1989). In part, this is attributable to a lack of consensus about the role of, and optimal strategies for, the evaluation of such programs. The aims of this paper are to describe some of the methodological difficulties in evaluating community-wide programs and to propose some strategies to improve future evaluations.

WHY ARE COMMUNITY-WIDE PROGRAMS DIFFICULT TO EVALUATE?

Any evaluation study should be designed so that the results will be believed by the scientific community. Studies should meet minimal methodological criteria, such as the use of valid and reliable outcome measures and adequate response rates. However, the most difficult methodological issue for evaluations of community-wide interventions is the establishment of adequate control groups. For all health interventions, a randomised controlled trial is accepted as the optimum scientific design. In such trials, individuals are randomly allocated to control or treatment groups and the impact of treatment is assessed by comparing outcomes between groups. The strength of random allocation is the reduction in potential confounding due to the equal distribution of known and unknown confounders between the intervention and control groups. Any differences between the two groups at follow-up can be attributed to the intervention.

When community-wide treatments are evaluated, communities rather than individuals must be assigned to control and treatment groups. It has long been recognised that such evaluations must assign several communities to each group: study designs which involve only one or two treatment communities cannot exclude the possibility that changes in behaviour are due to factors other than the intervention (Koepsell et al., 1991). While studies like North Karelia and Stanford were pioneering and state-of-the-art at the time of their conduct, these landmark studies nevertheless failed to allow definitive conclusions about the effectiveness of community-wide strategies. This lack of conclusive evidence has been acknowledged by the designers of the studies (Farquhar et al., 1985; Puska et al., 1985).

There have recently been considerable advances in our understanding of sampling and analysis issues in studies where communities (or clusters) rather than individuals are randomly assigned. In cluster designs, two sets of sample sizes must be considered (the number of individuals to be sampled in each community and the number of communities to be included), as variation can occur at both these levels (Donner et al., 1981). Additionally, the power to detect an intervention effect will depend more upon the number of communities studied than upon the number of individuals observed within each community (Hsieh, 1988). In practical terms, this means that, for acceptable statistical power, evaluations of community-wide programs will usually require that many communities be assigned to control and intervention groups.

There can be difficulties in analysing data from multi-community trials if community sampling is not accounted for within the analysis. Community allocation health evaluations should not be analysed as if individuals had been randomly assigned. If standard statistical methods are used with cluster samples, the degree of statistical significance will tend to be overestimated (Donner, 1982). Therefore, such studies may report statistically significant differences in the absence of a true effect (Type I error).

The practical consequences for evaluators of community-wide health promotion programs are that they will need to include more communities and more people.

Have evaluations of community-wide health promotion accounted for these design issues?

While the methodological difficulties associated with cluster designs were recognised by the 1950s, greater awareness of these issues was stimulated by a seminal paper by Cornfield (1978). Within the health promotion field, the issues of appropriate controls for community evaluations were first debated in relation to the Stanford study in the late 1970s (Farquhar et al., 1990).

Despite recognition of the threats to scientific validity, most subsequent evaluations of community-wide interventions have not met the requirements for adequate study design. A review by Donner et al. (1990) found that of 16 community intervention studies published from 1979 to 1989, 11 failed to justify the need for cluster randomisation, only three correctly accounted for between-cluster variation in discussing sample size and power, and only eight accounted for between-cluster variation in the analysis. One study failed to address any of the requirements (Black et al., 1981). Similarly, a recent review of 11 studies which evaluated community-wide cancer risk reduction programs (published between 1985 and 1991) found that only one study had used more than one pair of control and intervention communities (Clover and Redman, 1995). None of the studies had accounted for cluster randomisation in selecting their sample or in their analysis.

A rare example of an optimally designed evaluation is the COMMIT project (COMMIT Research Group, 1991). The aim of this study was to measure the impact of a community-wide smoking cessation and prevention program on smoking rates. COMMIT uses 11 matched pairs of communities, with one of each pair randomly allocated to the intervention condition. Each matched pair provides one data point for assessing smoking reduction in intervention compared to control communities. The impact of the program has been assessed by establishing whether change across communities was greater than expected by chance alone. Studies such as COMMIT are designed to produce a valid estimate of effectiveness and hence draw convincing conclusions about the impact of the intervention.
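As a minimal sketch of this pair-level logic (not COMMIT's actual analysis), the randomisation test below treats each matched pair's quit-rate difference as a single data point and asks whether the observed mean difference could plausibly arise if intervention labels had been assigned within pairs at random; the pair values are invented for illustration.

```python
import itertools

# Invented quit-rate differences (intervention minus control) for 11
# hypothetical matched pairs of communities, one data point per pair.
pair_diffs = [0.031, -0.004, 0.022, 0.015, -0.011, 0.027,
              0.009, 0.018, -0.002, 0.024, 0.007]

observed = sum(pair_diffs) / len(pair_diffs)

# Randomisation (sign-flip) test: under the null hypothesis, which member
# of each pair received the intervention is arbitrary, so each pair
# difference is equally likely to have had the opposite sign.
n_patterns = 2 ** len(pair_diffs)  # 2048 sign patterns for 11 pairs
count = 0
for signs in itertools.product([1, -1], repeat=len(pair_diffs)):
    mean = sum(s * d for s, d in zip(signs, pair_diffs)) / len(pair_diffs)
    if mean >= observed:
        count += 1

p_one_sided = count / n_patterns
print(f"observed mean difference = {observed:.4f}, p = {p_one_sided:.3f}")
```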

Why haven't optimum designs been used for evaluating community-wide programs?

Several factors have contributed to non-scientific evaluations of community-wide health promotion programs. First, biostatistics is a rapidly evolving field. Many of the epidemiological and biostatistical techniques for evaluating such programs have not been readily accessible to researchers and practitioners, due to their recency or continuing development (Zeger, 1988; Donner and Klar, 1994). Few practising researchers will have had the opportunity to absorb information about cluster randomisation trials in the same way that they have about individual randomisation. There is therefore a need to provide researchers and practitioners with training programs and tools to enable them to incorporate new statistical technology, and to stimulate debate about analysis and design issues within health promotion forums.

Second, the assignment of multiple communities is expensive. For example, the cost of the evaluation of COMMIT was estimated to be approximately US$41 million. This level of financial commitment means that very few evaluation studies using designs like COMMIT will be undertaken. However, the lack of information about the efficacy of community-wide interventions, their growing use in health promotion and the range of potential confounders and predictors of change suggest that many evaluation studies will be needed. There is, therefore, a need for solutions to this funding conundrum: either the development of more cost-effective but still scientific evaluation strategies, changes to current funding strategies, or the development of networked research centres where the cost can be shared across sites.

A third barrier is lack of feasibility. It may not be possible to identify enough clusters or appropriate communities to take part in the program. For example, within Australia there are only four or five cities of 1 million or more people. Evaluating a program designed for communities of this size will exclude the possibility of a randomised control trial. Likewise, if the program requires considerable input from the research team, it may not be feasible for it to be implemented in a short time period in many communities.

Fourth, within health promotion, there is little consensus about the relative importance of finding out more about how the intervention works by collecting detailed process measures, compared with establishing whether the program is changing health behaviour. There is a perception that pre-post designs cannot adequately provide information for developing and modifying interventions. From this perspective, scientific design issues may be of less importance than the collection of sophisticated process measures which can demonstrate a link between intervention components and changes in health behaviour.

Finally, despite political pressure on practitioners to implement community-wide programs, there is insufficient recognition of the time and money required to develop and evaluate efficacious programs (National Centre for Epidemiological and Population Health, 1991). While there is community tolerance for the time delay in introducing a 'promising' new drug, this acceptance has been assisted by the disastrous early introduction of drugs, such as thalidomide, with serious but undiscovered side effects (Johnson et al., 1994). However, in health promotion there are frequently demands to implement programs without such testing, and therefore pressure on evaluators to use quick and simple evaluation designs even if they are not scientifically rigorous.

It is evident that health promotion needs to find strategies for reconciling the competing demands on evaluators. Unless solutions can be found, it seems likely that inadequate methodologies will continue to be used, and there is a continuing risk that unproven programs will be implemented without evaluation or with process evaluation only.

HOW SHOULD COMMUNITY-WIDE HEALTH PROMOTION PROGRAMS BE EVALUATED?

Several principles for evaluating community-wide health promotion programs can be established which could resolve some of the competing demands facing evaluators and form the basis of guidelines for funding bodies, journal publication policies, and researchers and practitioners who seek to develop their field.

Using community as the unit of analysis should be strongly justified

Using the community rather than the individual as the unit of analysis increases the difficulty of sampling, design and analysis, increases study costs and can compromise the feasibility of an evaluation. Therefore, the value of randomising communities rather than individuals within a community to treatment groups must be justified. Donner et al. (1990) note that of 16 studies reviewed, only four justified the need for assignment of communities rather than individuals.

There are at least three situations in which community assignment will be necessary. First, when it is perceived that community-wide interventions will produce a greater level of behaviour change than will individual strategies; for example, community action programs. Second, when studies need to use community-wide interventions to avoid treatment group contamination; for example, health information mail-outs. Finally, when it is not possible to deliver the intervention to individuals, such as programs involving policy change, economics or mass-media approaches.

However, community-wide trials are sometimes used when individual allocation may have been possible. For example, it is frequently argued that mass-media programs require community-wide evaluations. However, Robertson et al. (1974) demonstrated that it is possible to evaluate the impact of mass-media programs with individuals as the unit of analysis; in this study, two separate cable TV networks were used to provide mass-media messages to the treatment group. The outcome of interest was seatbelt use, with car license plates used to identify treatment or control households. It may be, therefore, that increased resources should be directed at developing more sophisticated designs which allocate individuals, rather than necessarily undertaking large-scale community allocation studies.

The intervention must be vigorously pre-tested

It is important that expensive multi-community evaluations not be undertaken until the proposed intervention has been rigorously pre-tested. Pre-testing studies should occur in a series of pre-defined stages, each building upon the last. Only when the intervention meets specific preconditions should an evaluation involving community assignment be considered.

It is possible to draw an analogy with drug trials, which follow a defined sequence. First is a pre-test stage in which the drug is trialled with animals. Then, Phase 1 involves the first measurements with humans, to explore the acceptability of drug doses, using small numbers of volunteer subjects. Phase 2 assesses whether there is any effect of the treatment by measuring if anything good happens, with close monitoring for adverse effects. Typically, 20-150 subjects participate and there may not be a control group. Finally, Phase 3 involves full-scale evaluation of efficacy, with random allocation to control and treatment and appropriate outcome measures. Less common adverse effects are assessed. Importantly, each phase of this testing builds on the preceding stage; each phase is reasonable and important in its own right. The fundamental reservation is that Phase 1 and 2 studies might indicate that the drug has promise, but cannot indicate that it is efficacious.

It would seem useful to adopt a similar staged approach to evaluating community-wide interventions. While this is not a new idea, and has in fact been formalised in initiatives such as the United States National Cancer Institute's Smoking and Tobacco Control Program (STCP) (Glynn et al., 1993), it is evident that some health promotion practitioners and researchers are not yet applying these staging principles to community-wide programs (Donner et al., 1990). One reason may be that funding agency support is much more likely to be gained for Phase 3-type projects than for Phase 1 and Phase 2 studies.

Phase 1

During Phase 1, a series of small-scale studies would be implemented. The studies would usually not be community-wide and their purpose would be to develop intervention components, to test the feasibility and acceptability of proposed programs, to validate measures and to establish the parameters of intervention intensity. At this stage, the research should be planned as an iterative series where data are gathered, the program amended and then re-trialled, perhaps over many modifications. Before embarking, there should be a decision rule about when to stop and when to proceed to the next phase.

Phase 2

In this phase, studies would address the feasibility and acceptability of the proposed intervention in real life. Research during this phase would seek to refine the intervention and establish whether any change occurs in early outcome or process measures. The issue of outcome assessment is potentially contentious. If there appears to be no change in outcome, it may be because the power of the evaluation is limited. Alternatively, if outcome measures do change, there is the danger that the intervention will be argued to be effective without adequate data to support this conclusion. Despite these reservations, most practitioners and researchers would want some evidence of potential efficacy before embarking on a large-scale study. A solution might be to establish a decision rule ahead of time, with a clear understanding that evidence of change in outcome measures will imply that the program is worth testing but not that it is efficacious. Again, a 'stop or proceed' rule would be useful. The analogy with drug trials would suggest that a Phase 2 trial would permit researchers to have a first look at changes in the outcome measure but not to use the data to decide on the generalisability of the results. Phase 2 evaluations should continue to collect a wide variety of process measures, to refine the intervention for use in the real world.

[Fig. 1: Alternative approaches to random allocation of communities. Stepped wedge: each community receives the intervention, with varying times of onset. Factorial: two interventions (A and B) and two conditions (intervention and control) are randomised to communities, giving arms A(intervention)-B(control), B(control)-A(intervention), A(intervention)-B(intervention) and B(control)-A(control). Crossover: one community is randomised to treatment and the other acts as control; with time, the intervention community becomes a control and the control community receives the intervention.]



It is worthwhile to explore Phase 2 designs other than pre-post (see Figure 1). For example, a multiple baseline approach, which uses each subject (or community) as its own control, involves the collection of repeated measures of the outcome of interest over time, possibly in a number of communities (Guyatt et al., 1986; Marascuilo and Busk, 1988). One problem with multiple baseline studies can be the limited capacity for attribution of intervention effect if secular trends cause confounding. In this case, the multiple baseline approach could be combined with the stepped wedge design, where outcome monitoring starts in each community at the same time but onset of intervention is staggered. Thus, some communities are in the intervention phase while others are in the baseline period; the staggered onset of intervention ensures that extraneous events, such as an external media smoking cessation program, are controlled for. Over time, all communities receive the intervention. The impact of the intervention on the outcome of interest can be assessed using time-series analysis or analysis of variance techniques. One of the major advantages of the multiple baseline technique in Phase 2 is the potential to map the impact of successive interventions on the behaviour of interest over time, allowing the addition of new components and rapid monitoring of their impact on the outcome of interest. This type of design is potentially very useful for developing and refining the intervention as well as assessing behaviour change in response to the program. However, multiple baseline designs have received relatively little interest from epidemiologists and biostatisticians when compared with the randomised control trial. There are few data about trends, between-cluster variation and the size of the intervention effect for this type of design, which would allow researchers to estimate the number of communities needed and the optimum frequency of measures. Additionally, behavioural scientists have given little attention to developing or validating continuous measures, such as tobacco sales, compared with that given to point measures, such as self-report of smoking with biochemical validation. There is also little information about how best to stagger intervention implementation in different communities, although these parameters are crucial to the validity of the findings. It appears, then, that much more research effort could be invested in Phase 2 technologies, so that advancement to Phase 3 could be delayed until proposed interventions had demonstrated their promise in the real world.
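To make the staggered-onset idea concrete, the sketch below generates a stepped wedge assignment for a handful of communities; the town names and period counts are invented for illustration, and a real trial would randomise the onset order with more care (for example, with stratification).

```python
import random

def stepped_wedge_schedule(communities, n_periods, seed=0):
    """Assign each community a randomly ordered intervention onset period.

    Outcome monitoring starts everywhere at period 0; the intervention is
    switched on community by community, so at any interim period some
    communities are intervening while others are still at baseline.
    """
    rng = random.Random(seed)
    order = communities[:]
    rng.shuffle(order)
    # Spread onsets evenly across periods 1..n_periods-1, so that no
    # community begins the intervention in the very first period.
    onsets = {}
    for i, community in enumerate(order):
        onsets[community] = 1 + i * (n_periods - 1) // len(order)
    return onsets

towns = ["Town A", "Town B", "Town C", "Town D", "Town E", "Town F"]
for town, onset in sorted(stepped_wedge_schedule(towns, 7).items()):
    print(f"{town}: baseline periods 0-{onset - 1}, intervention from period {onset}")
```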

Phase 3

Phase 3 studies are designed to provide definitive evidence about the efficacy of the program. Currently, the most accepted technique for Phase 3 trials involves random allocation of multiple communities to treatment condition. The methodological criteria that such evaluations should meet are described below.

Randomised community trials should employ adequate design

There are several considerations that should guide the development of an adequate multi-community randomised control trial (RCT), as follows.

Design

The first principle is that, given a fixed number of subjects, the design will be more efficient if there are many communities with a smaller number of individuals rather than many individuals sampled within a smaller number of communities. The number of clusters (communities) which are necessary will depend upon the anticipated efficacy of the intervention. For a binary outcome, there will be a statistically significant result if the intervention is favoured in five towns out of five (p = 0.063), in nine out of ten (p = 0.021) or in 12 out of 15 (p = 0.035). Since communities, not individuals, are being allocated, the most important sample size consideration is the number of communities.
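The quoted p-values can be reproduced with a two-sided exact sign test on the number of towns favouring the intervention, as in this short check (doubling the upper tail is one common two-sided convention):

```python
from math import comb

def sign_test_p(favoured, n):
    """Two-sided exact binomial p-value for `favoured` of `n` towns
    showing results in the intervention's favour, under a null
    probability of 0.5 (upper tail doubled, capped at 1)."""
    upper_tail = sum(comb(n, k) for k in range(favoured, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

for favoured, n in [(5, 5), (9, 10), (12, 15)]:
    print(f"{favoured}/{n} towns favour intervention: "
          f"p = {sign_test_p(favoured, n):.3f}")
# Prints p = 0.063, 0.021 and 0.035, matching the text.
```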

Second, the principle of random allocation to control or intervention group is of vital importance. Many previous evaluations of community-wide interventions have not been able to use random allocation because of political considerations (Farquhar et al., 1985; Puska et al., 1985). However, if it is accepted that an adequate Phase 3 trial will involve a large-scale project with random allocation of many communities, political pressure will need to be resisted. Thus, the same rule would need to be applied as in individual patient randomised trials: if patients wish to be considered only for the treatment arm, they are excluded from the trial.

However, when political or public health ethical considerations are paramount, alternative approaches can include factorial and crossover designs (see Figure 1). In a factorial design, two interventions are compared to a control separately and in combination. A limited form of the factorial design, the reciprocal control design, does not have control or combined arms and uses intervention A as a control for intervention B and vice versa. In a community trial of an intervention designed to increase rates of screening for cervical and breast cancer using the reciprocal control design, half of the communities might be assigned to receive the program for cervical cancer and half to receive the breast cancer program. Those communities receiving the cervical cancer program would function as the controls for the breast cancer outcome and vice versa. This approach may help to overcome the political and ethical obstacles associated with randomisation, since all participating communities receive some intervention. It is essential that there is no contamination by one of the interventions, since this would reduce the apparent impact.

Third, there remains considerable debate about whether matched pairs, stratification or total randomisation is the best strategy. Matching can provide a design advantage by increasing power, but only if the matching factor is one that is highly correlated with the outcome variable. If the matching factor is not highly related to the outcome variable, matching will reduce the power because of the loss of degrees of freedom that results from using the community pair, rather than the individual communities, as the unit of analysis. In stratified randomisation, several factors such as size or geographic area are selected for stratification, and communities within each stratum are randomly allocated. If the number of communities is small, then issues of matching and stratification become more important.

The issue of full randomisation versus matched or stratified designs is particularly difficult for community-wide studies seeking to alter health behaviour. The variables which are likely to affect outcome at the community level are frequently not known. For example, while smoking rates and smoking cessation rates are known to be associated with socioeconomic status (Hill and Gray, 1984), there is little information on the relationship of factors such as community size, urban versus rural location or number of health care providers to cessation rates. If the program is attempting to mobilise community action, other factors, such as community cohesiveness, the presence of effective community leaders, or a well-developed infrastructure, might be important. However, because of the lack of knowledge about the likely role of these factors, they could not currently be included in a matching design.

An additional concern with matched designs is that multivariate methods cannot then be used to model treatment effects adjusting for the individual-level covariates used in the matching. For instance, if communities are matched on socioeconomic status (SES), then SES cannot be included as a possible predictor of behaviour change. Predictive analyses are likely to be important in community-wide health behaviour programs, where they can provide useful information about the contexts in which programs will or will not work, which is needed for improving interventions.

Finally, consumers of research may have opinions about the factors which are likely to make communities similar or dissimilar. Unless a 'matched pair' of communities appears similar to those consumers and to politicians, it is unlikely that the results of the research will be accepted as valid. Thus, there is a need to develop strategies for selecting appropriate matching variables, or for combining them in a systematic manner, based upon better models of community behaviour and greater understanding of potential confounders.

Sample size

Assessing sample size is more complex for trials which randomise communities rather than individuals, and both the number of communities and the number of individuals in each should be calculated in advance (Donner, 1982; Hsieh, 1988). As mentioned before, for the same total number of individuals, the study will in general be most efficient if a large number of communities is included rather than a large number of individuals within each community. An example of this effect is shown in Table 1. The example assumes that the study is attempting to evaluate a smoking cessation program where whole communities will be randomised to intervention or control. The table illustrates an approach to the problem of how to estimate the number of communities to use and the number of people to sample within each community.

Table 1: Sample size required to detect a reduction in smoking from 25% to 20% with 90% power and alpha = 0.05

Number of     Number of people   Number of people   IF     Effective sample
towns per     to be surveyed     to be surveyed            size in each
group         per town           per group                 town

5             2100               10 500             7      300
10            600                6000               4      150
20            188                3750               2.5    75
30            100                3000               2      50



Sample size requirements for studies which allocate communities can be calculated by adapting standard formulae. The number of subjects required per treatment group should first be calculated using a standard sample size formula. The result is then multiplied by an inflation factor (IF):

IF = 1 + (m - 1)ρ

where m = average cluster size and ρ = a prior estimate of the intra-class correlation (Donner et al., 1981). The intra-class, or intra-cluster, correlation reflects the community-level variance; that is, the level of statistical dependence of individuals within a cluster. Reliable estimates of the intra-class correlation will rarely be available, as it is difficult to estimate the variability of health behaviours within and between communities (Donner, 1982). Thus, individual sample size requirements will often not be accurate, particularly where only a small number of communities are being used in the trial. Koepsell et al. (1991) demonstrated that estimates of variance can depend upon the method used to estimate community-level variance, and also upon the communities used to provide estimates. They recommend that optimistic and pessimistic estimates of sample size be made.
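As a minimal sketch of this calculation (assuming, for illustration, the roughly 1500 subjects per group that Table 1 implies for an individually randomised trial, and an invented intra-class correlation), the helper below inflates a standard sample size and converts it into a number of towns per group:

```python
import math

def inflation_factor(m, rho):
    """Design effect for cluster sampling: IF = 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho

def towns_per_group(n_standard, m, rho):
    """Communities needed per arm when m people are surveyed per town.

    n_standard is the per-group sample size from a standard (individually
    randomised) calculation; each town of m people contributes an
    effective sample of only m / IF independent observations.
    """
    effective_per_town = m / inflation_factor(m, rho)
    return math.ceil(n_standard / effective_per_town)

# Illustrative values only: ~1500 per group, 600 surveyed per town and an
# assumed intra-class correlation of 0.005 give IF ≈ 4 and 10 towns per
# group, matching the second row of Table 1.
print(inflation_factor(600, 0.005))        # 3.995, i.e. IF ≈ 4
print(towns_per_group(1500, 600, 0.005))   # 10
```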

Analysis

There is still considerable developmental work to be done on the analysis methods for trials where the community is the unit of allocation. There are several types of analysis to be considered, as follows.

(i) Using cluster as the unit of analysis: each community provides one data point. For example, each community might generate the percentage change in smoking from pre- to post-test. These data are submitted to a t-test or paired t-test, depending on whether the trial used random allocation or was a matched pair design (a minimal sketch appears after this list). This approach is simple, but it may lack power unless a large number of communities is included in the study. Further, it cannot be used to adjust for covariates measured at the individual level, such as differences between the communities in terms of age or sex.

(ii) Continuous outcome measures: analyses to be used on continuous outcome measures are well developed, and include mixed model analysis of variance and generalised least squares. These are described in detail in Koepsell et al. (1991) and Donner (1985). Standard statistical methods such as t-tests can also be adapted; for example, the test statistic can be divided by a correction factor (Donner, 1982).

(iii) Dichotomous data: methods of analysing dichotomous data are much less well developed. Available methods are described in Donner and Klar (1994). Randomised designs can be analysed using standard test statistics, such as Pearson's chi-square adjusted by a correction factor (Donald and Donner, 1987). Stratified and pair-matched designs can use an extension of the Mantel-Haenszel test to incorporate clustering (Donner, 1987, 1992). There are some techniques available for conducting multivariate analysis (Liang et al., 1986; Donner, 1987), although they all require a large sample size and are still in the developmental phase.
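For item (i), a minimal sketch of the community-as-unit approach (with invented pre- to post-test changes, and assuming the SciPy library is available): an unpaired t-test serves a fully randomised design and a paired t-test a matched design.

```python
from scipy import stats

# Invented pre- to post-test changes in smoking prevalence (percentage
# points), with each community contributing a single data point.
intervention_towns = [-4.1, -2.8, -5.0, -3.3, -1.9]
control_towns = [-1.2, -0.7, -2.5, -1.6, -0.3]

# Fully randomised design: unpaired two-sample t-test.
t, p = stats.ttest_ind(intervention_towns, control_towns)
print(f"unpaired: t = {t:.2f}, p = {p:.3f}")

# Matched-pair design: communities paired before randomisation, so the
# paired t-test works on the within-pair differences.
t, p = stats.ttest_rel(intervention_towns, control_towns)
print(f"paired:   t = {t:.2f}, p = {p:.3f}")
```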
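For item (iii), the sketch below shows the general idea of correcting a standard Pearson chi-square for clustering by dividing it by an estimated design effect. This is a simplified stand-in, not the exact Donald and Donner (1987) estimator, and the counts and design effect are invented for illustration.

```python
from scipy import stats

# Invented pooled 2x2 data: quitters vs non-quitters, by trial arm.
quit_int, n_int = 310, 2400   # intervention communities combined
quit_con, n_con = 240, 2400   # control communities combined

# Standard Pearson chi-square, as if individuals had been randomised.
table = [[quit_int, n_int - quit_int],
         [quit_con, n_con - quit_con]]
chi2, p_naive, dof, expected = stats.chi2_contingency(table, correction=False)

# Deflate the statistic by an assumed design effect IF = 1 + (m - 1) * rho.
design_effect = 1 + (600 - 1) * 0.005   # assumed m = 600, rho = 0.005
chi2_adj = chi2 / design_effect
p_adj = stats.chi2.sf(chi2_adj, df=1)

print(f"naive chi2 = {chi2:.1f} (p = {p_naive:.4f})")
print(f"adjusted chi2 = {chi2_adj:.2f} (p = {p_adj:.4f})")
# The adjustment shows how clustering can erase an apparent significance.
```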

It is evident that more work is required to develop, refine and disseminate appropriate analysis methods for trials where the community is the unit of analysis, in a form which can be adopted by non-statisticians such as health promotion practitioners. For example, user-friendly computer packages which could be used by practitioners would increase the sophistication of analysis of such trials.

CONCLUSIONS

There continues to be a lack of consensus about the most effective strategies for evaluating community-wide health promotion programs. While the randomised control trial is the optimum design, it is costly and requires demanding epidemiological and biostatistical techniques. In addition, it may be politically unacceptable because of the randomisation of communities to control conditions and, unless it includes sophisticated process measures, may not yield much information about the conditions under which the intervention program is effective.

It is likely that the resolution of these difficulties will require input from epidemiologists, biostatisticians and behavioural scientists if research designs are to be generated which are both scientifically valid and feasible in the real world. The views of politicians and communities themselves will also need to be sought if the results of the research are to be accepted outside of the scientific community and if the need to fund large-scale, expensive studies is to be recognised. Research funding agencies have a critical role to play in improving methodologies in this area by reinforcing minimum criteria for different types of evaluation and by indicating to researchers that designs other than randomised trials may be acceptable, provided that the conclusions drawn are appropriate.

It is also evident that, as more researchers understand the need to include many communities in evaluation studies, there will need to be changes to our current approach to evaluation. Four issues in particular seem in need of attention from scientists and funding agencies. First, there is a need to develop methodological criteria for projects which are not yet ready for a large-scale randomised trial; these projects provide essential developmental data but will use methods other than randomised control trials. There needs to be critical debate within the scientific community about the standards that such studies need to meet to be considered suitable for funding and publication. Second, there needs to be an increased understanding of the design and analysis issues that are raised by cluster randomisation within the health promotion field. Third, there needs to be consideration of whether it is possible to develop alternative methodologies to randomised control trials for rigorously evaluating community-wide programs. Fourth, when multi-community studies are proposed, the funding commitment may be less burdensome if co-operation between research centres in developing, resourcing and conducting these trials can be achieved.


ACKNOWLEDGEMENTS

The CART project is a collaborative project jointly funded by the National Health and Medical Research Council (Australia) and the NSW Cancer Council (Australia). This paper arose from a workshop held in Newcastle, NSW in September 1992, part-funded by the Australian Cancer Society. The contributions of the workshop participants are gratefully acknowledged.

Address for correspondence:
Dr Lynne Hancock
Faculty of Medicine and Health Sciences
University of Newcastle
Locked Bag 10
Wallsend 2287 NSW
Australia

REFERENCES

Black, R. E., Dykes, A. C., Anderson, K. E., Wells, J. G., Sinclair, S. P., Gary, G. W., Hatch, M. H. and Gangarosa, E. J. (1981) Handwashing to prevent diarrhoea in day-care centres. American Journal of Epidemiology, 113, 445-451.

Clover, K. A. and Redman, S. (1995) Is community participation effective in encouraging change of behaviour? Unpublished report, University of Newcastle.

COMMIT Research Group (1991) Community Intervention Trial for Smoking Cessation (COMMIT): summary of design and intervention. Journal of the National Cancer Institute, 83, 1620-1628.

Cornfield, J. (1978) Randomization by group: a formal analysis. American Journal of Epidemiology, 108, 100-102.

Dixon, J. (1989) The limits and potential of community development for personal and social change. Community Health Studies, 13, 82-92.

Donner, A. (1982) An empirical study of cluster randomization. International Journal of Epidemiology, 11, 537-543.

Donner, A. (1985) A regression approach to the analysis of data arising from cluster randomization. International Journal of Epidemiology, 14, 322-326.

Donner, A. (1987) Statistical methodology for paired cluster designs. American Journal of Epidemiology, 126, 972-979.

Donner, A. (1992) Sample size requirements for stratified cluster randomization designs. Statistics in Medicine, 11, 743-750.

Donald, A. and Donner, A. (1987) Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio variance estimator when the data are clustered. Statistics in Medicine, 6, 491-499.

Donner, A. and Klar, N. (1994) Methods for comparing event rates in intervention studies when the unit of allocation is a cluster. American Journal of Epidemiology, 140, 279-289.

Donner, A., Birkett, N. and Buck, C. (1981) Randomization by cluster: sample size requirements and analysis. American Journal of Epidemiology, 114, 906-914.

Donner, A., Brown, K. S. and Brasher, P. (1990) A methodological review of non-therapeutic intervention trials employing cluster randomization, 1979-1989. International Journal of Epidemiology, 19, 795-800.


Edinburgh Research Unit in Health and Behavioural Change, University of Edinburgh (1989) Changing the Public Health, Chapters 5 and 8. John Wiley, New York.

Farquhar, J. W. (1978) The community-based model of life style intervention trials. American Journal of Epidemiology, 108, 103-111.

Farquhar, J. W., Fortmann, S. P., Maccoby, N., Haskell, W. L., Williams, P. T., Flora, J. A., Barr, T. C., Brown, B. W., Jr, Solomon, D. S. and Hulley, S. B. (1985) The Stanford five-city project: design and methods. American Journal of Epidemiology, 122, 323-334.

Farquhar, J. W., Fortmann, S., Flora, A. J., Taylor, B., Haskell, W. L., Williams, P. T., Maccoby, N. and Wood, P. (1990) Effects of community-wide education on cardiovascular disease risk factors: the Stanford Five City Project. Journal of the American Medical Association, 264, 359-365.

Glynn, T. J., Manley, M. W., Mills, S. L. and Shopland, D. R. (1993) The United States National Cancer Institute and the science of tobacco control research. Cancer Detection and Prevention, 17, 507-512.

Green, L. W. and Kreuter, M. W. (1991) Health Promotion Planning: An Educational and Environmental Approach. Mayfield, Mountain View.

Guyatt, G., Sackett, D., Taylor, W., Chong, J., Roberts, R. and Pugsley, S. (1986) Determining optimal therapy - randomized trials in individual patients. New England Journal of Medicine, 314, 889-892.

Hill, D. and Gray, N. (1984) Australian patterns of tobacco smoking and related health beliefs in 1983. Community Health Studies, 8, 307-314.

Hsieh, F. Y. (1988) Sample size formulas for intervention studies with the cluster as unit of randomization. Statistics in Medicine, 7, 1195-1202.

Johnson, K. A., Ford, L. G., Kramer, B. and Greenwald, P. (1994) Overview of the National Cancer Institute (USNCI) chemoprevention research. Acta Oncologica, 33, 5-11.

Koepsell, T. D., Martin, D. C., Diehr, P. H., Psaty, D. M., Wagner, E. W., Perrin, E. B. and Cheadle, A. (1991) Data analysis and sample size issues in evaluation of community-based health promotion and disease prevention programs: a mixed-model analysis of variance approach. Journal of Clinical Epidemiology, 44, 701-713.

Kottke, T. E., Battista, R. N., DeFreise, G. H. and Brekke, M. L. (1988) Attributes of successful smoking cessation interventions in medical practice. A meta-analysis of 39 controlled trials. Journal of the American Medical Association, 259, 51-61.

Liang, K. Y., Beaty, T. H. and Cohen, B. H. (1986) Applications of odds ratio regression models for assessing familial aggregation from case-control studies. American Journal of Epidemiology, 124, 678-683.

Marascuilo, L. A. and Busk, P. L. (1988) Combining statistics for multiple baseline AB and replicated ABAB designs across subjects. Behaviour Assessment, 10, 1-28.

National Centre for Epidemiological and Population Health (1991) The Role of Primary Health Care in Health Promotion in Australia: Interim Report to the National Better Health Program. Commonwealth Department of Health, Housing and Community Services, Canberra, Australia.

Pomerleau, O., Adkins, D. M. and Pertschuk, M. (1978) Predictors of outcome and recidivism in smoking cessation treatment. Addictive Behaviours, 3, 65-70.

Puska, P., Nissinen, A., Tuomilehto, J., Salonen, J. T., Koskela, K., McAlister, A., Kottke, T. E., Maccoby, N. and Farquhar, J. W. (1985) The community-based strategy to prevent coronary heart disease: conclusions from the ten years of the North Karelia Project. Annual Review of Public Health, 6, 147-193.

Robertson, L. S., Kelley, A. B., O'Neill, B., Wixon, C. W., Eiswirth, R. S. and Haddon, W. (1974) Controlled study on the effect of television messages on safety belt use. American Journal of Public Health, 64, 1071-1080.

Rogers, E. M. (1971) Diffusion of Innovations. Free Press, New York.

Rothman, J. (1968) Three models of community organisation practice. In Social Work Practice, National Conference on Social Welfare. Columbia University Press, New York.

Tones, K., Tilford, S. and Robinson, Y. K. (1990) Health Education: Effectiveness and Efficiency. Chapman & Hall, London.

World Health Organization (1986) Ottawa Charter for Health Promotion. International Conference on Health Promotion, 17-21 November, Ottawa, Ontario, Canada.

Zeger, S. L. (1988) Discussion of papers on the analysis of repeated categorical response. Statistics in Medicine, 7, 161-168.
