Power Analysis - OSF


Power Analysis

Open Science Workshop

More information available on the last slide

How to use these slides

• Use the whole slide deck as a 1.5-2 hour presentation
• Use parts of the slides as part of a longer workshop
– Ideas for workshop concepts can be found in the README folder
• Remix them with your own or other slides, change the layout, refine the content… (everything granted by the CC-BY license)

Open Science in the research process

Research process: Formulate hypotheses & analysis plan → Collect data → Analyze data → Interpret & report results → Publish & distribute research output → Replicate results

Open Science practices along the process:
• Preregistration
• Open Lab Notebook
• Registered Report (1st phase)
• Open Analysis Code
• Open Data
• Open Materials
• Open Access
• Registered Report (2nd phase)
• Replication study

We have workshop material for all blue topics. Get all material here: https://osf.io/zjrhu/

Power Analysis

Research is Asking Questions

• Does this medicine help?
• Is this substance toxic?
• Do weather conditions predict earthquakes?
• Is there an increasing resistance to antibiotics?
• Are Buddhists happier people?

Picture from pixabay.com

Decisions & Errors

Type I error (false positive) and Type II error (false negative)

                                        Reality: effect present           Reality: no effect present
Test indicates: effect is present       True positive                     False positive (Type I error)
Test indicates: no effect is present    False negative (Type II error)    True negative

Image: https://effectsizefaq.files.wordpress.com/2010/05/type-i-and-type-ii-errors.jpg
Icon by flaticon.com (Freepik)

Decisions & Errors: Exercise

What kind of error is it? (1)

Mammograms are low-dose x-rays that allow radiologists to detect changes in breast tissue which can indicate breast cancer. Screening mammograms are used to look for signs of breast cancer in women without symptoms. The probability that a screening mammogram looks abnormal even though the woman does not suffer from breast cancer is 9.6%. What kind of error occurs in these cases?

Image: https://commons.wikimedia.org/wiki/File:Blausen_0628_Mammogram.png
Betsch, Funke, & Plessner (2011), American Cancer Society (2017)

Solution (1): The test indicates that an effect is present (abnormal mammogram) although no effect is present in reality (no breast cancer). This is a false positive (Type I error).

Decisions & Errors: Exercise

What kind of error is it? (2)

The probability that a screening mammogram looks normal even though the patient suffers from breast cancer is 0.2%. What kind of error occurs in these cases?

Image: https://commons.wikimedia.org/wiki/File:Blausen_0628_Mammogram.png
Betsch, Funke, & Plessner (2011), American Cancer Society (2017)

Solution (2): The test indicates that no effect is present (normal mammogram) although an effect is present in reality (breast cancer). This is a false negative (Type II error).

Additional material: Full table of probabilities for the breast cancer example

                                        Reality: effect present   Reality: no effect present   Total
Test indicates: effect is present       0.8%                      9.6%                         10.4%
Test indicates: no effect is present    0.2%                      89.4%                        89.6%
Total                                   1%                        99%                          100%
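The table can also be read the other way round: given an abnormal mammogram, how likely is breast cancer? The following is a minimal sketch that only uses the four cell values of the table, treated as joint probabilities; the variable names are chosen for illustration.

```python
# Cell values of the table above, written as fractions instead of percent.
p_pos_and_cancer = 0.008    # test positive, cancer present (true positive)
p_pos_and_healthy = 0.096   # test positive, no cancer (false positive)
p_neg_and_cancer = 0.002    # test negative, cancer present (false negative)
p_neg_and_healthy = 0.894   # test negative, no cancer (true negative)

# Row and column totals.
p_positive = p_pos_and_cancer + p_pos_and_healthy
p_cancer = p_pos_and_cancer + p_neg_and_cancer

# Share of abnormal mammograms that really belong to a woman with cancer.
ppv = p_pos_and_cancer / p_positive

# Share of women with cancer whose mammogram looks abnormal.
sensitivity = p_pos_and_cancer / p_cancer

print(f"P(cancer | abnormal mammogram) = {ppv:.3f}")
print(f"P(abnormal mammogram | cancer) = {sensitivity:.3f}")
```

Under these numbers, fewer than one in ten abnormal screening results corresponds to an actual case of breast cancer, because false positives are far more frequent than true positives.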

Decisions & Errors: Exercise

What kind of error is it? (3)

A company has two candidates for a job opening. To test the ability of the candidates, the company performs exhaustive assessment tests. These tests indicate that candidate A is a suitable candidate whereas B would not perform well on the job. Therefore, the company decides to hire A. However, A does not fulfill the expectations, so the company decides not to extend A's contract after the probation period.

What decision error did the company make by hiring A?

Picture by pixabay.com by Tumisu

Solution (3): The tests indicated that A is suitable (effect present) although A is not suitable in reality (no effect present). This is a false positive (Type I error).

Decisions & Errors: Exercise

What kind of error is it? (4)

B, the person the company did not hire, applied to other jobs and found a similar occupation at another firm. B performs very well on the job, so the firm decides to extend B's contract after the probation period.

What decision error did the first company make by not hiring B?

Picture by pixabay.com by Tumisu

Solution (4): The tests indicated that B is not suitable (no effect present) although B is suitable in reality (effect present). This is a false negative (Type II error).

NHST Logic

Research question → formulate hypotheses:
• Null hypothesis: "There is no effect."
• Alternative hypothesis: "There is an effect."

Test decision:
• Reject the null hypothesis (indirect support for the alternative hypothesis)
• Do not reject the null hypothesis

Error Control in NHST

• Correct decision = true positive
• Type I error = false positive = alpha error
• Type II error = false negative = beta error
• Correct decision = true negative

Power: the probability of detecting an effect if an effect is present.

Icons by flaticon.com (Maxim Basinski, Dinosoft Labs)

                                             Reality: H1, effect present           Reality: H0, no effect present
Test indicates: effect is present (H1)       Power (1-β)                           False positive (Type I / α error)
Test indicates: no effect is present (H0)    False negative (Type II / β error)    True negative

Icon by flaticon.com (Freepik)

How can we control error rates and power?

• Change the decision region: smaller α error probability, but lower power and higher β error probability
• Larger effect size: less overlap between distributions, so higher power and lower β error probability
• Increase sample size: measure more accurately, so higher power and lower β error probability
• Increase measurement accuracy: reduce measurement variance, so higher power and lower β error probability

Influencing factors, their influence, and whether to use them to control error rates:

• Decision region
– Influence: lower α-error rates, higher β-error rates, lower power
– Use to control error rates? Use it to control the maximum false positive error rate
• Effect size
– Influence: the higher the effect size, the lower the β-error rate and the higher the power
– Use to control error rates? No control of population effect size! An effect size estimate is needed to control the power of the experiment
• Sample size
– Influence: lower β-error rates, higher power
– Use to control error rates? Use sample size to control power and the maximum false negative error rate
• Measurement accuracy
– Influence: lower β-error rates, higher power
– Use to control error rates? Always use the best measurement tools at hand!
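The four influences can be checked numerically. The sketch below assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification chosen for illustration and not a test taken from the slides; the function name `z_test_power` and all input values are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: assumed true mean difference in raw units
    sd:     assumed (known) standard deviation of the measurements
    n:      sample size
    alpha:  significance level that defines the decision region
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # border of the decision region
    shift = effect / (sd / n ** 0.5)             # true effect in standard-error units
    # Probability of landing in the upper or lower rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


baseline = z_test_power(effect=0.5, sd=1.0, n=20)
stricter_alpha = z_test_power(effect=0.5, sd=1.0, n=20, alpha=0.005)
larger_effect = z_test_power(effect=0.8, sd=1.0, n=20)
larger_sample = z_test_power(effect=0.5, sd=1.0, n=40)
less_noise = z_test_power(effect=0.5, sd=0.7, n=20)

for label, value in [
    ("baseline", baseline),
    ("stricter alpha", stricter_alpha),
    ("larger effect", larger_effect),
    ("larger sample", larger_sample),
    ("less measurement noise", less_noise),
]:
    print(f"{label:>24}: power = {value:.2f}")
```

A stricter alpha lowers power relative to the baseline, while a larger effect, a larger sample, and a smaller standard deviation each raise it, matching the pattern listed above.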

Understood everything?

• Power = probability of finding an effect if it exists
• Increase power by increasing sample size, the true effect size, and/or measurement accuracy

Picture by pixabay.com (sebaie-1992)

Error Control in Practice

• Low statistical power in biomedical science: Dumas-Mallet et al. (2017)
• Low statistical power in psychology: Fraley & Vazire (2014)
• Low statistical power in neurosciences: Button et al. (2013)

Why is Power Important?

1. Probability of Discovery

You want significant results?

If the power of your test is only 50%, you will get a significant result only half of the time even if the effect is present.

Picture from pixabay.com

2. Probability of Replication with a power of 35%

2 identical experiments of the same (true) effect, power = 0.35 in both experiments:

Experiment 1                  Experiment 2                  Probability of this path
Effect detected (0.35)        Effect detected (0.35)        p = 0.12
Effect detected (0.35)        Effect not detected (0.65)    p = 0.23
Effect not detected (0.65)    Effect detected (0.35)        p = 0.23
Effect not detected (0.65)    Effect not detected (0.65)    p = 0.42
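The path probabilities are products of the per-experiment probabilities, assuming the two experiments are independent. A short sketch with illustrative variable names:

```python
power = 0.35  # power of each of the two identical experiments (from the slide)

# Multiply along each path of the tree.
both_detect = power * power               # about 0.12
first_only = power * (1 - power)          # about 0.23
second_only = (1 - power) * power         # about 0.23
neither = (1 - power) * (1 - power)       # about 0.42

# The four paths cover all possible outcomes.
total = both_detect + first_only + second_only + neither

print(f"both experiments detect the effect: {both_detect:.4f}")
print(f"only the first detects it:          {first_only:.4f}")
print(f"only the second detects it:         {second_only:.4f}")
print(f"neither detects it:                 {neither:.4f}")
```

With a power of 35%, the most likely single outcome is that neither experiment detects the true effect, and an original finding followed by a successful replication happens in only about 12% of cases.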

3. Credibility of Significant Results

[Figures not recoverable from the transcript]

Colquhoun (2014), Nuzzo (2014), Schönbrodt & Bollmann (2016)

Try it out! shinyapps.org/apps/PPV/
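The credibility of a significant result is commonly expressed as the positive predictive value (PPV): the share of significant results that reflect a true effect. The sketch below uses the standard formula based on power, α, and the prior probability that a tested hypothesis is true. It is not the code of the linked app, and the prior of 10% is an assumption made only for this example.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Share of significant results that reflect a true effect.

    prior: assumed proportion of tested hypotheses that are true
    power: probability of a significant result if the effect exists
    alpha: probability of a significant result if no effect exists
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: 1 in 10 tested hypotheses is true.
ppv_high_power = positive_predictive_value(prior=0.10, power=0.80)
ppv_low_power = positive_predictive_value(prior=0.10, power=0.35)

print(f"PPV at 80% power: {ppv_high_power:.2f}")
print(f"PPV at 35% power: {ppv_low_power:.2f}")
```

Under this assumed prior, lowering power from 80% to 35% means that fewer than half of all significant results point to a true effect.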

How to Do a Power Analysis

In order to increase statistical power, you could just collect huge samples, but...

... you may need to sacrifice lots of these cuties
... you may lack resources (money, time, place)

Pictures from pexels.com (Skitterphoto), media.giphy.com/media/EPwELUbhreEPC/giphy-downsized.gif

A power analysis helps you to find a balance between...

N too small:
• Fail to detect existing effects

N too large:
• Waste of resources
• Ethical issues

Pictures from nature.com/articles/srep43627, pngimg.com/download/23544, whyopenresearch.org/funding

You specify:
• Significance level
• Desired power
• Effect size

Result: sample size

1. How to specify the significance level

• Convention: α = 0.05
• But some researchers argue for α = 0.005 (Benjamin et al., 2018)
• And some argue that you should justify your alpha level for every study (Lakens et al., 2018)
• Do not forget to include corrections for multiple testing
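To illustrate the last point: the slides do not name a specific correction, so the sketch below uses the Bonferroni correction as one common choice. The family-wise error rate formula additionally assumes independent tests; the function names and the number of tests are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(per_test_alpha, n_tests):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - per_test_alpha) ** n_tests


n_tests = 5  # hypothetical number of hypothesis tests in one study
uncorrected = familywise_error_rate(0.05, n_tests)
corrected_alpha = bonferroni_alpha(0.05, n_tests)
corrected = familywise_error_rate(corrected_alpha, n_tests)

print(f"family-wise error rate without correction: {uncorrected:.3f}")
print(f"per-test alpha with Bonferroni:            {corrected_alpha:.3f}")
print(f"family-wise error rate with correction:    {corrected:.3f}")
```

The corrected per-test alpha is the value to enter as the significance level in the power analysis, which increases the required sample size.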

2. How to specify your desired power

• Convention: 1 - β = 0.8 (Cohen, 1992)
• For critical studies: 1 - β = 0.9 is recommended (Bondareva, 2013)

3. How to specify the effect size (under the H1)

• Make an educated guess about the real effect size based on
– Existing literature (but: publication bias!)
– Pilot study (but: inaccurate estimate, cost-ineffective)
• Safeguard power: incorporate uncertainty in ES estimates in the existing literature (aim for the lower end of the 60% CI)
• Use the smallest effect size of interest (SESOI)
– What is the smallest effect size you would like to detect?

Lakens & Evers (2014), Perugini et al. (2014), Simonsohn et al. (2014), Schönbrodt & Bollmann (2016)
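The safeguard idea can be sketched for a correlation coefficient. The example below uses the Fisher z approximation for the confidence interval, which is an assumption of this sketch and not necessarily the computation used in the cited papers; the published values r = 0.30 and n = 50 are hypothetical.

```python
from math import atanh, tanh
from statistics import NormalDist


def safeguard_r(r_observed, n, ci_level=0.60):
    """Lower bound of a two-sided confidence interval for a correlation,
    based on the Fisher z approximation."""
    # For a 60% two-sided interval this is the 0.80 quantile.
    z_quantile = NormalDist().inv_cdf(1 - (1 - ci_level) / 2)
    standard_error = 1 / (n - 3) ** 0.5
    return tanh(atanh(r_observed) - z_quantile * standard_error)


# Hypothetical published study: r = 0.30 observed with n = 50.
planning_r = safeguard_r(r_observed=0.30, n=50)
print(f"effect size to plan with: r = {planning_r:.3f}")
```

Planning with the smaller safeguard value instead of the published estimate leads to a larger required sample, which protects against overestimated effects in the literature.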

What else should you know before?

• What hypothesis test do you want to use?
– (repeated measures) t test
– (repeated measures) ANOVA
– (multiple) regression
– ...
• Required sample size depends on the hypothesis test

Do I need to know a different approach for each hypothesis test?

No, there are some really helpful software programs that help you compute a power analysis.

Picture from pixabay.com by max60500

Power Analysis: Software

G*Power: http://www.gpower.hhu.de/

Power Analysis: Software Exercise

What sample size do you need to achieve a power of 80% with a given significance level of α = 0.05 for these designs? Use G*Power! All tests are two-tailed.

• Two-sample t test (between design), d = 0.5
• One-sample t test (within design), d = 0.5
• Correlation: r = 0.21
• Difference between two independent correlations (r1 = .15, r2 = .40 → q = 0.273)
• ANOVA, 2x2 design: interaction effect, f = 0.21

Examples from Schönbrodt & Bollmann (2016)

Solution (power of 80%, α = 0.05, all tests two-tailed):

Design                                                                               Total N
Two-sample t test (between design), d = 0.5                                          128 (64 per group)
One-sample t test (within design), d = 0.5                                           34
Correlation: r = 0.21                                                                173
Difference between two independent correlations (r1 = .15, r2 = .40 → q = 0.273)     428
ANOVA, 2x2 design: interaction effect, f = 0.21                                      180 (45 each group)

Examples from Schönbrodt & Bollmann (2016)
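Two of these results can be roughly cross-checked by hand with textbook normal approximations. This is only a sketch: G*Power uses exact distributions, so small differences are expected, and the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Sum of the normal quantiles for a two-sided alpha and the desired power."""
    quantile = NormalDist().inv_cdf
    return quantile(1 - alpha / 2) + quantile(power)


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Normal approximation for a two-sample comparison of means (Cohen's d)."""
    return ceil(2 * (z_sum(alpha, power) / d) ** 2)


def n_per_group_two_correlations(q, alpha=0.05, power=0.80):
    """Normal approximation for the difference between two independent
    correlations, where q is the difference of the Fisher z-transformed values
    and each z has variance 1 / (n - 3)."""
    return ceil(2 * (z_sum(alpha, power) / q) ** 2 + 3)


n_t = n_per_group_two_sample(d=0.5)
n_q = n_per_group_two_correlations(q=0.273)

print(f"two-sample comparison, d = 0.5: {n_t} per group, {2 * n_t} in total")
print(f"two correlations, q = 0.273:    {n_q} per group, {2 * n_q} in total")
```

The approximation for the t test lands slightly below the 64 per group listed in the table, because the exact calculation accounts for the heavier tails of the t distribution. The result for the two correlations agrees with the total of 428.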

R libraries for power analysis

• pwr (functionality similar to G*Power)
• powerAnalysis (power for standard frequentist tests)
• samplesize (power for t test and Wilcoxon test)
• SIMR (power for generalized linear mixed models)
• PoweR (power for goodness-of-fit tests)
• powerlmm (power for longitudinal multilevel models)
• longpower (power for linear models of longitudinal data)
• NPHMC (power for survival analysis)
• ... and probably many more

Conducting a power analysis with R

[Screenshots of R input and output, not recoverable from the transcript]

Other software

• There is plenty of other software out there to conduct power analyses, e.g.
– http://powerandsamplesize.com/Calculators/
– http://www.bristol.ac.uk/cmm/software/mlpowsim/
– https://www.stat.ubc.ca/~rollin/stats/ssize/
– https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
– ...

Power: Misinterpretations

Power is a frequentist property: beware of fallacies!

• Power is a pre-data measure that averages over infinite hypothetical experiments
– Only 1 of these experiments will be observed
– Power is a property of the test procedure, not of an individual study's outcome
• Power is conditional on effect size, not conditional on the data obtained

Wagenmakers et al. (2015)
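The "average over hypothetical experiments" reading can be made concrete with a small simulation. The sketch below assumes a one-sample z-test with known standard deviation and a hypothetical true effect; the function name and all input values are chosen for illustration only.

```python
import random
from statistics import NormalDist


def simulate_power(effect, sd, n, alpha=0.05, n_experiments=5000, seed=1):
    """Share of simulated experiments with a significant two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_experiments):
        # One hypothetical experiment: n draws around the assumed true effect.
        sample = [rng.gauss(effect, sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (sd / n ** 0.5)
        if abs(z) > z_crit:
            significant += 1
    return significant / n_experiments


estimated_power = simulate_power(effect=0.5, sd=1.0, n=20)
print(f"share of significant experiments: {estimated_power:.3f}")
```

Each simulated experiment is either significant or not. Power describes the long-run share of significant outcomes under the assumed effect size and says nothing about which of the two outcomes a single observed study produced.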

Power: 3 Easy Steps

How you can improve your OS record (almost) without effort

1. As a reviewer, check if there is a justification of sample size that refers to a power calculation. Is the power of the design reported?
2. As a reviewer, check if the article corrected for multiple testing
3. Do not rely on rules of thumb for sample size calculation

Power: What You Learned

• What are type I (α) and type II (β) error rates?
• What is statistical power?
• What is statistical power in the context of null hypothesis significance testing?
• How is statistical power related to the credibility of research?
• What are influencing factors on statistical power?
• How do you determine study sample sizes to achieve a certain statistical power and control error rates?

Further Resources

• Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Lawrence Erlbaum: utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf
• Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press. Chapter 20.5
• Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Boston: Academic Press. Chapter 13
• Schönbrodt & Bollmann (2016). Materials of the Advanced Power Analysis workshop at LMU: https://osf.io/d76gc/
• Schönbrodt & Wagenmakers (2017). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review. 1-15. doi:10.3758/s13423-017-1230-y
• Winer, B. J. (1962). Statistical principles in experimental design. New York, NY: McGraw-Hill.

References

• American Cancer Society (2017). Mammograms. Available on: cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/mammograms/mammograms-what-to-know-before-you-go.html
• Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6. doi:10.1038/s41562-017-0189-z
• Betsch, T., Funke, J., & Plessner, H. (2011). Denken – Urteilen, Entscheiden, Problemlösen. Berlin: Springer
• Bondareva, D. (2013). Introduction to Power Analysis. Presentation for EPSE 482 Introduction to Statistics for Research in Education. Slides available on slideshare.net/dbondareva/introduction-to-power-analysis
• Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365. doi:10.1038/nrn3475
• Cohen, J. (1992). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98-101. csuw3.csuohio.edu/offices/assessment/Assessment%20Reports%202006/CoS/Psychology%203%20of%203.pdf
• Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 140216. doi:10.1098/rsos.140216
• Dumas-Mallet, E., Button, K. S., Boraud, T., Gonon, F., & Munafò, M. R. (2017). Low statistical power in biomedical science: a review of three human research domains. Royal Society Open Science, 4(2), 160254. doi:10.1098/rsos.160254
• Fraley, R. C., & Vazire, S. (2014). The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PloS one, 9(10), e109019. doi:10.1371/journal.pone.0109019
• Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., ... & Buchanan, E. M. (2018). Justify your alpha. Nature Human Behaviour, 2, 168-171. psyarxiv.com/9s3y6/
• Lakens, D., & Evers, E. R. (2014). Sailing from the seas of chaos into the corridor of stability: Practical recommendations to increase the informational value of studies. Perspectives on Psychological Science, 9(3), 278-292. doi:pure.tue.nl/ws/files/3866816/388206811435826.pdf
• Nuzzo, R. (2014). Scientific method: statistical errors. Nature News, 506(7487), 150. doi:10.1038/506150a
• Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9(3), 319-332. doi:10.1177/1745691614528519
• Schönbrodt, F. D. & Bollmann, S. (2016). Advanced Power Analysis. Workshop at LMU Munich. Slides available at osf.io/d76gc/
• Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9(6), 666-681. doi:datacolada.org/wp-content/uploads/2017/12/pcp2-P-curve-2-published-.pdf
• Wagenmakers, E. J., Verhagen, J., Ly, A., Bakker, M., Lee, M. D., Matzke, D., ... & Morey, R. D. (2015). A power fallacy. Behavior Research Methods, 47(4), 913-917. doi:ejwagenmakers.com/2015/WagenmakersEtAlAPowerFallacy2015.pdf

Credentials

The creation of this workshop material was partially funded by the Berkeley Initiative for Transparency in the Social Sciences (BITSS) Catalyst Program. For more information, please visit www.bitss.org, sign up for the BITSS blog, and [email protected]. We also kindly thank the LMU Graduate Center for their support.

These slides were created by Angelika Stefan, Julia Brandt, and Felix Schönbrodt. The work is licensed under a Creative Commons Attribution 4.0 International License. That means you can reuse these slides in your own workshops, remix them, or copy them, as long as you attribute the original creators.