Power Analysis - OSF
How to use these slides
• Use the whole slide deck as a 1.5-2 hour presentation
• Use parts of the slides as part of a longer workshop
– Ideas for workshop concepts can be found in the README folder
• Remix them with your own or other slides, change the layout, refine the content… (everything granted by the CC-BY license)
Open Science in the research process
Research steps: Formulate hypotheses & analysis plan; Collect data; Analyze data; Interpret & report results; Publish & distribute research output; Replicate results
Open Science practices: Preregistration; Registered Report (1st phase); Open Lab Notebook; Open Analysis Code; Open Data; Open Materials; Open Access; Registered Report (2nd phase); Replication study
We have workshop material for all blue topics. Get all material here: https://osf.io/zjrhu/
Power Analysis
Research is Asking Questions
Does this medicine help?
Is this substance toxic?
Do weather conditions predict earthquakes?
Is there an increasing resistance to antibiotics?
Are Buddhists happier people?
Picture from pixabay.com
Decisions & Errors
Type I error (false positive)
Type II error (false negative)
https://effectsizefaq.files.wordpress.com/2010/05/type-i-and-type-ii-errors.jpg
Decisions & Errors
THE DECISION | THE FACTS: effect present | THE FACTS: no effect present
Test indicates: effect is present | True positive | False positive (Type I error)
Test indicates: no effect is present | False negative (Type II error) | True negative
Icon by flaticon.com (Freepik)
Decisions & Errors: Exercise
What kind of error is it? (1)
Mammograms are low-dose x-rays that allow radiologists to detect changes in breast tissue which can indicate breast cancer. Screening mammograms are used to look for signs of breast cancer in women without symptoms. The probability that a screening mammogram looks abnormal even though the woman does not suffer from breast cancer is 9.6%. What kind of error occurs in these cases?
https://commons.wikimedia.org/wiki/File:Blausen_0628_Mammogram.png Betsch, Funke, & Plessner (2011), American Cancer Society (2017)
Decisions & Errors: Exercise
What kind of error is it? (1)
Solution: The test indicates that an effect is present (abnormal mammogram), but in reality no effect is present (no breast cancer). This is a false positive (Type I error).
Betsch, Funke, & Plessner (2011), American Cancer Society (2017)
Decisions & Errors: Exercise
What kind of error is it? (2)
The probability that a screening mammogram looks normal even though the patient suffers from breast cancer is 0.2%. What kind of error occurs in these cases?
https://commons.wikimedia.org/wiki/File:Blausen_0628_Mammogram.png Betsch, Funke, & Plessner (2011), American Cancer Society (2017)
Decisions & Errors: Exercise
What kind of error is it? (2)
Solution: The test indicates that no effect is present (normal mammogram), but in reality an effect is present (breast cancer). This is a false negative (Type II error).
Betsch, Funke, & Plessner (2011), American Cancer Society (2017)
Decisions & Errors: Exercise
What kind of error is it? (2)
Additional material: Full table of probabilities for the breast cancer example
THE DECISION | THE FACTS: effect present | THE FACTS: no effect present | Total
Test indicates: effect is present | 0.8% | 9.6% | 10.4%
Test indicates: no effect is present | 0.2% | 89.4% | 89.6%
Total | 1% | 99% | 100%
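The arithmetic behind the table can be retraced with a few lines of Python. This is only a sketch: it assumes the four cells are joint probabilities (they add up to 100%), and the names `joint`, `marginal`, `ppv`, and `sensitivity` are illustrative choices that do not appear on the slides.

```python
# Joint probabilities (in percent) copied from the table of the breast cancer example.
joint = {
    ("test positive", "cancer"): 0.8,
    ("test positive", "no cancer"): 9.6,
    ("test negative", "cancer"): 0.2,
    ("test negative", "no cancer"): 89.4,
}

def marginal(index, label):
    """Sum the joint probabilities over one margin (0 = test result, 1 = reality)."""
    return sum(p for key, p in joint.items() if key[index] == label)

p_test_positive = marginal(0, "test positive")  # row total
p_cancer = marginal(1, "cancer")                # column total

# Share of abnormal mammograms that belong to women who actually have cancer.
ppv = joint[("test positive", "cancer")] / p_test_positive
# Share of women with cancer whose mammogram looks abnormal.
sensitivity = joint[("test positive", "cancer")] / p_cancer

print(f"P(test positive)          = {p_test_positive:.1f}%")
print(f"P(cancer | test positive) = {ppv:.3f}")
print(f"P(test positive | cancer) = {sensitivity:.2f}")
```

Under these numbers, most abnormal screening results are false positives, because the condition is rare compared with the false positive cell.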
Decisions & Errors: Exercise
What kind of error is it? (3)
A company has two candidates for a job opening. To test the ability of the candidates, the company performs exhaustive assessment tests. These tests indicate that candidate A is a suitable candidate whereas B would not perform well on the job. Therefore, the company decides to hire A. However, A does not fulfill the expectations, so the company decides not to extend A's contract after the probation period.
What decision error did the company make by hiring A?
Picture by Tumisu (pixabay.com)
Decisions & Errors: Exercise
What kind of error is it? (3)
Solution: The tests indicated that A is suitable (effect present), but in reality A did not perform well (no effect present). Hiring A was a false positive (Type I error).
Decisions & Errors: Exercise
What kind of error is it? (4)
B, the person the company did not hire, applied to other jobs and found a similar occupation at another firm. B performs very well on the job, so the firm decides to extend B's contract after the probation period.
What decision error did the first company make by not hiring B?
Picture by Tumisu (pixabay.com)
Decisions & Errors: Exercise
What kind of error is it? (4)
Solution: The tests indicated that B is not suitable (no effect present), but in reality B performs very well (effect present). Not hiring B was a false negative (Type II error).
NHST Logic
Research question
→ formulate hypotheses
Alternative hypothesis: "There is an effect."
Null hypothesis: "There is no effect."
→ test decision
Reject the null hypothesis (indirect support for the alternative hypothesis)
Do not reject the null hypothesis
Error Control in NHST
Correct decision = true positive
Type I error = false positive = alpha error
Type II error = false negative = beta error
Correct decision = true negative
Icons by flaticon.com (Maxim Basinski)
Error Control in NHST
Power
Power: The probability of detecting an effect if an effect is present.
Icons by flaticon.com (Dinosoft Labs)
Error Control in NHST
THE DECISION | THE FACTS: H1, effect present | THE FACTS: H0, no effect present
Test indicates: effect is present (H1) | Power (1 - β) | False positive (Type I / α error)
Test indicates: no effect is present (H0) | False negative (Type II / β error) | True negative
Icon by flaticon.com (Freepik)
How can we control error rates and power?
Error Control in NHST
Change decision region
Smaller α error probability
Lower power, higher β error probability
Error Control in NHST
Larger effect size
Less overlap between distributions
Higher power, lower β error probability
Error Control in NHST
Increase sample size
Measure more accurately
Higher power, lower β error probability
Error Control in NHST
Increase measurement accuracy
Reduce measurement variance
Higher power, lower β error probability
Error Control in NHST
Influencing factor | Influence | Use to control error rates?
Decision region | Lower α error rate, higher β error rate, lower power | Use it to control the maximum false positive error rate
Effect size | The larger the effect, the lower the β error rate and the higher the power | No control over the population effect size! An effect size estimate is needed to control the power of an experiment
Sample size | Lower β error rate, higher power | Use sample size to control power and the maximum false negative error rate
Measurement accuracy | Lower β error rate, higher power | Always use the best measurement tools at hand!
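The four influencing factors can be tried out numerically. The sketch below is an illustration under simplifying assumptions, not the method behind the slides: it uses a normal approximation for a two-sided, two-sample test, models measurement accuracy as independent error variance added to the true-score variance, and the function name `approx_power` and all parameter names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def approx_power(mean_diff, n_per_group, alpha=0.05, sd_true=1.0, sd_error=0.0):
    """Approximate power of a two-sided, two-sample z-test.

    mean_diff   : true difference between the group means (raw units)
    n_per_group : number of observations per group
    sd_true     : true-score standard deviation within each group
    sd_error    : standard deviation of independent measurement error
    """
    sd_observed = sqrt(sd_true**2 + sd_error**2)   # measurement noise inflates the spread
    d_observed = mean_diff / sd_observed           # observed standardized effect size
    noncentrality = d_observed * sqrt(n_per_group / 2)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)     # border of the decision region
    # Probability of landing in either rejection region when the effect exists.
    return (1 - STD_NORMAL.cdf(z_crit - noncentrality)
            + STD_NORMAL.cdf(-z_crit - noncentrality))

baseline = approx_power(0.5, 50)
print(f"baseline (d = 0.5, n = 50 per group): {baseline:.2f}")
print(f"stricter decision region (alpha = 0.005): {approx_power(0.5, 50, alpha=0.005):.2f}")
print(f"larger effect (d = 0.8): {approx_power(0.8, 50):.2f}")
print(f"larger sample (n = 100 per group): {approx_power(0.5, 100):.2f}")
print(f"noisier measurement (sd_error = 1): {approx_power(0.5, 50, sd_error=1.0):.2f}")
```

Each call changes one factor relative to the baseline, so the direction of every row in the table can be checked in isolation.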
Error Control in NHST
Understood everything?
Power = probability of finding an effect if it exists
Increase power by increasing sample size, the true effect size, and/or measurement accuracy
Picture by pixabay.com (sebaie-1992)
Why is Power Important?
1. Probability of Discovery
You want significant results?
If the power of your test is only 50%, you will get a significant result only half of the time even if the effect is present.
Picture from pixabay.com
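The "half of the time" statement can be illustrated with a small simulation. The following is a sketch under assumptions that are not on the slide: a two-group design analysed with a z-test (population SD known and equal to 1), and an effect size and sample size (d = 0.49, 32 per group) picked so that the power is very close to 50%. The function name and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_significance_rate(d, n_per_group, alpha=0.05, n_experiments=2000, seed=1):
    """Fraction of simulated two-group experiments with a significant z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(2 / n_per_group)  # SE of the mean difference when SD = 1
    significant = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]  # true effect is present
        z = (fmean(treated) - fmean(control)) / standard_error
        if abs(z) > z_crit:
            significant += 1
    return significant / n_experiments

rate = simulate_significance_rate(d=0.49, n_per_group=32)
print(f"Share of significant experiments: {rate:.2f}")
```

Although the effect exists in every simulated experiment, only about every second experiment reaches significance.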
Why is Power Important?
2. Probability of Replication with a power of 35%
Two identical experiments on the same (true) effect, power = 0.35 in both experiments
Experiment 1 | Experiment 2 | Probability
Effect detected (0.35) | Effect detected (0.35) | p = 0.12
Effect detected (0.35) | Effect not detected (0.65) | p = 0.23
Effect not detected (0.65) | Effect detected (0.35) | p = 0.23
Effect not detected (0.65) | Effect not detected (0.65) | p = 0.42
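The path probabilities for two experiments follow from multiplying the branch probabilities. A minimal sketch, assuming the two experiments are independent (variable names are illustrative):

```python
from itertools import product

power = 0.35  # power of each experiment, as stated on the slide
outcome_prob = {"detected": power, "not detected": 1 - power}

# Independence is assumed, so the probabilities along each path multiply.
paths = {
    (first, second): outcome_prob[first] * outcome_prob[second]
    for first, second in product(outcome_prob, repeat=2)
}

for (first, second), p in paths.items():
    print(f"experiment 1: {first:<13} experiment 2: {second:<13} p = {p:.2f}")

both_detected = paths[("detected", "detected")]
```

The value `both_detected` is the chance that an original study and its identical replication both find the true effect.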
Why is Power Important?
3. Credibility of Significant Results
Colquhoun (2014), Nuzzo (2014), Schönbrodt & Bollmann (2016)
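One common way to express the credibility of a significant result is the share of significant findings that reflect a true effect. The sketch below is a hedged illustration of that idea, not a reproduction of the slide's figure: it assumes a prior proportion of true hypotheses (`prior = 0.1` is purely hypothetical), and the function name is made up.

```python
def positive_predictive_value(power, alpha=0.05, prior=0.1):
    """Share of significant results that reflect a true effect.

    prior is the assumed proportion of tested hypotheses that are true;
    the default of 0.1 is an illustrative assumption, not a value from the slides.
    """
    true_positives = power * prior          # true effects that are detected
    false_positives = alpha * (1 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)

for power in (0.35, 0.50, 0.80):
    ppv = positive_predictive_value(power)
    print(f"power = {power:.2f} -> share of true effects among significant results = {ppv:.2f}")
```

With α and the prior held fixed, lower power means that a larger share of the significant results are false positives.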
How to Do a Power Analysis
In order to increase statistical power, you could just collect huge samples, but...
... you may need to sacrifice lots of these cuties
... you may lack resources (money, time, place)
Pictures from pexels.com (Skitterphoto), media.giphy.com/media/EPwELUbhreEPC/giphy-downsized.gif
How to Do a Power Analysis
A power analysis helps you to find a balance between...
N too small
• Fail to detect existing effects
N too large
• Waste of resources
• Ethical issues
Pictures from nature.com/articles/srep43627, pngimg.com/download/23544, whyopenresearch.org/funding
How to Do a Power Analysis
1. How to specify the significance level
• Convention: α = 0.05
• But some researchers argue for α = 0.005 (Benjamin et al., 2018)
• And some argue that you should justify your alpha level for every study (Lakens et al., 2018)
• Do not forget to include corrections for multiple testing
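The slide does not name a specific correction for multiple testing. As one simple and widely used option, the Bonferroni correction divides the significance level by the number of tests. The sketch below assumes independent tests for the family-wise error calculation, and the number of tests is hypothetical.

```python
def bonferroni_alpha(alpha_family, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha_family / n_tests

def familywise_error_rate(alpha_per_test, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests

n_tests = 5  # hypothetical number of hypothesis tests in one study
uncorrected = familywise_error_rate(0.05, n_tests)
corrected_alpha = bonferroni_alpha(0.05, n_tests)
corrected = familywise_error_rate(corrected_alpha, n_tests)

print(f"family-wise error rate without correction: {uncorrected:.3f}")
print(f"per-test alpha after correction:           {corrected_alpha:.3f}")
print(f"family-wise error rate with correction:    {corrected:.3f}")
```

The corrected per-test level is the α that should enter the power analysis, which in turn raises the required sample size.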
How to Do a Power Analysis
2. How to specify your desired power
• Convention: power (1 - β) = 0.8 (Cohen, 1992)
• For critical studies: power (1 - β) = 0.9 is recommendable (Bondareva, 2013)
How to Do a Power Analysis
3. How to specify the effect size (under the H1)
• Make an educated guess about the real effect size based on
– Existing literature (but: publication bias!)
– Pilot study (but: inaccurate estimate, cost-ineffective)
• Safeguard power: Incorporate uncertainty in ES estimates in the existing literature (aim for the lower end of the 60% CI)
• Use the smallest effect size of interest (SESOI)
– What is the smallest effect size you would like to detect?
Lakens & Evers (2014), Perugini et al. (2014), Simonsohn et al. (2014), Schönbrodt & Bollmann (2016)
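The safeguard idea can be sketched as follows: take a published effect size, move to the lower end of its 60% confidence interval, and plan the sample size for that smaller value. This is a rough illustration under stated assumptions: it uses a large-sample approximation for the standard error of Cohen's d and a normal-approximation sample size formula, whereas an exact treatment would rely on noncentral t distributions. The published study (d = 0.5, 30 participants per group) and all function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def safeguard_d(d_observed, n1, n2, ci_level=0.60):
    """Lower limit of a two-sided confidence interval around a published Cohen's d.

    Uses a large-sample approximation for the standard error of d.
    """
    se_d = sqrt((n1 + n2) / (n1 * n2) + d_observed**2 / (2 * (n1 + n2)))
    z = STD_NORMAL.inv_cdf(0.5 + ci_level / 2)
    return d_observed - z * se_d

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Hypothetical published study: d = 0.5 with 30 participants per group.
d_published = 0.5
d_safe = safeguard_d(d_published, 30, 30)

print(f"published d = {d_published:.2f}, safeguard d = {d_safe:.2f}")
print(f"n per group for published d: {n_per_group(d_published)}")
print(f"n per group for safeguard d: {n_per_group(d_safe)}")
```

Planning for the safeguard effect size requires a noticeably larger sample, which is the price for protecting against an overestimated effect in the literature.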
How to Do a Power Analysis
What else should you know before?
• What hypothesis test do you want to use?
– (repeated measures) t test
– (repeated measures) ANOVA
– (multiple) regression
– ...
• Required sample size depends on the hypothesis test
How to Do a Power Analysis
Do I need to know a different approach for each hypothesis test?
No, there are some really helpful software programs that help you compute a power analysis.
Picture from pixabay.com by max60500
Power Analysis: Software
Exercise
What sample size do you need to achieve a power of 80% with a given significance level of α = 0.05 for these designs? Use G*Power! All tests are two-tailed.
Design | Total N
Two-sample t test (between design), d = 0.5 |
One-sample t test (within design), d = 0.5 |
Correlation: r = 0.21 |
Difference between two independent correlations (r1 = .15, r2 = .40 → q = 0.273) |
ANOVA, 2x2 design: interaction effect, f = 0.21 |
Examples from Schönbrodt & Bollmann (2016)
Power Analysis: Software
Exercise
What sample size do you need to achieve a power of 80% with a given significance level of α = 0.05 for these designs? Use G*Power! All tests are two-tailed.
Design | Total N
Two-sample t test (between design), d = 0.5 | 128 (64 per group)
One-sample t test (within design), d = 0.5 | 34
Correlation: r = 0.21 | 173
Difference between two independent correlations (r1 = .15, r2 = .40 → q = 0.273) | 428
ANOVA, 2x2 design: interaction effect, f = 0.21 | 180 (45 each group)
Examples from Schönbrodt & Bollmann (2016)
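For readers without G*Power at hand, three of the designs can be roughly cross-checked with textbook normal approximations. This sketch is not how G*Power computes its results: it replaces the t distribution by the normal distribution and uses Fisher's z transformation for correlations, so its output can deviate from the values in the table by a few participants. The function names are illustrative.

```python
from math import atanh, ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Sum of the critical z value (alpha = .05, two-tailed) and the z value for 80% power.
Z_SUM = STD_NORMAL.inv_cdf(1 - 0.05 / 2) + STD_NORMAL.inv_cdf(0.80)

def total_n_two_sample(d):
    """Total N (both groups) for a two-sample comparison of means."""
    return 2 * ceil(2 * (Z_SUM / d) ** 2)

def total_n_correlation(r):
    """Total N for testing a single correlation via Fisher's z transformation."""
    return ceil((Z_SUM / atanh(r)) ** 2 + 3)

def total_n_correlation_difference(r1, r2):
    """Total N (both groups, equal sizes) for comparing two independent correlations."""
    q = abs(atanh(r2) - atanh(r1))  # Cohen's q
    return 2 * ceil(2 * (Z_SUM / q) ** 2 + 3)

print("two-sample comparison, d = 0.5:", total_n_two_sample(0.5))
print("correlation, r = 0.21:", total_n_correlation(0.21))
print("difference of correlations, r1 = .15, r2 = .40:",
      total_n_correlation_difference(0.15, 0.40))
```

For the final sample size of a real study, the exact routines in dedicated software remain the better choice; the approximation mainly helps to see how effect size drives the required N.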
Power Analysis: Software
R libraries for power analysis
• pwr (functionality similar to G*Power)
• powerAnalysis (power for standard frequentist tests)
• samplesize (power for t test and Wilcoxon test)
• SIMR (power for generalized linear mixed models)
• PoweR (power for goodness-of-fit tests)
• powerlmm (power for longitudinal multilevel models)
• longpower (power for linear models of longitudinal data)
• NPHMC (power for survival analysis)
• ... and probably many more
Power Analysis: Software
Other software
• There is plenty of other software out there to conduct power analyses, e.g.
– http://powerandsamplesize.com/Calculators/
– http://www.bristol.ac.uk/cmm/software/mlpowsim/
– https://www.stat.ubc.ca/~rollin/stats/ssize/
– https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
– ...
Power: Misinterpretations
Power is a frequentist property: beware of fallacies!
• Power is a pre-data measure that averages over infinite hypothetical experiments
– Only 1 of these experiments will be observed
– Power is a property of the test procedure, not of an individual study's outcome
• Power is conditional on effect size, not conditional on the data obtained
Wagenmakers et al. (2015)
Power: 3 Easy Steps
How you can improve your OS record (almost) without effort
1. As a reviewer, check if there is a justification of sample size that refers to a power calculation. Is the power of the design reported?
2. As a reviewer, check if the article corrected for multiple testing
3. Do not rely on rules of thumb for sample size calculation
Power: What You Learned
• What are type I (α) and type II (β) error rates?
• What is statistical power?
• What is statistical power in the context of null hypothesis significance testing?
• How is statistical power related to the credibility of research?
• What are influencing factors on statistical power?
• How do you determine study sample sizes to achieve a certain statistical power and control error rates?
Further Resources
• Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Lawrence Erlbaum: utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf
• Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press. Chapter 20.5
• Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Boston: Academic Press. Chapter 13
• Schönbrodt & Bollmann (2016). Materials of the Advanced Power Analysis workshop at LMU: https://osf.io/d76gc/
• Schönbrodt & Wagenmakers (2017). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review. 1-15. doi:10.3758/s13423-017-1230-y
• Winer, B. J. (1962). Statistical principles in experimental design. New York, NY: McGraw-Hill.
References
• American Cancer Society (2017). Mammograms. Available at: cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/mammograms/mammograms-what-to-know-before-you-go.html
• Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6. doi:10.1038/s41562-017-0189-z
• Betsch, T., Funke, J., & Plessner, H. (2011). Denken – Urteilen, Entscheiden, Problemlösen. Berlin: Springer
• Bondareva, D. (2013). Introduction to Power Analysis. Presentation for EPSE 482 Introduction to Statistics for Research in Education. Slides available on slideshare.net/dbondareva/introduction-to-power-analysis
• Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365. doi:10.1038/nrn3475
• Cohen, J. (1992). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98-101. csuw3.csuohio.edu/offices/assessment/Assessment%20Reports%202006/CoS/Psychology%203%20of%203.pdf
• Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 140216. doi:10.1098/rsos.140216
• Dumas-Mallet, E., Button, K. S., Boraud, T., Gonon, F., & Munafò, M. R. (2017). Low statistical power in biomedical science: a review of three human research domains. Royal Society Open Science, 4(2), 160254. doi:10.1098/rsos.160254
• Fraley, R. C., & Vazire, S. (2014). The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PloS one, 9(10), e109019. doi:10.1371/journal.pone.0109019
• Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., ... & Buchanan, E. M. (2018). Justify your alpha. Nature Human Behaviour, 2, 168-171. psyarxiv.com/9s3y6/
• Lakens, D., & Evers, E. R. (2014). Sailing from the seas of chaos into the corridor of stability: Practical recommendations to increase the informational value of studies. Perspectives on Psychological Science, 9(3), 278-292. pure.tue.nl/ws/files/3866816/388206811435826.pdf
• Nuzzo, R. (2014). Scientific method: statistical errors. Nature News, 506(7487), 150. doi:10.1038/506150a
• Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9(3), 319-332. doi:10.1177/1745691614528519
• Schönbrodt, F. D. & Bollmann, S. (2016). Advanced Power Analysis. Workshop at LMU Munich. Slides available at osf.io/d76gc/
• Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9(6), 666-681. datacolada.org/wp-content/uploads/2017/12/pcp2-P-curve-2-published-.pdf
• Wagenmakers, E. J., Verhagen, J., Ly, A., Bakker, M., Lee, M. D., Matzke, D., ... & Morey, R. D. (2015). A power fallacy. Behavior Research Methods, 47(4), 913-917. ejwagenmakers.com/2015/WagenmakersEtAlAPowerFallacy2015.pdf
Credentials
The creation of this workshop material was partially funded by the Berkeley Initiative for Transparency in the Social Sciences (BITSS) Catalyst Program. For more information, please visit www.bitss.org, sign up for the BITSS blog, and [email protected] We also kindly thank the LMU Graduate Center for their support.
These slides were created by Angelika Stefan, Julia Brandt, and Felix Schönbrodt. The work is licensed under a Creative Commons Attribution 4.0 International License. That means you can reuse these slides in your own workshops, remix them, or copy them, as long as you attribute the original creators.