The ANNALS of the American Academy of Political and Social Science
http://ann.sagepub.com/
Theory, External Validity, and Experimental Inference: Some Conjectures
Fernando Martel Garcia and Leonard Wantchekon
The ANNALS of the American Academy of Political and Social Science 2010 628: 132
DOI: 10.1177/0002716209351519
The online version of this article can be found at:
http://ann.sagepub.com/content/628/1/132
Published by:
http://www.sagepublications.com
On behalf of:
American Academy of Political and Social Science
Version of Record: Feb 23, 2010
Downloaded from ann.sagepub.com at PRINCETON UNIV LIBRARY on November 17, 2011
ANNALS, AAPSS, 628, March 2010
It is often argued that experiments are strong on causal identification (internal validity) but weak on generalizability (external validity). One widely accepted way to limit threats to external validity is to incorporate as much variation in the background conditions and in the covariates as possible through replication. Another strategy is to make the theoretical foundations of the experiment more explicit. The latter requires that we develop trajectories of experiments that are consistent with a theoretical argument. In other words, new experiments should not simply consist of changing the context of old ones, but do so in ways that explicitly test various aspects of a theory in a coherent way.
Keywords: causal inference; randomized experiments; external validity
Fernando Martel Garcia ([email protected]) is a PhD student at New York University’s Wilf Family Department of Politics. His research focuses on accountability mechanisms in public service delivery and how research design and statistical techniques can help us not only uncover such causal mechanisms, but also test potential improvements, make out-of-sample forecasts, and ultimately, improve our theories. He is currently involved in various field experiments in Mexico. Prior to starting his PhD, Mr. Martel Garcia worked as a staff economist for the World Bank.
Leonard Wantchekon ([email protected]) is a professor of politics and economics at New York University. He taught at Yale University (1995–2000) and was a visiting fellow at the Center of International Studies at Princeton University (2000–01). He is the author of several articles on post–civil war democratization, resource curse, electoral clientelism, and experimental methods, which have appeared in the American Political Science Review, World Politics, Comparative Political Studies, Quarterly Journal of Economics, and Journal of Conflict Resolution. He is the founding director of the Institute for Empirical Research in Political Economy, which is based in Benin (West Africa) and at New York University.
On December 17, 2007, 1,431 poor families in New York City received a total of $740,000 as part of a new, $53 million conditional cash transfer (CCT) anti-poverty program called Opportunity New York City.1 The design of Opportunity NYC was informed to a large extent by Progresa, a highly regarded CCT program initiated in Mexico in 1997 as a randomized field experiment.2 Accordingly, it is only natural to ask, will the New York initiative be as successful as its Mexican predecessor?
This is essentially a question of external validity, namely, the validity of inferences about whether a causal relationship holds over variation in treatments, outcome measures, units, and settings (Shadish, Cook, and Campbell 2002, 38). As such, it goes to the heart of policy-motivated research, with its emphasis on determining what works under what conditions. That is, what, if anything, can we say about the causal effect of similar policies under different contexts?
The latter question would not matter were it not that it arises quite often in applied work. Indeed, in the present case, the differences between Opportunity NYC and Progresa are just as striking as their common pedigree. To begin with, Opportunity NYC is the first time a CCT program is being tried in a large, wealthy city, whereas Progresa started out as a program for the rural poor. Second, the experimental units (the relevant New York households) are significantly richer in absolute and relative terms than their Mexican counterparts. Indeed, poverty in the United States might well be quite different from poverty in rural Mexico, in terms of causes, consequences, and solutions. Third, the intervention in New York is also different, in that “[it] is the first program to include a significant workforce participation component in addition to the traditional health and education components.”3
Given significant variation in treatment, outcome measures, units, and settings, some have questioned whether Opportunity NYC will work at all:
[New York City Mayor Michael] Bloomberg has misread the purpose of third-world conditional cash-transfer programs, and thus has misread their applicability to New York. . . . In New York, unlike in the third world, poor parents don’t have to pay to send their children to school. Nor do they face the tough choice of educating the kids or having enough money to put food on the table every night. (Gelinas 2006)
According to this critic, then, Progresa worked by relaxing a binding budget constraint on poor Mexican households, and because such a constraint is, under this interpretation, not binding among poor New Yorkers, Opportunity NYC will probably fail.
The uncertainty surrounding the generalizability of cause and effect relationships from field experiments such as Progresa questions not just Mayor Bloomberg’s wisdom, but the very enterprise of randomized experimentation for policy making. Experiments, it would seem, have to offer only a deeply unsatisfying Faustian bargain between internal and external validity: Yes, we are highly certain CCTs worked in Mexico, but we remain deeply uncertain about their potential success in New York.
This article is motivated by the desire to help improve the terms of this Faustian bargain. In what follows we distinguish between the robustness and analytical approaches to external validity. The analytical approach proposes a series of theoretically motivated replications. This approach sees the problem of generalizability as intrinsically theoretical, in that theories about causal mechanisms, constructs, and selection are what allow us to generalize beyond sampling particulars in individual cases.
In the context of the New York experiment, this approach could begin by theorizing about mediator and moderator variables that may dampen the extrapolation from Progresa and then design a sequence of experiments to test these hypotheses.4 Indeed, randomized CCTs have been implemented in numerous countries, offering ample opportunity to test alternative causal mechanisms for better prediction out of sample (Rawlings 2005; Das, Do, and Ozler 2005). Moreover, we may design small tests of specific implications of the theory without having to replicate Progresa each time.
In contrast, the robustness approach relies on replication across various settings, treatments, outcome measures, and units. Rather than fret whether Progresa will work in New York, the approach is to just go ahead and test it. Such brute force replication is, indeed, an obvious path to dissolve the uncertainty. But besides the potential practical drawbacks, it has severe limitations. Suppose we run the test and the results are negative. What shall we conclude? That CCTs work in Mexico and not in New York? That they work only in poorer societies? Or in all societies but New York? That they did not work in New York in 2008-12 but may work at a later date? Should we therefore repeat the same experiment every few years? The list is potentially endless. For us, theory-driven sequences of experiments are key for optimal learning.
External validity can be greatly improved by connecting individual experiments with a theory, even if those individual experiments are not themselves theoretically grounded, especially at the early stages of a research program. Indeed, in what follows, we reemphasize the distinction between the external validity of theories associated with a research program and the particular experiments on which the program is based, thereby underlining the three-way relation between experiments, research programs, and external validity. After all, external validity is an attribute of inferences and not of any particular method (Shadish, Cook, and Campbell 2002). Indeed, contra commonly held beliefs, experimentation is key to testing theories about external validity.
Besides motivating the desire for optimal learning, the distinction between the analytical and robustness approaches has important practical consequences, as the payoff to more careful replication could be large. For example, the budget for Opportunity NYC has been set at $53 million, financed by the mayor himself and private foundations (it receives no public funds). This is a huge gamble. Some relatively inexpensive pilot testing and diagnosis may well reduce uncertainty, increasing the expected value of this and other future replications. Thus, there may be large private and external gains to be had.
Finally, two caveats: First, whereas the sharp distinction between robustness and analytical approaches is useful as a rhetorical device, it is unlikely to be borne out in practice. In reality, there is likely to be some amount of overlap, in that replications often embody some implicit prior or theory. Second, because our goal, at this stage, is to highlight some conjectures, the approach is discursive and not explicitly deductive. In ongoing work, we formally apply the statistical learning and decision-making literatures to the notion of external validity.
This article proceeds as follows. First, we discuss what external validity is, why it is problematic, and what the criteria are for evaluating external validity. We then explain what the analytical approach is. Next, we discuss the potential pros and cons. Finally, we explore how external validity relates to causality, prediction, and understanding, before offering a conclusion.
What Does “External Validity” Really Mean?
Imagine testing the effect of non-partisan get-out-the-vote mailings on turnout amongst N individuals, about whom all we know is their individual identifiers i = 1, 2, . . . , N. Variable Y records the turnout outcome, such that y_i = 1 if unit i turned out to vote and y_i = 0 otherwise. Suppose the treatment effect is positive and practically and statistically significant. Now, suppose we are asked to place a bet on whether the treatment will work on some new units, units N + 1, N + 2, and N + 3. All we know is that these units were not in the original sample. How shall we place our bet? And how will we know if we have won? That is, when is a replication result close enough to the original inference that external validity may be deemed upheld? We answer these two questions in turn.
External validity as extrapolation
Under a squared loss function, say, a common predictor for an outcome of interest y is the conditional mean E[P(y | Treatment)], where P(y | x) describes the density of y conditional on x.5 Although this may seem natural, we don’t know, with the information we are given in the hypothetical voting experiment mentioned above, whether the outcomes for these new units are identically and independently distributed in relation to the experimental sample. For all we know, the new units are a cat, a dog, and a goldfish, none of which vote.
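The link between squared loss and the conditional mean can be checked numerically. The following is a minimal sketch, not part of the original article: the turnout data are hypothetical and the function names are assumptions introduced for illustration. It compares the sample mean among treated units with the best constant prediction found by searching a grid.

```python
def mean_squared_loss(prediction, outcomes):
    """Average squared loss of one constant prediction over observed outcomes."""
    return sum((y - prediction) ** 2 for y in outcomes) / len(outcomes)


def sample_mean(outcomes):
    """Sample analogue of the conditional mean of y among treated units."""
    return sum(outcomes) / len(outcomes)


# Hypothetical turnout outcomes (1 = voted) for treated units in the sample.
treated_turnout = [1, 0, 1, 1, 0]

mean_prediction = sample_mean(treated_turnout)

# Search constant predictions 0.00, 0.01, ..., 1.00 for the loss minimizer.
grid = [k / 100 for k in range(101)]
grid_minimizer = min(grid, key=lambda p: mean_squared_loss(p, treated_turnout))

print(mean_prediction, grid_minimizer)
```

The grid search and the sample mean agree within the experimental sample. Nothing in this calculation speaks to units outside the sample, which is the ambiguity the surrounding text describes.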
What we face is a problem of predictive ambiguity as, strictly speaking, the experiment reveals nothing about the density P'(y | Treatment) for the new units. Predictive ambiguity arises whenever choice or behavior depends on an objective function with an unknown probability distribution (Manski 2008). At this point, we therefore need to make assumptions. We may, for example, assume that the new units are not materially different from the old ones, in which case our choice of predictor would be justified (i.e., is in accordance with our assumptions and loss function).
The fundamental problem is that external validity involves extrapolation of treatment effects to new units or, more generally, making predictions off the support of the estimated density function (which includes the experimental setting, outcome measure, and treatment). As such, external validity claims are inherently ambiguous. Such problems with extrapolation arise whenever we do not know whether the units we want to make a prediction about belong together with the units used to estimate the density, either because we lack observable information about these units, or because some potentially relevant characteristics may be intrinsically unobservable. This, then, is the problem in making the projection from Progresa to New York: we just cannot be sure these two policy experiments are sufficiently similar. But similar in what respects? In the next section, we argue that they need to be similar in theoretically relevant aspects or, alternatively, that we have a theory that allows us to bridge their differences.
As was argued above, the problem of extrapolation may be overcome in two ways (Manski 2008). First, according to the robustness approach, we may circumvent it altogether by performing a test that includes a sample from these new values (e.g., testing whether the new units vote after receiving the mailing). Of course, such trial and error can be very expensive and often impractical. A more nuanced version of this approach is to perform a pilot test of a much larger intervention, say, by randomly sampling from the target population of treatments, units, settings, and outcomes. Unfortunately, such a population is ill defined for many experimental aspects, such as treatments, outcome measures, and settings (Shadish, Cook, and Campbell 2002). For this reason, we don’t think that replication using random samples from a well-defined population overcomes the problem of external validity altogether, as often this is not possible or it is too costly.
Second, the analytical approach relies on testable theories about cause and effect relationships to impose global shape restrictions, such as invariance, linearity, or monotonicity of the estimated density. For example, to deal with extrapolation one could rely on theoretically motivated assumptions regarding invariance (e.g., P(y | x) = P'(y | x'), x ≠ x'). Here, we assume that the new units, despite being off the support of P(y | x), are sufficiently similar in theoretically relevant aspects to the N experimental subjects that we can predict their outcomes using the estimated density. Gelinas’s (2006) criticism of the New York replication, above, essentially questions this invariance assumption, by arguing that New York City is not at all similar to Mexico.6
Alternatively, we may assume linearity or monotonicity, which allow us to adjust for relevant differences across units, settings, and so on. The relevant assumption in this case is that the model is well specified for cases on and off the support and that relevant moderators have been incorporated as factors into the original experiment. Accordingly, to make an ex-post prediction, we collect data on the relevant covariates specified by our theory for the units we want to make a prediction about and feed these inputs into the model to get a vector of predictions as the output. If the model predicts well out of sample, we may gain further confidence in the assumption that it is well specified for some universe of cases, and so we may have more confidence in its predictions as more and more successful replications accumulate.
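The prediction step just described can be sketched in a few lines, under the strong and here purely illustrative assumption that the treatment effect is linear in a single moderator. The moderator levels, effect estimates, and function names below are hypothetical and do not come from Progresa or any other study.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_effects(intercept, slope, moderator_values):
    """Feed moderator values for new units into the fitted linear model."""
    return [intercept + slope * m for m in moderator_values]


# Hypothetical treatment effects estimated within four moderator strata
# of the original experiment (the support of the estimated model).
moderator_levels = [0.0, 1.0, 2.0, 3.0]
estimated_effects = [0.10, 0.08, 0.06, 0.04]

intercept, slope = fit_line(moderator_levels, estimated_effects)

# Moderator values measured for new units that lie off the support.
target_moderators = [4.0, 5.0]
predicted_effects = predict_effects(intercept, slope, target_moderators)

print(intercept, slope, predicted_effects)
```

The predictions for the new units are only as credible as the linearity restriction itself. Comparing them with realized effects in a later replication is what would raise or lower confidence in that restriction.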
External validity as subjective
Now that we have, it is hoped, justified some assumptions and settled on a predictor to inform our bets, it remains for us to agree on a criterion that determines which bets win. That is, what criterion determines external validity, or “whether a causal relationship holds over variation in treatments, outcome measures, units, and settings”?
By what criteria do we judge the success or failure of external validity? One criterion is to declare the experiment successful if the conditional estimates are identical to the ones in Mexico. However, this would yield few if any successes. Even if the underlying data-generating processes were the same and we could condition away differences between experiments, sampling variability would still make exact matches between estimated parameters highly unlikely.
To fix ideas, suppose large enough samples allowed us to ignore sampling variability altogether. Would a moderate difference between the estimated parameters question the external validity of the inferences from Progresa? And what if, despite this moderate difference, the estimate still preserves the direction and practical relevance of the causal effect across both settings? Does the original prediction remain externally valid? And, if not, what exactly do we mean by the external validity of a causal inference?
Shadish, Cook, and Campbell (2002, 34) interpret validity as “the approximate truth of an inference” yet, to the extent that we never really get to observe truth, this is highly problematic. Besides, even if we did, we still need to deal with the issue of “moderate” differences in true parameters as, for most practical purposes, exact matches are not what we are after. Instead, we propose a more pragmatic interpretation of validity, one built on the pillars of prediction and statistical decision theory.
Under the pragmatic criterion we propose, our concern with external validity stems from the need to decide whether or not to implement a program like Progresa, say, in New York. That decision is often in the hands of a politician or high-level bureaucrat. If so, the preferences, opportunities, and constraints of the decision-maker should enter into the ex-post assessment of the validity of the inference (Granger and Machina 2006). Consequently, to a first approximation, it is decision-makers that determine the criteria for success.7
Although different decision-makers are likely to have different loss functions, and hence evaluate external validity differently, it is not a case of anything goes. At a minimum, an externally valid extrapolation of a causal relation should be one that, once realized, results in a causal effect that is in the same direction as the prediction. In addition, the size of the causal effect must remain practically significant at some predetermined level. That is, the decision-maker must evaluate the outcome in a fashion consistent with his ex-ante loss function. An implication of this understanding, though one not pursued in this article, is the need for better use of statistical decision theory in political science.8
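The two minimum requirements stated above (same direction, and practical significance at a level fixed in advance) can be written as a simple check. This is a hedged sketch: the function name, the threshold parameter, and the example numbers are assumptions made for illustration, and a real decision-maker's loss function could be far richer.

```python
def extrapolation_upheld(predicted_effect, realized_effect, min_practical_effect):
    """Minimal pragmatic check of an extrapolated causal effect.

    min_practical_effect is a hypothetical threshold that the decision-maker
    is assumed to have fixed ex ante, before seeing the replication.
    """
    same_direction = predicted_effect * realized_effect > 0
    practically_significant = abs(realized_effect) >= min_practical_effect
    return same_direction and practically_significant


# Hypothetical effects on an outcome, in percentage points.
print(extrapolation_upheld(8.0, 5.0, min_practical_effect=3.0))
print(extrapolation_upheld(8.0, 1.0, min_practical_effect=3.0))
print(extrapolation_upheld(8.0, -5.0, min_practical_effect=3.0))
```

Only the first call satisfies both conditions. The second fails on size and the third on direction, even though its magnitude clears the threshold.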
To recap, external validity is problematic because it involves extrapolation, that is, making predictions off the support of the estimated density. Whether the inferences regarding the applicability of the parameters of that density off the support are externally valid or not will therefore depend on the accuracy of its predictions out of sample, that is, across combinations of treatments, outcome measures, units, and settings not in the original sample. The accuracy of these forecasts, in turn, depends on a vector of ex-post prediction errors and a loss function evaluating these that is specific to the decision-maker (Manski 2008). Accordingly, we are open to the idea that the validity of an inference lies, within the bounds defined above, in the eye of the beholder. External validity is a matter of degree.
The Analytical Approach to External Validity
In the previous section, while defining what we mean by external validity, we suggested the idea that theory could be used to justify global shape restrictions to make extrapolations. In this section, we expand on precisely the type of theorizing needed to justify such restrictions. We do so in four parts. First, we provide some examples of why general knowledge is, indeed, theoretical knowledge, explaining how theories about constructs, causal mechanisms, and compliance and selection are key for external validity. Second, we provide some examples. Third, we consider some pros and cons of the analytical approach. Fourth, we discuss what is and is not new in this scheme. To avoid a long digression, for present purposes we will define theories as falsifiable statements about causal relationships between classes of events. These theories may be complex and micro-founded, or more simple and aggregate; either way, they ought to explain causal regularities rather than individual particulars.
Theories about constructs, causal mechanisms, and compliance and selection
Theorizing for generalizability comes at three levels: theories about constructs, theories about causal mechanisms, and theories about compliance and selection. First, theories about constructs are essential in order to be able to talk meaningfully about the results of experiments. For example, Wantchekon (2008) studies the impact of informed campaigns on voting behavior. Yet, before we can generalize his results from the particular implementation, we need to be clear what the theoretically relevant attributes of a campaign are that make it an informed campaign, the idea being that campaigns sharing these essential features would have the same causal effect ceteris paribus.
Second, we can specify a theoretical causal mechanism, one where both mediator and moderator variables are conjectured, measured, and tested, and what we believe are irrelevancies are left out ex ante (unless the budget allows for more testing). With the parameters of this fully specified model at hand, we can then measure the level and prevalence of such moderators and mediators in the new target population of interest, and use them as inputs into the model to predict the average treatment effect in that population. An obvious application of this is in studies of the genetic basis of disease and its interaction with potential remedies.
Third, most social science experiments are, explicitly or implicitly, encouragement designs (Horiuchi, Imai, and Taniguchi 2007). As such, treatment assignments do not guarantee compliance. To the extent that compliance rates vary across treatments, outcomes, units, and settings, so will the average causal effect. To provide better predictions one could then proceed as above, modeling the population by characterizing it as a combination of compliers, defiers, and never- and always-takers (Angrist, Imbens, and Rubin 1996), according to their individual attributes; then using this information to predict the proportion of compliers and others in the new target population; and, finally, computing the implied average treatment effect (or other effect of choice) in the target population using the predicted proportions (see Frangakis, Rubin, and Zhou [2002] and Horiuchi, Imai, and Taniguchi [2007] for specific applications).
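The three steps in this paragraph can be illustrated with a deliberately simplified sketch. It assumes, beyond what the text states, that there are no defiers, that assignment affects outcomes only through take-up, and that the effect among compliers is the same in the source and target populations. All numbers and function names are hypothetical.

```python
def complier_share(takeup_if_encouraged, takeup_if_not_encouraged):
    """Share of compliers, assuming no defiers (monotonicity)."""
    return takeup_if_encouraged - takeup_if_not_encouraged


def complier_effect(assignment_effect, share_of_compliers):
    """Effect among compliers implied by the effect of assignment.

    Assumes assignment changes outcomes only through take-up, so that
    never-takers and always-takers contribute nothing to assignment_effect.
    """
    return assignment_effect / share_of_compliers


def predicted_assignment_effect(effect_among_compliers, target_share):
    """Implied effect of assignment in a population with a new complier share."""
    return effect_among_compliers * target_share


# Hypothetical source experiment: take-up of 80% when encouraged and 20%
# when not, with a 6-point effect of assignment on the outcome.
source_share = complier_share(0.80, 0.20)
source_complier_effect = complier_effect(0.06, source_share)

# Hypothetical target population in which only 30% are predicted to comply.
target_effect = predicted_assignment_effect(source_complier_effect, 0.30)

print(source_share, source_complier_effect, target_effect)
```

Under these assumptions, halving the complier share halves the predicted effect of assignment. If the complier effect itself varies with moderators, the prediction would need the moderator step described earlier as well.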
In practice, all three issues (construct validity; systematic differences in levels of mediators and moderators; and systematic differences in shares of compliers, defiers, and never- and always-takers) will impact our estimates of average treatment effects, and so all three need to be considered simultaneously. Accordingly, what we ultimately need are good theoretical models of the latent construct, the causal mechanism, and selection into treatment: what are commonly referred to as structural equation models.9 Such models embody global shape restrictions that, if correct, are what allow us to make good predictions out of sample. However, our purpose is not to argue for a return to the large structural equation models of the type once sponsored by the Cowles Commission for Research in Economics, say, but simply to note that conceptualizing such models may help in the design of research programs, even if individual experiments remain much less complicated.
Theory, research programs, and individual experiments
In practice, most experiments should not attempt to estimate such complex structural models, nor do they need the backing of a fully specified theory. Indeed, in the early stages of a research program concerns about external validity are often secondary to construct or internal validity. Rather, our argument is that, to the extent that external validity is desired in mature research programs, there are more efficient ways of achieving it than testing large structural models or blindly replicating, as in the robustness approach. Instead, experiments ought to quite deliberately test the appropriateness of assumed global shape restrictions.
For example, Mook (1983) cites the example of Ekman and Friesen (1971), who asked whether recognition of emotional facial expressions depended on culture. Rather than do innumerable replications across all possible cultures, they theorized that if facial expression were interpreted similarly across cultures, then this must be true across the most distant cultures. Hence, they “stress tested” the theory by comparing Americans to the most distant culture they could think of, the Fore of Papua New Guinea. The finding that they both recognize happiness in each other suggests the universality of emotional expression.
Similarly, going back to the Mexico–New York example above, if we think that a history of substance abuse is a significant moderator of the effect of CCTs, say, then we may design an experiment that stress tests this aspect. Despite its narrow focus, the fact that this experiment is embedded in a larger research project may inform us greatly about the external validity of predictions based on Progresa, insofar as it may allow us to condition our expectations on the prevalence of substance abuse in the New York target population.
Another example is the Benin electoral experiments (see Wantchekon 2003, 2008). One of the findings of the 2001 experiment is that voters are more likely to react positively to a “public goods” message when it comes from a co-ethnic candidate. A possible explanation of this result is that voters trust a candidate from their ethnic group more than they trust a candidate from another group. This means that the mediating variable between ethnic ties and vote is trust, or the credibility of the candidate. By testing the relationship between the credibility of candidates and voting behavior in the follow-up experiment in 2006, Wantchekon (2008) improved the external validity of the results of the 2001 experiment.
Pros and cons of the analytical approach
The analytical approach to external validity we are proposing may be preferable for at least three reasons. First, the question of external validity is largely a theoretical one. Research on external validity asks not just whether a CCT program will work in New York, say, but why or why not. That is, it demands an explication in terms of a causal mechanism or relevant differences between units in and out of sample. Indeed, for policy analysis, as well as for the purpose of scientific advancement through comparative research, the question of where, under what conditions, and among which subpopulations a treatment is likely to work is often as interesting as whether or not the treatment worked in the first place (Heckman 2005). As such, the analytical approach is of interest in and of itself.
Second, answering these questions may provide us with tighter bounds on future predictions out of sample, and increase our confidence in the maintained shape restrictions. Moreover, a program of research that focuses on external validity will subject theories to the strongest possible tests, out-of-sample tests, thereby offering the potential for large updates in our priors.
Third, we conjecture that theoretically driven research programs will permit us to design experimental replications for optimal learning or, to paraphrase Milton Friedman, to come up with sequences of experiments that explain as much as possible in the shortest possible sequence.
This being said, the analytical approach is not foolproof. One could argue that the external validity criticism applies just as readily to a more fully specified causal mechanism, construct, and selection process as to a simple one. After all, moderator or mediator variables in one setting may be different, or have different impacts, in new settings, populations, treatments, or outcome measures, and so extrapolation is not possible. For example, household income may be a moderator of the effects of Progresa in Mexico but not in New York, perhaps due to some unobserved interacting variable. All this is certainly possible, but there are at least three powerful rejoinders.
First, we can test this criticism. If the moderators are found to behave similarly in Mexico and New York, then we have more confidence in projecting treatment effects to Los Angeles and Kuala Lumpur, say, than if we had replicated without a theory. The latter does not allow us to make accurate predictions because we are not using all the available information efficiently, by conditioning on the relevant moderators and mediators. As a result, robustness replication yields more uncertain predictions and represents a very inefficient way of cumulating knowledge. Our view is that theorists explain and empiricists condition; doing neither greatly limits external validity.
Second, this is a reductio ad absurdum criticism. It simply negates the whole basis of comparative research. If causal and selection mechanisms are different across all variations in settings, populations, treatments, or outcomes, then we have no basis for generalizations. Without theorizing, each construct, treatment, outcome measure, unit, and setting is unique, and the possibility of scientific learning is denied. This may well be the case, but it is a testable proposition.
Third, the burden of proof is on generalizability skeptics to propose why they think the inference is not generalizable. Indeed, this is what Rosenbaum (2002, 9) calls tangible criticism: “a specific and plausible alternative interpretation of the available data; indeed a tangible criticism is itself a scientific theory, itself capable of empirical investigation.” In contrast, dismissive criticism “rests on the authority of the critic and is so broad and vague that its claims cannot be studied empirically.”10
To sum up, think of the replication process as an optimal learning problem. Our contention is that the analytical approach will require fewer iterations to achieve a given uncertainty tolerance (about external validity) than the robustness one or, alternatively, reduce uncertainty further for any given number of replications. As Guala and Mittone (2005, 499) state, “external validity is an important issue for experimenters and theorists alike” (emphasis in original).
The difference between the robustness and analytical approaches somewhat resembles the difference between active and passive learning (Castro et al. 2008) or the problems faced in dynamic control programming. As that literature implies, there are trade-offs to be made between the analytical and robustness approaches. For example, testing some theories will require factorial designs, which in turn may involve larger samples. Yet this may still be cheaper than testing hypotheses separately and, to the extent that they generate sensible findings, may substantially improve our predictions out of sample.
Old wine in new bottles?
Theory already permeates everything we do, from the questions we ask, to the data we collect, to how we define our concepts.11 For example, take one of the most successful and comprehensive experimental research programs in political science, the series of field experiments on voter mobilization spawned by the work of Gerber and Green (2000) and reviewed in Green and Gerber (2008). The use of theory is clear in the choice of treatments: researchers have studied whether mass mailing, door-to-door canvassing, and so on impact turnout, but not the impact of mailings printed with Times New Roman versus Arial font, as we have no theoretical basis for presuming these might be causally relevant.
The fact is, most replications are already theory driven, explicitly or not. For example, the finding by Gerber and Green (2000) that phone calls are ineffective in getting out the vote relative to personal face-to-face contact motivated Nickerson (2006) to test whether this was due to the face-to-face component or the extra personal attention associated with door-to-door canvassing. He finds that the quality of the phone calls matters and that brief, nonpartisan phone calls can raise voter turnout if they are sufficiently personal. We can use this finding to help us predict how other means of communication may fare conditional on the degree of personal attention provided. Also, at times, these replications shed light on unexpected results, motivating new theory and further experimentation (Gerber 2004).
If theory already permeates many of our experimental research programs, what is our point? First, we want to dispel criticisms that individual experiments have little or no external validity: so long as they contribute to general theories of voter behavior, say, experiments may expand the range of prediction significantly. Moreover, we can test external validity claims using experiments strong on internal validity. Second, for their part, experimenters ought to make more explicit the theoretical context of their experiments, even if it is only an enumeration of potential causal pathways lacking in formalisms. Given that experimental replication is highly decentralized, it is up to each replication to do its part in expanding the external validity of the whole.
When Are Concerns about External Validity Justified? Prediction versus Understanding
We began this article by noting the policy relevance of external validity questions, in particular because of the perceived need to make predictions about causal effects out of sample. But is predictive success the hallmark of a good theory? What about explanation and understanding? And when should experimentalists focus on external validity? Besides, what about the idea that the best predictors are often atheoretical associations à la Sims (1980)?
Prediction and understanding are closely related. Yet an example due to Rubin (1996, 475) perhaps best highlights their differences: suppose an unfair coin yields heads with probability .6. A model that predicts heads with probability 1 will get it right 60 percent of the time, whereas one predicting heads with probability .6 will get it right only 52 percent of the time (.6 × .6 + .4 × .4 = .52). On the basis of predictive success, we would be tempted to choose the wrong model, even though it cannot explain the observed sequence of tosses and, in particular, the 40 percent of tails. So prediction cannot be all that there is to it. However, our point is not that external validity ought to be the only goal of science; rather, our claim is that the best way to test external validity is to test predictions out of sample. Different goals require different tests.
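The arithmetic behind the coin example can be checked with a short sketch. It assumes, as one reading of the example, that each model issues its prediction independently of the actual toss; the function name `expected_accuracy` is ours and purely illustrative.

```python
def expected_accuracy(p_heads: float, q_predict_heads: float) -> float:
    """Chance that a prediction matches the toss.

    Assumes the coin lands heads with probability p_heads and the model
    predicts heads with probability q_predict_heads, independently of the toss.
    """
    both_heads = p_heads * q_predict_heads
    both_tails = (1 - p_heads) * (1 - q_predict_heads)
    return both_heads + both_tails


# Model A always predicts heads; model B mirrors the true probability of heads.
always_heads = expected_accuracy(0.6, 1.0)
matching = expected_accuracy(0.6, 0.6)

print(round(always_heads, 2), round(matching, 2))
```

On this reading, the model that always predicts heads scores better on predictive success even though only the second model reproduces the share of tails.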
For example, in a widely cited article, Mook (1983) criticized the preference then prevalent in psychology for experiments with strong external validity. His argument relied on a distinction between prediction and understanding, or, as he put it, between two modes of research: the analogue and the analytical models. In the analogue model, the objective is to model the real world for prediction purposes. Thus, the variables that account for the most real-world variance are the most important, since they are the ones that speak most directly to our ability to make predictions about causal relations (see note 12). In the analytic model of research, by contrast, the objective is to understand the workings of a system, to test the internal validity of theories. These theories may apply to real life, but there is no attempt at generalization at this point.
In a similar vein, Przeworski (2007) is interested, not only in the effects of causes, but also in the causes of effects: the list of factors $X_1, X_2, \ldots, X_n$ that explain the observed (in nature) outcome Y, say lung cancer prevalence (see note 13). Przeworski’s point is that we often desire to understand not only what the potential causes of Y are, but also what the actual causes of Y are in a particular population and how these come about. Demonstrating that X can cause changes in Y is a necessary but insufficient condition for the inference that X explains Y, or that X can be used to manipulate changes in Y in any particular target population in any period. Translated to Mook’s language, the analytical mode of research (or the effects of causes) asks, “Could X have caused Y?” whereas the analogue mode of research (or the causes of effects) asks, “Does X typically cause Y?”
Note that our answer to these questions may significantly influence how we answer their corollary, “Can X be manipulated so as to change Y in a desired direction?” This is important for two reasons: first, because by knowing the actual causes of a disease, say, we might be better able to design prevention measures; and, second, because causes that are efficacious in the lab may not be effective in the field.
For example, take the efficacy of bed nets in preventing malaria. Under controlled conditions bed nets have been shown to be highly effective in preventing malaria: households randomly “treated” with bed nets experience a reduction in malaria incidence relative to households randomly allocated to control conditions. These controlled experiments identify the “effects of causes”: in this case, bed net use reduces malaria incidence. Based on this evidence, numerous programs have been implemented that freely distribute bed nets in areas of high malaria incidence. And yet, to date, the jury is still out as to the effectiveness of these interventions in reducing malaria incidence in the treated areas. Why? One answer is lack of compliance: providing a free bed net does not imply that it will be used appropriately.
This example illustrates the point that just because X can be shown to cause changes in Y, it does not follow that it explains any of the observed variance in Y in the real world nor, indeed, that it can be an effective cause in the real world (a problem of external validity), where we may lack sufficient control to ensure full compliance. In other words, there are other factors (potentially unobserved) that moderate the causal effect in real applications. In Mook’s terminology, we gain understanding of potential causes for reducing malaria, but we may still end up with bad policy predictions.
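A minimal sketch of how noncompliance can dilute an effect established under controlled conditions follows. All numbers (baseline incidence, risk ratio, usage rates) are hypothetical and chosen only for illustration, and the sketch assumes that a net protects only the households that actually use it.

```python
def incidence(baseline: float, risk_ratio_if_used: float, usage_rate: float) -> float:
    """Expected malaria incidence in a group where a share usage_rate uses a net.

    Assumes users face baseline * risk_ratio_if_used and non-users face baseline.
    """
    users = usage_rate * baseline * risk_ratio_if_used
    non_users = (1 - usage_rate) * baseline
    return users + non_users


BASELINE = 0.40    # hypothetical incidence without nets
RISK_RATIO = 0.50  # hypothetical: proper use halves the risk

control = incidence(BASELINE, RISK_RATIO, usage_rate=0.0)
trial = incidence(BASELINE, RISK_RATIO, usage_rate=1.0)   # full compliance
field = incidence(BASELINE, RISK_RATIO, usage_rate=0.25)  # partial compliance

reduction_trial = control - trial
reduction_field = control - field
print(round(reduction_trial, 3), round(reduction_field, 3))
```

Under these made-up inputs the causal effect of using a net is identical in both settings, yet the reduction achieved by free distribution is a quarter of the one seen in the trial, because usage acts as a moderator.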
Understanding what the actual causes of some phenomenon are should inform our predictions and research. This is what motivated psychologist Egon Brunswik to advocate “representative designs” where, in particular, levels of the causal variable in question would be chosen according to their conditional distribution in the natural world (Albright and Malloy 2000).
Accordingly, the idea that distribution of free bed nets will somehow generate the treated counterfactual observed in controlled trials may appear far-fetched. For all we know, the given distribution of bed nets in any country is an equilibrium (everyone who wants one has one), so reducing their price to zero may have only a tiny effect in that, at some point, there is no longer a binding budget constraint (Gelinas’s [2006] point above). In fact, it is entirely possible that there is simply no way to generate the experimental counterfactual in the field without at the same time changing the levels of other (potentially unobserved) covariates. Perhaps malaria eradication requires a process of modernization that includes improved education as to the pathogenic nature of the disease, better drainage, urbanization, air-conditioning, better medical facilities, and so on, which, by the way, speaks to the important development literature on the sequencing of reforms and the need for theories of change.
Linking external validity to the capacity to make reasonable causal predictions out of sample in no way undermines the importance of theory to the enterprise, nor the value of understanding. Prediction and understanding may have slightly different goals, but, ultimately, a good measure of useful knowledge is a test of the external validity of its predictions. In addition, sound understanding of the actual causes of effects may help us theorize about causal mechanisms and optimal interventions.
Conclusion
In this article, we argue that claims to external validity of randomized field experiments are stronger when theoretical connections between experiments are established and tested. In an experiment that establishes a causal relationship between two variables (the treatment and an outcome of interest) under a set of conditions, we can improve external validity in at least two ways: (1) by replicating the relationship between the two variables under new conditions (the robustness approach) or (2) by establishing that the relationship is mediated or moderated by a set of variables (the analytical approach).
We believe the analytical approach may turn out to be more effective than the robustness approach. This is because the mediator is likely to represent a larger set of experimental conditions. We recommend that follow-up experiments be primarily focused on testing the theoretical argument of original experiments, instead of simply replicating them in a different context. If external validity is the Achilles’ heel of randomized experiments, then testing mechanisms underlying already established causal relationships should be the top priority of experimental research.
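The contrast between the two approaches can be sketched with a toy calculation. The site-level numbers, the choice of moderator (the share of households facing a binding budget constraint), and the linear form are all hypothetical assumptions made for illustration; they are not results from any study cited here.

```python
from statistics import mean

# Hypothetical replications: (moderator value, estimated treatment effect).
sites = [(0.2, 0.04), (0.5, 0.10), (0.8, 0.16)]


def fit_line(points):
    """Least-squares fit of effect = intercept + slope * moderator."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


new_site_moderator = 0.1  # hypothetical target population

# Robustness approach: pool the replications and predict their average effect.
robustness_prediction = mean(effect for _, effect in sites)

# Analytical approach: model how the moderator shifts the effect, then predict.
intercept, slope = fit_line(sites)
analytical_prediction = intercept + slope * new_site_moderator

print(round(robustness_prediction, 3), round(analytical_prediction, 3))
```

With these invented inputs, pooling predicts an effect of about 0.10 for the new site, while the moderator-based prediction is about 0.02. The point is only that a tested account of why effects differ can carry a prediction to conditions outside the range already replicated.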
Notes
1. “[Conditional cash transfer] programs provide monetary incentives to households living in poverty when they complete activities aimed at increasing human capital development and breaking the cycle of poverty.” From http://www.nyc.gov/html/ceo/html/programs/opportunity_nyc.shtml (accessed February 14, 2009). For an overview of Opportunity NYC see de Sá e Silva (2008).
2. Rockefeller Foundation, http://www.rockfound.org/efforts/nycof/opportunity_nyc.shtml (accessed February 14, 2009). The 2002 successor program to Progresa is called Oportunidades.
3. Rockefeller Foundation, ibid.
4. On the definition of moderator and mediator, see Baron and Kenny (1986). There are numerous ways to label variables in the context of a causal mechanism, yet “conceptually, moderators identify on whom and under what circumstances treatments have different effects. Mediators identify why and how treatments have effects” (Kraemer et al. 2002, 877). The former may be understood as interaction effects, whereas the latter identify possible mechanisms by which the treatment comes to have its effect. On the complexities of testing mediator effects, see Bullock, Green, and Ha (2008).
5. Let $y_i$ denote test scores for pupil $i$, $T_i$ denote whether he was randomly assigned to receive school vouchers or not, $x_i$ be a set of covariates strongly correlated with $y$, and $e_i$ be a random (iid) error term such that we may write $y_i = a + b_0 T_i + x_i b + e_i$, $i = 1, 2, \ldots, N$. We are interested in $\partial Y / \partial T$, or $b_0$. Accordingly, the question of whether Progresa will work in NYC asks, loosely, whether the estimated $b_0$ in the NYC sample will be statistically significant and of the same sign and magnitude as in Mexico. We may test this directly with the NYC program because it is randomized. If, instead, everyone got treated in NYC, we would have only ex ante and ex post measures with no ex post controls. Although not ideal, we could use pre-test data to predict outcomes in the absence of treatment by generating a predicted counterfactual. Finally, note that it is possible to have $\partial Y / \partial T > 0$ and yet $\Delta Y = 0$ if some other event acted in the opposite direction (say a teacher strike). This is why, in the absence of randomized replication, it is crucial to have good predictors of Y, as our inferences will depend on the model for the counterfactual being correctly specified.
6. According to her, the poor in New York have access to free education and their children are idle, features that do not rhyme with the Mexican story of binding budget constraints keeping Mexican children at work and away from school.
7. At this stage, for simplicity, we ignore the issue of who the decision-maker is, whether she is a high-level bureaucrat, a politician, a set of voters, some abstract welfare maximizer, or, indeed, a researcher evaluating a theory. We will simply postulate a decision-maker, typically a politician, and leave it at that, as the relevant person may differ between applications. We are aware this renders scientific knowledge somewhat subjective, but then again, a long literature in the philosophy of science questions the possibility of objective science.
8. See Gerber, Green, and Kaplan (2002) for a rare exception in political science, and Berger (1985) and Manski (2008) for a more general discussion. This has three important corollaries, especially for policy-motivated experiments, that often go unappreciated. First, only if the decision-maker is planning to repeat the experiment elsewhere will she care about its conventional statistical significance over and beyond its practical significance. Second, having a well-specified loss function allows the researcher to move away from simple point estimation, relaxing some assumptions and the somewhat exaggerated obsession with bias (see Rosenbaum [2002] for a discussion of sensitivity analysis and Manski [2008] for interval estimation). As such, the unbiased estimation made possible by randomized experiments may only be needed whenever good enough priors on the bias are lacking, preventing us from adjusting observational estimates to correct for the bias (Gerber, Green, and Kaplan 2002). Third, we need more empirical data on the kind of decision criteria used by policy makers: Bayes, Maximin, or Minimax-regret, say. That is, on the basis of what criteria do policy makers choose policy experiments? Laboratory experiments on high-level officials may help reveal these.
9. This is very much in line with the argument made in Heckman (2008).
10. Experimenters ought to make the experiment as sound as possible to begin with, by stating potential mediators, say, and, whenever the budget allows, testing them. Our point is that in order to help discover whether a design is indeed flawed, the critic needs to specify why he thinks the design is faulty in the first place. Off-the-cuff criticisms of the sort “experiments have no external validity” are, on this account, unhelpful and often dismissive.
11. See, inter alia, McDermott (2002); Druckman et al. (2006); Guala and Mittone (2005); Levitt and List (2007); Lucas (2003); Lynch (1999); Moffitt (2004); Shadish, Cook, and Campbell (2002); Schram (2005).
12. We distinguish between making causal predictions of the type “a change in X will cause a change in Y of D percent,” say, versus simply forecasting Y, say, although to reduce the variance of our causal predictions, good forecasts of Y are often welcome.
13. Confusingly, Mahoney and Goertz (2006) interpret “causes of effects” as explaining individual cases, e.g., what caused Joe Doe to get lung cancer. This, in our view, is not the standard interpretation. We adhere to Przeworski’s and Heckman’s interpretation.
References
Albright, Linda, and Thomas E. Malloy. 2000. Experimental validity: Brunswik, Campbell, Cronbach, and enduring issues. Review of General Psychology 4 (4): 337-53.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91 (434): 444-55.
Baron, Reuben M., and David A. Kenny. 1986. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51 (6): 1173-82.
Berger, James O. 1985. Statistical decision theory and Bayesian analysis. 2nd ed. New York: Springer.
Bullock, John G., Donald P. Green, and Shang E. Ha. 2008. Experimental approaches to mediation: A new guide for assessing causal pathways. Unpublished manuscript, Yale University. Accessed online at http://www.ipeg.org.uk/events/field_experiments/Documents/A%20Critique%20of%20Conventional%20Mediation%20Analyses%20--%20Bullock%20Green%20and%20Ha.pdf
Castro, Rui M., Charles Kalish, Robert Nowak, Ruichen Qian, Timothy J. Rogers, and Xiaojin Zhu. 2008. Human active learning. Technical report. Columbia University, New York.
Das, Jishnu, Quy-Toan Do, and Berk Ozler. 2005. Reassessing conditional cash transfer programs. World Bank Research Observer 20 (1): 57-80.
de Sá e Silva, Michelle Morais. 2008. Opportunity NYC: A performance-based conditional cash transfer programme. A qualitative analysis. Working Paper 49, International Poverty Centre, Brasilia, and Columbia University, New York.
Druckman, James N., Donald P. Green, James H. Kuklinski, and Arthur Lupia. 2006. The growth and development of experimental research in political science. American Political Science Review 100 (4): 627-35.
Ekman, Paul, and Wallace V. Friesen. 1971. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology 17: 124-29.
Frangakis, Constantine E., Donald B. Rubin, and Xiao-Hua Zhou. 2002. Clustered encouragement design with individual noncompliance: Bayesian inference and application to advance directive forms. Biostatistics 3: 147-64.
Gelinas, Nicole. 2006. New York isn’t Mexico. City Journal. Accessed from http://www.city-journal.org/html/eon2006-10-20ng.html.
Gerber, Alan S. 2004. Does campaign spending work? Field experiments provide evidence and suggest new theory. American Behavioral Scientist 47 (5): 541-74.
Gerber, Alan S., and Donald P. Green. 2000. The effects of canvassing, telephone calls, and direct mail on voter turnout: A field experiment. American Political Science Review 94 (3): 653-63.
Gerber, Alan S., Donald P. Green, and Edward H. Kaplan. 2002. The illusion of learning from observational research. Institution for Social and Policy Studies Working Paper, Yale University, New Haven, CT.
Granger, Clive W. J., and Mark J. Machina. 2006. Forecasting and decision theory. In Handbook of economic forecasting, vol. 1, 81-98. Amsterdam: Elsevier.
Green, Donald P., and Alan S. Gerber. 2008. Get out the vote: How to increase voter turnout. 2nd ed. Washington, DC: Brookings Institution.
Guala, Francesco, and Luigi Mittone. 2005. Experiments in economics: External validity and the robustness of phenomena. Journal of Economic Methodology 12 (4): 495-515.
Heckman, James J. 2005. The scientific model of causality. Sociological Methodology 35 (1): 1-98.
Heckman, James J. 2008. Econometric causality. Social Science Research Network Electronic Library.
Horiuchi, Yusaku, Kosuke Imai, and Naoko Taniguchi. 2007. Designing and analyzing randomized experiments: Application to a Japanese election survey experiment. American Journal of Political Science 51 (3): 669-87.
Kraemer, Helena C., G. Terence Wilson, Christopher G. Fairburn, and Stewart W. Agras. 2002. Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry 59 (10): 877-83.
Levitt, Steven D., and John A. List. 2007. What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives 21 (2): 153-74.
Lucas, Jeffrey W. 2003. Theory-testing, generalization, and the problem of external validity. Sociological Theory 21: 236-53.
Lynch, John G., Jr. 1999. Theory and external validity. Journal of the Academy of Marketing Science 27 (3): 367-76.
Mahoney, James, and Gary Goertz. 2006. A tale of two cultures: Contrasting quantitative and qualitative research. Political Analysis 14 (3): 227-49.
Manski, Charles F. 2008. Identification for prediction and decision. Cambridge, MA: Harvard University Press.
McDermott, Rose. 2002. Experimental methodology in political science. Political Analysis 10 (4): 325-42.
Moffitt, Robert A. 2004. The role of randomized field trials in social science research: A perspective from evaluations of reforms of social welfare programs. American Behavioral Scientist 47 (5): 506-40.
Mook, Douglas G. 1983. In defense of external invalidity. American Psychologist 38 (4): 379-87.
Nickerson, David W. 2006. Volunteer phone calls can increase turnout: Evidence from eight field experiments. American Politics Research 34 (3): 271-92.
Przeworski, Adam. 2007. Is the science of comparative politics possible? In The Oxford handbook of comparative politics (Oxford handbooks of political science), ed. Carles Boix and Susan C. Stokes, 147-71. Oxford, UK: Oxford University Press.
Rawlings, Laura B. 2005. Evaluating the impact of conditional cash transfer programs. World Bank Research Observer 20 (1): 29-55.
Rosenbaum, Paul R. 2002. Observational studies. New York: Springer.
Rubin, Donald B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91 (434): 473-89.
Schram, Arthur. 2005. Artificiality: The tension between internal and external validity in economic experiments. Journal of Economic Methodology 12: 225-37.
Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. New York: Houghton Mifflin.
Sims, Christopher A. 1980. Macroeconomics and reality. Econometrica 48 (1): 1-48.
Wantchekon, Leonard. 2003. Clientelism and voting behavior: Evidence from a field experiment in Benin. World Politics 55: 399-422.
Wantchekon, Leonard. 2008. Expert information, public deliberation and electoral support for good governance: Experimental evidence from Benin. Mimeo. Paper presented at the Freeman Spogli Institute for International Studies on December 2nd 2008, in Palo Alto, California. Accessed online at http://fsi.stanford.edu/events/expert_information_public_deliberation_and_electoral_support_for_good_governance_experimental_evidence_from_benin/