The ANNALS of the American Academy of Political and Social Science

Theory, External Validity, and Experimental Inference: Some Conjectures
Fernando Martel Garcia and Leonard Wantchekon
The ANNALS of the American Academy of Political and Social Science 2010 628: 132
DOI: 10.1177/0002716209351519

The online version of this article can be found at: http://ann.sagepub.com/content/628/1/132
Published by: http://www.sagepublications.com
On behalf of: American Academy of Political and Social Science
Version of Record: Feb 23, 2010
Downloaded from ann.sagepub.com at PRINCETON UNIV LIBRARY on November 17, 2011

132 ANNALS, AAPSS, 628, March 2010

Theory, External Validity, and Experimental Inference: Some Conjectures

By FERNANDO MARTEL GARCIA and LEONARD WANTCHEKON

It is often argued that experiments are strong on causal identification (internal validity) but weak on generalizability (external validity). One widely accepted way to limit threats to external validity is to incorporate as much variation in the background conditions and in the covariates as possible through replication. Another strategy is to make the theoretical foundations of the experiment more explicit. The latter requires that we develop trajectories of experiments that are consistent with a theoretical argument. In other words, new experiments should not simply change the context of old ones but should do so in ways that explicitly and coherently test various aspects of a theory.

Keywords: causal inference; randomized experiments; external validity

Fernando Martel Garcia is a PhD student at New York University’s Wilf Family Department of Politics. His research focuses on accountability mechanisms in public service delivery and on how research design and statistical techniques can help us not only uncover such causal mechanisms but also test potential improvements, make out-of-sample forecasts, and, ultimately, improve our theories. He is currently involved in various field experiments in Mexico. Prior to starting his PhD, Mr. Martel Garcia worked as a staff economist for the World Bank.

Leonard Wantchekon is a professor of politics and economics at New York University. He taught at Yale University (1995–2000) and was a visiting fellow at the Center of International Studies at Princeton University (2000–01). He is the author of several articles on post–civil war democratization, the resource curse, electoral clientelism, and experimental methods, which have appeared in the American Political Science Review, World Politics, Comparative Political Studies, Quarterly Journal of Economics, and Journal of Conflict Resolution. He is the founding director of the Institute for Empirical Research in Political Economy, which is based in Benin (West Africa) and at New York University.

On December 17, 2007, 1,431 poor families in New York City received a total of $740,000 as part of a new, $53 million conditional cash transfer (CCT) anti-poverty program called Opportunity New York City.1 The design of Opportunity NYC was informed to a large extent by Progresa, a highly regarded CCT program initiated in Mexico in 1997 as a randomized field experiment.2 Accordingly, it is only natural to ask: will the New York initiative be as successful as its Mexican predecessor?

This is essentially a question of external validity, namely, the validity of inferences about whether a causal relationship holds over variation in treatments, outcome measures, units, and settings (Shadish, Cook, and Campbell 2002, 38). As such, it goes to the heart of policy-motivated research, with its emphasis on determining what works under what conditions. That is, what, if anything, can we say about the causal effect of similar policies in different contexts?

The latter question would not matter were it not that it arises quite often in applied work. Indeed, in the present case, the differences between Opportunity NYC and Progresa are just as striking as their common pedigree. To begin with, Opportunity NYC marks the first time a CCT program has been tried in a large, wealthy city, whereas Progresa started out as a program for the rural poor. Second, the experimental units (the relevant New York households) are significantly richer in absolute and relative terms than their Mexican counterparts. Indeed, poverty in the United States might well be quite different from poverty in rural Mexico in terms of causes, consequences, and solutions. Third, the intervention in New York is also different, in that “[it] is the first program to include a significant workforce participation component in addition to the traditional health and education components.”3

Given significant variation in treatment, outcome measures, units, and settings, some have questioned whether Opportunity NYC will work at all:

[New York City Mayor Michael] Bloomberg has misread the purpose of third-world conditional cash-transfer programs, and thus has misread their applicability to New York. . . . In New York, unlike in the third world, poor parents don’t have to pay to send their children to school. Nor do they face the tough choice of educating the kids or having enough money to put food on the table every night. (Gelinas 2006)

According to this critic, then, Progresa worked by relaxing a binding budget constraint on poor Mexican households, and because such a constraint is, under this interpretation, not binding among poor New Yorkers, Opportunity NYC will probably fail.

The uncertainty surrounding the generalizability of cause-and-effect relationships from field experiments such as Progresa calls into question not just Mayor Bloomberg’s wisdom but the very enterprise of randomized experimentation for policymaking. Experiments, it would seem, have only a deeply unsatisfying Faustian bargain between internal and external validity to offer: yes, we are highly certain CCTs worked in Mexico, but we remain deeply uncertain about their potential success in New York.

This article is motivated by the desire to help improve the terms of this Faustian bargain. In what follows we distinguish between the robustness and analytical approaches to external validity. The analytical approach proposes a series of theoretically motivated replications. This approach sees the problem of generalizability as intrinsically theoretical, in that theories about causal mechanisms, constructs, and selection are what allow us to generalize beyond sampling particulars in individual cases.

In the context of the New York experiment, this approach could begin by theorizing about mediator and moderator variables that may limit extrapolation from Progresa and then design a sequence of experiments to test these hypotheses.4 Indeed, randomized CCTs have been implemented in numerous countries, offering ample opportunity to test alternative causal mechanisms for better prediction out of sample (Rawlings 2005; Das, Do, and Ozler 2005). Moreover, we may design small tests of specific implications of the theory without having to replicate Progresa each time.

In contrast, the robustness approach relies on replication across various settings, treatments, outcome measures, and units. Rather than fret over whether Progresa will work in New York, the approach is to just go ahead and test it. Such brute-force replication is, indeed, an obvious way to dissolve the uncertainty. But besides its potential practical drawbacks, it has severe limitations. Suppose we run the test and the results are negative. What shall we conclude? That CCTs work in Mexico and not in New York? That they work only in poorer societies? Or in all societies but New York? That they did not work in New York in 2008–12 but may work at a later date? Should we therefore repeat the same experiment every few years? The list is potentially endless. For us, theory-driven sequences of experiments are key to optimal learning.

External validity can be greatly improved by connecting individual experiments with a theory, even if those individual experiments are not themselves theoretically grounded, especially at the early stages of a research program. Indeed, in what follows, we reemphasize the distinction between the external validity of the theories associated with a research program and that of the particular experiments on which the program is based, thereby underlining the three-way relation between experiments, research programs, and external validity. After all, external validity is an attribute of inferences and not of any particular method (Shadish, Cook, and Campbell 2002). Indeed, contrary to commonly held beliefs, experimentation is key to testing theories about external validity.

Besides motivating the desire for optimal learning, the distinction between the analytical and robustness approaches has important practical consequences, as the payoff to more careful replication could be large. For example, the budget for Opportunity NYC has been set at $53 million, financed by the mayor himself and private foundations (it receives no public funds). This is a huge gamble. Some relatively inexpensive pilot testing and diagnosis may well reduce uncertainty, increasing the expected value of this and other future replications. Thus, there may be large private and external gains to be had.

Finally, two caveats. First, whereas the sharp distinction between robustness and analytical approaches is useful as a rhetorical device, it is unlikely to be borne out in practice. In reality, there is likely to be some amount of overlap, in that replications often embody some implicit prior or theory. Second, because our goal, at this stage, is to highlight some conjectures, the approach is discursive and not explicitly deductive. In ongoing work, we formally apply the statistical learning and decision-making literatures to the notion of external validity.

This article proceeds as follows. First, we discuss what external validity is, why it is problematic, and what the criteria are for evaluating it. We then explain what the analytical approach is. Next, we discuss its potential pros and cons. Finally, we explore how external validity relates to causality, prediction, and understanding, before offering a conclusion.

What Does “External Validity” Really Mean?

Imagine testing the effect of nonpartisan get-out-the-vote mailings on turnout among N individuals, about whom all we know is their individual identifiers i = 1, 2, . . . , N. Variable Y records the turnout outcome, such that y_i = 1 if unit i turned out to vote and y_i = 0 otherwise. Suppose the treatment effect is positive and both practically and statistically significant. Now, suppose we are asked to place a bet on whether the treatment will work on some new units, units N + 1, N + 2, and N + 3. All we know is that these units were not in the original sample. How shall we place our bet? And how will we know if we have won? That is, when is a replication result close enough to the original inference that external validity may be deemed upheld? We answer these two questions in turn.

External validity as extrapolation

Under a squared loss function, say, a common predictor for an outcome of interest y is the conditional mean E[y | Treatment], that is, the mean of the density P(y | Treatment), where P(y | x) describes the density of y conditional on x.5 Although this may seem natural, the information given in the hypothetical voting experiment above does not tell us whether the outcomes for these new units are independently and identically distributed in relation to the experimental sample. For all we know, the new units are a cat, a dog, and a goldfish, none of which vote.
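
As a minimal sketch of the predictor just described, the Python below uses invented turnout data (the numbers are hypothetical and come from no study). Within each arm, the constant that minimizes average squared loss is the sample mean, so the two conditional means serve as predictors, and their difference estimates the average treatment effect for the sampled units. Nothing in the calculation speaks to units outside the sample, which is exactly the difficulty raised above.

```python
from statistics import mean

# Hypothetical turnout outcomes (1 = voted, 0 = did not vote) for a small
# get-out-the-vote mailing experiment; the data are invented for illustration.
treated = [1, 1, 0, 1, 1, 0, 1, 1]
control = [1, 0, 0, 1, 0, 0, 1, 0]


def squared_loss(prediction, outcomes):
    """Average squared error of a constant prediction over the outcomes."""
    return mean((y - prediction) ** 2 for y in outcomes)


def best_constant_predictor(outcomes, grid_size=1001):
    """Grid search on [0, 1] for the constant that minimizes squared loss."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return min(grid, key=lambda c: squared_loss(c, outcomes))


# Sample analogues of E[y | Treatment = 1] and E[y | Treatment = 0].
mean_treated = mean(treated)  # 6/8 = 0.75
mean_control = mean(control)  # 3/8 = 0.375

# The grid search recovers the sample means, as squared loss implies.
assert best_constant_predictor(treated) == mean_treated
assert best_constant_predictor(control) == mean_control

# Difference in means: estimated average treatment effect in the sample.
ate_hat = mean_treated - mean_control
print(f"estimated effect on turnout: {ate_hat:.3f}")
```

The grid search is deliberately naive; it only confirms numerically that the mean is the squared-loss predictor. Under a different loss (absolute error, say) the best constant would be a median instead.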

What we face is a problem of predictive ambiguity, as, strictly speaking, the experiment reveals nothing about the density P'(y | Treatment) for the new units. Predictive ambiguity arises whenever choice or behavior depends on an objective function with an unknown probability distribution (Manski 2008). At this point, we therefore need to make assumptions. We may, for example, assume that the new units are not materially different from the old ones, in which case our choice of predictor would be justified (i.e., it would accord with our assumptions and loss function).

The fundamental problem is that external validity involves extrapolation of treatment effects to new units or, more generally, making predictions off the support of the estimated density function (which includes the experimental setting, outcome measure, and treatment). As such, external validity claims are inherently ambiguous. Such problems with extrapolation arise whenever we do not know whether the units we want to make a prediction about belong together with the units used to estimate the density, either because we lack observable information about these units or because some potentially relevant characteristics may be intrinsically unobservable. This, then, is the problem in making the projection from Progresa to New York: we just cannot be sure these two policy experiments are sufficiently similar. But similar in what respects? In the next section, we argue that they need to be similar in theoretically relevant aspects or, alternatively, that we need a theory that allows us to bridge their differences.

As was argued above, the problem of extrapolation may be overcome in two ways (Manski 2008). First, according to the robustness approach, we may circumvent it altogether by performing a test that includes a sample from these new values (e.g., testing whether the new units vote after receiving the mailing). Of course, such trial and error can be very expensive and is often impractical. A more nuanced version of this approach is to perform a pilot test of a much larger intervention, say, by randomly sampling from the target population of treatments, units, settings, and outcomes. Unfortunately, such a population is ill defined for many experimental aspects, such as treatments, outcome measures, and settings (Shadish, Cook, and Campbell 2002). For this reason, we don’t think that replication using random samples from a well-defined population overcomes the problem of external validity altogether, as often this is not possible or is too costly.

Second, the analytical approach relies on testable theories about cause-and-effect relationships to impose global shape restrictions, such as invariance, linearity, or monotonicity of the estimated density. For example, to deal with extrapolation one could rely on theoretically motivated assumptions regarding invariance (e.g., P(y | x) = P'(y | x'), x ≠ x'). Here, we assume that the new units, despite being off the support of P(y | x), are sufficiently similar in theoretically relevant aspects to the N experimental subjects that we can predict their outcomes using the estimated density. Gelinas’s (2006) criticism of the New York replication, above, essentially questions this invariance assumption by arguing that New York City is not at all similar to Mexico.6
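
The contrast between ambiguity and invariance can be illustrated with a deliberately simple bound calculation in the spirit of Manski’s partial-identification approach; the particular setup is an illustrative assumption of ours, not the authors’ model. With a binary outcome, any unit’s treatment effect lies between -1 and 1. Suppose we are willing to assume only that some share of the new units responds like the experimental units, and nothing about the rest. The sample estimate of 0.08 is hypothetical, and sampling error is ignored.

```python
def effect_bounds(sample_ate, share_similar):
    """Bounds on the average treatment effect for a binary outcome in a new
    population, assuming (hypothetically) that a fraction `share_similar`
    of the new units has the experimental effect `sample_ate` and that
    nothing is known about the rest, whose average effect may lie anywhere
    in [-1, 1].
    """
    if not 0.0 <= share_similar <= 1.0:
        raise ValueError("share_similar must lie in [0, 1]")
    unknown = 1.0 - share_similar
    center = share_similar * sample_ate
    return center - unknown, center + unknown


sample_ate = 0.08  # hypothetical experimental estimate
for share in (0.0, 0.5, 0.9, 1.0):
    low, high = effect_bounds(sample_ate, share)
    print(f"share similar = {share:.1f}: effect in [{low:+.3f}, {high:+.3f}]")
```

With no assumption (share 0) the bounds are the uninformative [-1, 1], which covers the cat, dog, and goldfish case; full invariance (share 1) collapses them to the point estimate. Intermediate shares show how partial theoretical knowledge narrows, without eliminating, predictive ambiguity.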

Alternatively, we may assume linearity or monotonicity, which allow us to adjust for relevant differences across units, settings, and so on. The relevant assumption in this case is that the model is well specified for cases on and off the support and that relevant moderators have been incorporated as factors into the original experiment. Accordingly, to make an ex-post prediction, we collect data on the relevant covariates specified by our theory for the units we want to make a prediction about and feed these inputs into the model to get a vector of predictions as the output. If the model predicts well out of sample, we may gain further confidence in the assumption that it is well specified for some universe of cases, and so we may have more confidence in its predictions as more and more successful replications accumulate.
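
A minimal sketch of the linearity assumption at work, with invented numbers: suppose the original experiment produced effect estimates within strata of a single moderator observed at values 1 through 4 (the moderator and all values are hypothetical). Fitting a straight line to those stratum effects and evaluating it at the target units’ covariate values yields a vector of predictions, including for units off the experimental support (values 5 and 6).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical experimental results: moderator value -> estimated effect.
moderator_values = [1.0, 2.0, 3.0, 4.0]
stratum_effects = [0.12, 0.10, 0.08, 0.06]
intercept, slope = fit_line(moderator_values, stratum_effects)


def predicted_effect(m):
    """Effect implied by the (assumed) linear model at moderator value m."""
    return intercept + slope * m


# Moderator values for four hypothetical target units; two lie off support.
target_units = [3.0, 5.0, 6.0, 6.0]
unit_predictions = [predicted_effect(m) for m in target_units]
target_ate = mean(unit_predictions)
print(f"slope {slope:.3f}, predicted target effect {target_ate:.3f}")
```

The prediction is only as good as the linearity assumption: if the true effect levels off above 4, the extrapolated values for the off-support units will be wrong, and only an out-of-sample test can reveal it.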

External validity as subjective

Now that we have, it is hoped, justified some assumptions and settled on a predictor to inform our bets, it remains for us to agree on a criterion that determines which bets win. That is, what criterion determines external validity, or “whether a causal relationship holds over variation in treatments, outcome measures, units, and settings”?

By what criteria do we judge the success or failure of external validity? One criterion is to declare the replication successful if its conditional estimates are identical to the ones from Mexico. However, this would yield few, if any, successes. Even if the underlying data-generating processes were the same and we could condition away differences between experiments, sampling variability would still make exact matches between estimated parameters highly unlikely.
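
The point about sampling variability is easy to see in simulation. The sketch below runs five experiments drawn from one and the same data-generating process, with hypothetical turnout rates of 40 percent under control and 45 percent under treatment and 2,000 units per arm. The estimates scatter around the common true effect and essentially never coincide, so a requirement of identical estimates would fail even a perfect replication.

```python
import random


def run_experiment(rng, n_per_arm, p_control=0.40, p_treated=0.45):
    """Simulate one two-arm experiment with a binary outcome and return the
    difference-in-means estimate. Turnout rates are hypothetical."""
    treated_votes = sum(rng.random() < p_treated for _ in range(n_per_arm))
    control_votes = sum(rng.random() < p_control for _ in range(n_per_arm))
    return (treated_votes - control_votes) / n_per_arm


rng = random.Random(2010)  # fixed seed so the sketch is reproducible
true_effect = 0.45 - 0.40
estimates = [run_experiment(rng, n_per_arm=2000) for _ in range(5)]

for e in estimates:
    print(f"estimate {e:+.4f}  (true effect {true_effect:+.4f})")
```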

To fix ideas, suppose large enough samples allowed us to ignore sampling variability altogether. Would a moderate difference between the estimated parameters call into question the external validity of the inferences from Progresa? And what if, despite this moderate difference, the estimate still preserves the direction and practical relevance of the causal effect across both settings? Does the original prediction remain externally valid? And, if not, what exactly do we mean by the external validity of a causal inference?

Shadish, Cook, and Campbell (2002, 34) interpret validity as “the approximate truth of an inference,” yet, to the extent that we never really get to observe truth, this is highly problematic. Besides, even if we did, we would still need to deal with the issue of “moderate” differences in true parameters, as, for most practical purposes, exact matches are not what we are after. Instead, we propose a more pragmatic interpretation of validity, one built on the pillars of prediction and statistical decision theory.

Under the pragmatic criterion we propose, our concern with external validity stems from the need to decide whether or not to implement a program like Progresa, say, in New York. That decision is often in the hands of a politician or high-level bureaucrat. If so, the preferences, opportunities, and constraints of the decision-maker should enter into the ex-post assessment of the validity of the inference (Granger and Machina 2006). Consequently, to a first approximation, it is decision-makers who determine the criteria for success.7

Although different decision-makers are likely to have different loss functions, and hence evaluate external validity differently, it is not a case of anything goes. At a minimum, an externally valid extrapolation of a causal relation should be one that, once realized, results in a causal effect that is in the same direction as the prediction. In addition, the size of the causal effect must remain practically significant at some predetermined level. That is, the decision-maker must evaluate the outcome in a fashion consistent with his ex-ante loss function. An implication of this understanding, though one not pursued in this article, is the need for better use of statistical decision theory in political science.8
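
The minimal criterion can be written down directly. In this sketch the practical-significance threshold is an input supplied by the decision-maker; the function name and all numbers are hypothetical. Two decision-makers with different thresholds can reach different verdicts on the same replication, while neither can accept an effect in the wrong direction.

```python
def externally_valid(predicted_effect, realized_effect, practical_threshold):
    """Minimal success criterion described in the text: the realized effect
    has the same sign as the predicted one, and its size reaches a level of
    practical significance fixed in advance by the decision-maker."""
    if practical_threshold < 0:
        raise ValueError("practical_threshold must be non-negative")
    same_direction = predicted_effect * realized_effect > 0
    practically_significant = abs(realized_effect) >= practical_threshold
    return same_direction and practically_significant


# Hypothetical numbers: a predicted effect of 0.10 and a realized one of 0.06,
# judged by two decision-makers with different predetermined thresholds.
for threshold in (0.05, 0.08):
    verdict = externally_valid(0.10, 0.06, threshold)
    print(f"threshold {threshold:.2f}: externally valid = {verdict}")
```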

To recap, external validity is problematic because it involves extrapolation, that is, making predictions off the support of the estimated density. Whether inferences regarding the applicability of the parameters of that density off the support are externally valid will therefore depend on the accuracy of the resulting predictions out of sample, that is, across combinations of treatments, outcome measures, units, and settings not in the original sample. The accuracy of these forecasts, in turn, is judged from a vector of ex-post prediction errors by a loss function that is specific to the decision-maker (Manski 2008). Accordingly, we are open to the idea that the validity of an inference lies, within the bounds defined above, in the eye of the beholder. External validity is a matter of degree.
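
To illustrate how the decision-maker’s loss function enters, the sketch below scores two hypothetical forecasting models by their vectors of prediction errors (realized minus predicted effect) across four replication sites. Under squared loss the model with smaller errors wins; under an asymmetric loss that charges three times as much for overpromising (a hypothetical penalty), the ranking flips. The errors and the penalty are invented for illustration.

```python
from statistics import mean


def squared_loss(errors):
    """Mean squared prediction error."""
    return mean(e ** 2 for e in errors)


def asymmetric_loss(errors, over_penalty=3.0):
    """Linear loss charging `over_penalty` times more when the forecast
    overshot the realized effect (negative error). Penalty is hypothetical."""
    return mean(-over_penalty * e if e < 0 else e for e in errors)


# Prediction errors (realized minus predicted effect) at four hypothetical sites.
errors = {
    "model_a": [0.03, 0.03, 0.03, 0.03],     # consistently too pessimistic
    "model_b": [-0.02, -0.02, -0.02, 0.00],  # mostly too optimistic
}

for name, loss in (("squared", squared_loss), ("asymmetric", asymmetric_loss)):
    scores = {model: loss(errs) for model, errs in errors.items()}
    best = min(scores, key=scores.get)
    print(f"{name} loss prefers {best}")
```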

The Analytical Approach to External Validity

In the previous section, while defining what we mean by external validity, we suggested that theory could be used to justify global shape restrictions for making extrapolations. In this section, we expand on precisely the type of theorizing needed to justify such restrictions. We do so in four parts. First, we explain why general knowledge is, indeed, theoretical knowledge, and how theories about constructs, causal mechanisms, and compliance and selection are key for external validity. Second, we provide some examples. Third, we consider some pros and cons of the analytical approach. Fourth, we discuss what is and is not new in this scheme. To avoid a long digression, for present purposes we will define theories as falsifiable statements about causal relationships between classes of events. These theories may be complex and micro-founded, or simpler and aggregate; either way, they ought to explain causal regularities rather than individual particulars.

Theories about constructs, causal mechanisms, and compliance and selection

Theorizing for generalizability comes at three levels: theories about constructs, theories about causal mechanisms, and theories about compliance and selection. First, theories about constructs are essential if we are to talk meaningfully about the results of experiments. For example, Wantchekon (2008) studies the impact of informed campaigns on voting behavior. Yet, before we can generalize his results beyond the particular implementation, we need to be clear about which theoretically relevant attributes of a campaign make it an informed campaign, the idea being that campaigns sharing these essential features would have the same causal effect, ceteris paribus.

Second, we can specify a theoretical causal mechanism, one where both mediator and moderator variables are conjectured, measured, and tested, and what we believe are irrelevancies are left out ex ante (unless the budget allows for more testing). With the parameters of this fully specified model at hand, we can then measure the level and prevalence of such moderators and mediators in the new target population of interest and use them as inputs into the model to predict the average treatment effect in that population. An obvious application of this is in studies of the genetic basis of disease and of how genetic factors interact with potential remedies.
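
The prediction step described here can be sketched as a reweighting: estimate the effect within strata of a moderator in the original experiment, measure how prevalent each stratum is in the target population, and average the stratum effects using the target’s shares. The sketch assumes, as a hypothesis the theory must justify, that effects within each stratum carry over unchanged; the stratum labels and all numbers are invented.

```python
def reweighted_ate(stratum_effects, shares):
    """Average treatment effect implied by stratum-specific effects and a
    population's stratum shares. Assumes (hypothetically) that the effect
    within each stratum is the same in every population considered."""
    if set(stratum_effects) != set(shares):
        raise ValueError("effects and shares must cover the same strata")
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(stratum_effects[s] * shares[s] for s in stratum_effects)


# Hypothetical stratum effects from the original experiment.
effects = {"moderator_absent": 0.10, "moderator_present": 0.02}
# Hypothetical prevalence of the moderator in the sample and in the target.
sample_shares = {"moderator_absent": 0.8, "moderator_present": 0.2}
target_shares = {"moderator_absent": 0.4, "moderator_present": 0.6}

sample_ate = reweighted_ate(effects, sample_shares)
target_ate = reweighted_ate(effects, target_shares)
print(f"sample ATE {sample_ate:.3f}, predicted target ATE {target_ate:.3f}")
```

Because the moderator is more prevalent in the hypothetical target, transplanting the sample estimate unchanged would overstate the target effect; the reweighted figure conditions on what the theory says matters.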

Third, most social science experiments are, explicitly or implicitly, encouragement designs (Horiuchi, Imai, and Taniguchi 2007). As such, treatment assignment does not guarantee compliance. To the extent that compliance rates vary across treatments, outcomes, units, and settings, so will the average causal effect. To provide better predictions, one could then proceed as above: first, model the population by characterizing it as a combination of compliers, defiers, and never- and always-takers (Angrist, Imbens, and Rubin 1996), according to their individual attributes; then use this information to predict the proportion of compliers and others in the new target population; and, finally, compute the implied average treatment effect (or other effect of choice) in the target population using the predicted proportions (see Frangakis, Rubin, and Zhou [2002] and Horiuchi, Imai, and Taniguchi [2007] for specific applications).
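
The three steps just listed can be sketched under the standard assumptions of Angrist, Imbens, and Rubin (1996): no defiers (monotonicity) and no effect of assignment except through take-up (exclusion). Under these, the complier share in a group equals the difference in take-up between those assigned and not assigned, and the intention-to-treat effect equals the complier share times the complier average causal effect. The sketch further assumes, hypothetically, that the complier effect is the same across household types; the types, take-up rates, and effect size are invented.

```python
def principal_strata_shares(takeup_assigned, takeup_not_assigned):
    """Shares of always-takers, compliers, and never-takers implied by
    take-up rates, assuming monotonicity (no defiers)."""
    always = takeup_not_assigned
    compliers = takeup_assigned - takeup_not_assigned
    never = 1.0 - takeup_assigned
    return {"always": always, "compliers": compliers, "never": never}


# Step 1 (hypothetical): take-up by household type in the original experiment,
# as (rate among those assigned to treatment, rate among those not assigned).
takeup_by_type = {"type_a": (0.90, 0.10), "type_b": (0.40, 0.10)}
cace = 0.10  # complier average causal effect, assumed equal across types

# Step 2: predicted complier share in a target population with a different mix.
target_mix = {"type_a": 0.25, "type_b": 0.75}
target_compliers = sum(
    share * principal_strata_shares(*takeup_by_type[t])["compliers"]
    for t, share in target_mix.items()
)

# Step 3: implied intention-to-treat effect in the target population.
target_itt = cace * target_compliers
print(f"complier share {target_compliers:.3f}, ITT {target_itt:.4f}")
```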

In practice, all three issues (construct validity; systematic differences in levels of mediators and moderators; and systematic differences in shares of compliers, defiers, and never- and always-takers) will affect our estimates of average treatment effects, and so all three need to be considered simultaneously. Accordingly, what we ultimately need are good theoretical models of the latent construct, the causal mechanism, and selection into treatment, or what are commonly referred to as structural equation models.9 Such models embody global shape restrictions that, if correct, are what allow us to make good predictions out of sample. However, our purpose is not to argue for a return to the large structural equation models of the type once sponsored by the Cowles Commission for Research in Economics, say, but simply to note that conceptualizing such models may help in the design of research programs, even if individual experiments remain much less complicated.

Theory, research programs, and individual experiments

In practice, most experiments should not attempt to estimate such complex structural models, nor do they need the backing of a fully specified theory. Indeed, in the early stages of a research program, concerns about external validity are often secondary to construct or internal validity. Rather, our argument is that, to the extent that external validity is desired in mature research programs, there are more efficient ways of achieving it than testing large structural models or blindly replicating, as in the robustness approach. Instead, experiments ought quite deliberately to test the appropriateness of assumed global shape restrictions.

For example, Mook (1983) cites the work of Ekman and Friesen (1971), who asked whether recognition of emotional facial expressions depended on culture. Rather than do innumerable replications across all possible cultures, they theorized that if facial expressions were interpreted similarly across cultures, then this must be true across the most distant cultures. Hence, they “stress tested” the theory by comparing Americans to the most distant culture they could think of, the Fore of Papua New Guinea. The finding that both groups recognize happiness in each other suggests the universality of emotional expression.

Similarly, going back to the Mexico–New York example above, if we think that a history of substance abuse is a significant moderator of the effect of CCTs, say, then we may design an experiment that stress tests this aspect. Despite its narrow focus, the fact that this experiment is embedded in a larger research project may inform us greatly about the external validity of predictions based on Progresa, insofar as it may allow us to condition our expectations on the prevalence of substance abuse in the New York target population.

Another example comes from the Benin electoral experiments (see Wantchekon 2003, 2008). One of the findings of the 2001 experiment is that voters are more likely to react positively to a “public goods” message when it comes from a co-ethnic candidate. A possible explanation of this result is that voters trust a candidate from their ethnic group more than they trust a candidate from another group. This would mean that the mediating variable between ethnic ties and the vote is trust, or the credibility of the candidate. By testing the relationship between the credibility of candidates and voting behavior in the subsequent experiment in 2006, Wantchekon (2008) improved the external validity of the results of the 2001 experiment.

Pros and cons of the analytical approach

The analytical approach to external validity we are proposing may be preferable for at least three reasons. First, the question of external validity is largely a theoretical one. Research on external validity asks not just whether a CCT program will work in New York, say, but why or why not. That is, it demands an explanation in terms of a causal mechanism or relevant differences between units in and out of sample. Indeed, for policy analysis, as well as for the purpose of scientific advancement through comparative research, the question of where, under what conditions, and among which subpopulations a treatment is likely to work is often as interesting as whether or not the treatment worked in the first place (Heckman 2005). As such, the analytical approach is of interest in and of itself.

Second, answering these questions may provide us with tighter bounds on future predictions out of sample and increase our confidence in the maintained shape restrictions. Moreover, a program of research that focuses on external validity will subject theories to the strongest possible tests, namely out-of-sample tests, thereby offering the potential for large updates in our priors.

Third, we conjecture that theoretically driven research programs will permit us to design experimental replications for optimal learning or, to paraphrase Milton Friedman, to come up with sequences of experiments that explain as much as possible in the shortest possible sequence.

This being said, the analytical approach is not foolproof. One could argue that the external validity criticism applies just as readily to a more fully specified causal mechanism, construct, and selection process as to a simple one. After all, moderator or mediator variables in one setting may be different, or have different impacts, in new settings, populations, treatments, or outcome measures, and so extrapolation is not possible. For example, household income may be a moderator of the effects of Progresa in Mexico but not in New York, perhaps due to some unobserved interacting variable. All this is certainly possible, but there are at least three powerful rejoinders.

First, we can test this criticism. If the moderators are found to behave similarly in Mexico and New York, then we have more confidence in projecting treatment effects to Los Angeles and Kuala Lumpur, say, than if we had replicated without a theory. The latter does not allow us to make accurate predictions because we are not using all the available information efficiently by conditioning on the relevant moderators and mediators. As a result, robustness replication yields more uncertain predictions and represents a very inefficient way of cumulating knowledge. Our view is that theorists explain and empiricists condition; doing neither greatly limits external validity.

Second, this is a reductio ad absurdum criticism. It simply negates the whole basis of comparative research. If causal and selection mechanisms are different across all variations in settings, populations, treatments, or outcomes, then we have no basis for generalizations. Without theorizing, each construct, treatment, outcome measure, unit, and setting is unique, and the possibility of scientific learning is denied. This may well be the case, but it is a testable proposition.

Third, the burden of proof is on generalizability skeptics to explain why they think the inference is not generalizable. Indeed, what is required is what Rosenbaum (2002, 9) calls tangible criticism: “a specific and plausible alternative interpretation of the available data; indeed a tangible criticism is itself a scientific theory, itself capable of empirical investigation.” In contrast, dismissive criticism “rests on the authority of the critic and is so broad and vague that its claims cannot be studied empirically.”10

To sum up, think of the replication process as an optimal learning problem. Our contention is that the analytical approach will require fewer iterations than the robustness one to achieve a given uncertainty tolerance (about external validity) or, alternatively, will reduce uncertainty further for any given number of replications. As Guala and Mittone (2005, 499) state, “external validity is an important issue for experimenters and theorists alike” (emphasis in original).

The difference between the robustness and analytical approaches somewhat resembles the difference between active and passive learning (Castro et al. 2008) or the problems faced in dynamic control programming. As that literature implies, there are trade-offs to be made between the analytical and robustness approaches. For example, testing some theories will require factorial designs, which in turn may involve larger samples. Yet this may still be cheaper than testing hypotheses separately and, to the extent that such designs generate sensible findings, may substantially improve our predictions out of sample.
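
A textbook calculation (not taken from the article) shows why factorial designs can demand larger samples. In a 2×2 design with n units per cell and a common outcome variance σ², each effect is a linear contrast of the four cell means, and its variance is σ²/n times the sum of the squared contrast weights. The main effect of one factor has weights of ±1/2 (sum of squares 1), whereas the interaction, a difference in differences, has weights of ±1 (sum of squares 4). Estimating the interaction as precisely as a main effect therefore takes four times as many units. The variance and cell size below are hypothetical.

```python
def contrast_variance(weights, sigma2, n_per_cell):
    """Variance of a linear contrast of independent cell means, each
    estimated from n_per_cell units with common outcome variance sigma2."""
    return sigma2 / n_per_cell * sum(w * w for w in weights)


# Cells ordered (A=1, B=1), (A=1, B=0), (A=0, B=1), (A=0, B=0).
main_effect_a = (0.5, 0.5, -0.5, -0.5)
interaction_ab = (1.0, -1.0, -1.0, 1.0)

sigma2 = 0.25  # e.g., a binary outcome with a 50 percent base rate
n = 500        # hypothetical units per cell

v_main = contrast_variance(main_effect_a, sigma2, n)
v_interaction = contrast_variance(interaction_ab, sigma2, n)
print(f"variance ratio (interaction / main effect): {v_interaction / v_main:.1f}")
```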

Old wine in new bottles?

Theory already permeates everything we do, from the questions we ask, to the data we collect, to how we define our concepts.11 For example, take one of the most successful and comprehensive experimental research programs in political science, the series of field experiments on voter mobilization spawned by the work of Gerber and Green (2000) and reviewed in Green and Gerber (2008). The use of theory is clear in the choice of treatments: researchers have studied whether mass mailing, door-to-door canvassing, and so on affect turnout, but not the impact of mailings printed in Times New Roman versus Arial font, as we have no theoretical basis for presuming these might be causally relevant.

The fact is, most replications are already theory driven, explicitly or not. For example, the finding by Gerber and Green (2000) that phone calls are ineffective in getting out the vote relative to personal face-to-face contact motivated Nickerson (2006) to test whether this was due to the face-to-face component or to the extra personal attention associated with door-to-door canvassing. He finds that the quality of the phone calls matters and that brief, nonpartisan phone calls can raise voter turnout if they are sufficiently personal. We can use this finding to help us predict how other means of communication may fare, conditional on the degree of personal attention provided. Also, at times, these replications shed light on unexpected results, motivating new theory and further experimentation (Gerber 2004).

If theory already permeates many of our experimental research programs, what is our point? First, we want to dispel criticisms that individual experiments have little or no external validity: so long as they contribute to general theories of voter behavior, say, experiments may expand the range of prediction significantly. Moreover, we can test external validity claims using experiments strong on internal validity. Second, for their part, experimenters ought to make the theoretical context of their experiments more explicit, even if it is only an enumeration of potential causal pathways lacking in formalism. Given that experimental replication is highly decentralized, it is up to each replication to do its part in expanding the external validity of the whole.

When Are Concerns about External Validity Justified? Prediction versus Understanding

We began this article by noting the policy relevance of external validity questions, in particular because of the perceived need to make predictions about causal effects out of sample. But is predictive success the hallmark of a good theory? What about explanation and understanding? And when should experimentalists focus on external validity? Finally, what about the idea that the best predictors are often atheoretical associations à la Sims (1980)?

Prediction and understanding are closely related. Yet an example due to Rubin (1996, 475) perhaps best highlights their differences: suppose an unfair coin yields heads with probability .6. A model that predicts heads with probability 1 will get it right 60 percent of the time; one predicting heads with probability .6 will get it right only 52 percent of the time (.6 × .6 + .4 × .4 = .52). On the basis of predictive success, we would be tempted to choose the wrong model, even though it cannot explain the observed sequence of tosses and, in particular, the 40 percent of tails. So prediction cannot be all there is to it. However, our point is not that external validity ought to be the only goal of science; rather, our claim is that the best way to test external validity is to test predictions out of sample. Different goals require different tests.
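Rubin's arithmetic can be checked directly. The Python sketch below assumes that each "model" predicts every toss independently of the coin; the function names are illustrative, not taken from the source.

```python
import random

def expected_hit_rate(p_heads, q_predict_heads):
    """Chance that a prediction rule matches a toss.

    The coin lands heads with probability p_heads; the model independently
    predicts heads with probability q_predict_heads.
    """
    return p_heads * q_predict_heads + (1 - p_heads) * (1 - q_predict_heads)

def simulated_hit_rate(p_heads, q_predict_heads, n=100_000, seed=1):
    """Monte Carlo check of expected_hit_rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        toss_is_heads = rng.random() < p_heads
        predict_heads = rng.random() < q_predict_heads
        hits += toss_is_heads == predict_heads
    return hits / n

always_heads = expected_hit_rate(0.6, 1.0)  # 0.6
calibrated = expected_hit_rate(0.6, 0.6)    # about 0.52
```

The always-heads rule wins on hit rate although it never predicts a tail, which is exactly why hit rate alone cannot arbitrate between the two models.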


For example, in a widely cited article, Mook (1983) criticized the preference then prevalent in psychology for experiments with strong external validity. His argument relied on a distinction between prediction and understanding, or, as he put it, between two modes of research: the analogue and the analytical models. In the analogue model, the objective is to model the real world for prediction purposes. Thus, the variables that account for the most real-world variance are the most important, since they are the ones that speak most directly to our ability to make predictions about causal relations.¹² In the analytic model of research, by contrast, the objective is to understand the workings of a system, to test the internal validity of theories. These theories may apply to real life, but there is no attempt at generalization at this point.

In a similar vein, Przeworski (2007) is interested not only in the effects of causes but also in the causes of effects: the list of factors X₁, X₂, ..., Xₙ that explain the observed (in nature) outcome Y, say lung cancer prevalence.¹³ Przeworski’s point is that we often desire to understand not only what the potential causes of Y are, but also what the actual causes of Y are in a particular population and how these come about. Demonstrating that X can cause changes in Y is a necessary but insufficient condition for the inference that X explains Y, or that X can be used to manipulate changes in Y in any particular target population in any period. Translated into Mook’s language, the analytical mode of research (or the effects of causes) asks, “Could X have caused Y?” whereas the analogue mode of research (or the causes of effects) asks, “Does X typically cause Y?”

Note that our answer to these questions may significantly influence how we answer their corollary, “Can X be manipulated so as to change Y in a desired direction?” This is important for two reasons: first, because by knowing the actual causes of a disease, say, we might be better able to design prevention measures; and, second, because causes that are efficacious in the lab may not be effective in the field.

For example, take the efficacy of bed nets in preventing malaria. Under controlled conditions, bed nets have been shown to be highly effective in preventing malaria: households randomly “treated” with bed nets experience a reduction in malaria incidence relative to households randomly allocated to control conditions. These controlled experiments identify the “effects of causes”; in this case, bed net use reduces malaria incidence. Based on this evidence, numerous programs have been implemented that freely distribute bed nets in areas of high malaria incidence. And yet, to date, the jury is still out as to the effectiveness of these interventions in reducing malaria incidence in the treated areas. Why? One answer is lack of compliance: providing a free bed net does not imply that it will be used appropriately.

This example illustrates the point that just because X can be shown to cause changes in Y, it does not follow that X explains any of the observed variance in Y in the real world nor, indeed, that it can be an effective cause in the real world (a problem of external validity), where we may lack sufficient control to ensure full compliance. In other words, there are other factors (potentially unobserved) that moderate the causal effect in real applications. In Mook’s terminology, we gain understanding of potential causes for reducing malaria, but we may still end up with bad policy predictions.
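The compliance problem can be made concrete with a minimal Python sketch. It rests on loudly hypothetical numbers (baseline malaria risk 0.30, risk 0.15 for households that use a net, 40 percent usage among recipients) and on the standard instrumental-variables assumptions of Angrist, Imbens, and Rubin (1996): one-sided noncompliance (control households cannot obtain nets) and an offer that affects malaria only through net use. It also assumes a constant effect of use. A free-distribution policy then delivers only the diluted intention-to-treat (ITT) effect, while the Wald ratio rescales it back to the effect of use among compliers.

```python
import random

def expected_itt(baseline_risk, risk_if_used, compliance):
    """Intention-to-treat effect under one-sided noncompliance.

    Only the share `compliance` of households offered a net use it, and
    control households have no net, so the offer moves average risk by
    compliance * (risk_if_used - baseline_risk).
    """
    return compliance * (risk_if_used - baseline_risk)

def simulate_trial(n, baseline_risk, risk_if_used, compliance, seed=2):
    """Simulate a randomized offer of free nets; return (itt, wald)."""
    rng = random.Random(seed)
    ill = {True: 0, False: 0}
    size = {True: 0, False: 0}
    users = 0
    for _ in range(n):
        offered = rng.random() < 0.5                       # random assignment
        uses_net = offered and rng.random() < compliance   # no nets in control
        risk = risk_if_used if uses_net else baseline_risk
        ill[offered] += rng.random() < risk
        size[offered] += 1
        users += uses_net
    itt = ill[True] / size[True] - ill[False] / size[False]
    take_up = users / size[True]
    # Wald/IV estimate: rescales the ITT to the effect among net users.
    return itt, itt / take_up

print(expected_itt(0.30, 0.15, 0.40))  # about -0.06
itt, wald = simulate_trial(200_000, 0.30, 0.15, 0.40)
print(itt, wald)
```

With these invented inputs, a trial that enforced use would report a risk reduction of about .15, whereas free distribution would deliver only about .06.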

Understanding what the actual causes of some phenomenon are should inform our predictions and research. This is what motivated psychologist Egon Brunswik to advocate “representative designs” where, in particular, levels of the causal variable in question would be chosen according to their conditional distribution in the natural world (Albright and Malloy 2000).

Accordingly, the idea that distribution of free bed nets will somehow generate the treated counterfactual observed in controlled trials may appear far-fetched. For all we know, the given distribution of bed nets in any country is an equilibrium (everyone who wants one has one), so reducing their price to zero may have a tiny effect in that, at some point, there is no longer a binding budget constraint (Gelinas’s [2006] point above). In fact, it is entirely possible that there is simply no way to generate the experimental counterfactual in the field without at the same time changing the levels of other (potentially unobserved) covariates. Perhaps malaria eradication requires a process of modernization that includes improved education as to the pathogenic nature of the disease, better drainage, urbanization, air-conditioning, better medical facilities, and so on. This, by the way, speaks to the important development literature on the sequencing of reforms and the need for theories of change.

Linking external validity to the capacity to make reasonable causal predictions out of sample in no way undermines the importance of theory to the enterprise, nor the value of understanding. Prediction and understanding may have slightly different goals, but, ultimately, a good measure of useful knowledge is a test of the external validity of its predictions. In addition, sound understanding of the actual causes of effects may help us theorize about causal mechanisms and optimal interventions.

Conclusion

In this article, we argue that claims to external validity of randomized field experiments are stronger when theoretical connections between experiments are established and tested. Given an experiment that establishes a causal relationship between two variables (the treatment and an outcome of interest) under a set of conditions, we can improve external validity in at least two ways: (1) by replicating the relationship between the two variables under new conditions (the robustness approach) or (2) by establishing that the relationship is mediated or moderated by a set of variables (the analytical approach).
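Note 4 describes moderators as interaction effects. A minimal sketch of that reading of the analytical approach, in Python with NumPy, assumes a randomized binary treatment `t`, a binary moderator `m` measured before treatment, and a linear outcome model; all coefficients are invented. Mediation is considerably harder to test credibly (see Bullock, Green, and Ha 2008), so the sketch covers moderation only.

```python
import numpy as np

def fit_moderation(y, t, m):
    """OLS of y on [1, t, m, t*m].

    The t*m coefficient measures how the treatment effect shifts with the
    moderator: the effect is `treatment` when m = 0 and
    `treatment + interaction` when m = 1.
    """
    X = np.column_stack([np.ones_like(t), t, m, t * m])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "treatment", "moderator", "interaction"], coef))

# Hypothetical data: the treatment works much better where m = 1.
rng = np.random.default_rng(3)
n = 5_000
t = rng.integers(0, 2, size=n).astype(float)  # randomized treatment
m = rng.integers(0, 2, size=n).astype(float)  # pre-treatment moderator
y = 1.0 + 0.5 * t + 0.2 * m + 0.8 * t * m + rng.normal(0.0, 1.0, size=n)
print(fit_moderation(y, t, m))
```

If the interaction holds up across sites, a prediction for a new site needs only that site's distribution of `m`, rather than a fresh replication.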

We believe the analytical approach may turn out to be more effective than the robustness approach. This is because a mediator is likely to represent a larger set of experimental conditions. We recommend that follow-up experiments focus primarily on testing the theoretical argument of original experiments, instead of simply replicating them in a different context. If external validity is the Achilles’ heel of randomized experiments, then testing the mechanisms underlying already established causal relationships should be the top priority of experimental research.


Notes

1. “[Conditional cash transfer] programs provide monetary incentives to households living in poverty when they complete activities aimed at increasing human capital development and breaking the cycle of poverty.” From http://www.nyc.gov/html/ceo/html/programs/opportunity_nyc.shtml (accessed February 14, 2009). For an overview of Opportunity NYC, see de Sá e Silva (2008).

2. Rockefeller Foundation, http://www.rockfound.org/efforts/nycof/opportunity_nyc.shtml (accessed February 14, 2009). The 2002 successor program to Progresa is called Oportunidades.

3. Rockefeller Foundation, ibid.

4. On the definition of moderator and mediator, see Baron and Kenny (1986). There are numerous ways to label variables in the context of a causal mechanism, yet “conceptually, moderators identify on whom and under what circumstances treatments have different effects. Mediators identify why and how treatments have effects” (Kraemer et al. 2002, 877). The former may be understood as interaction effects, whereas the latter identify possible mechanisms by which the treatment comes to have its effect. On the complexities of testing mediator effects, see Bullock, Green, and Ha (2008).

5. Let $y_i$ denote test scores for pupil $i$, $T_i$ denote whether he was randomly assigned to receive school vouchers or not, $x_i$ be a set of covariates strongly correlated with $y$, and $e_i$ be a random (iid) error term such that we may write

$$y_i = a + b_0 T_i + x_i b + e_i, \qquad i = 1, 2, \ldots, N.$$

We are interested in $\partial Y/\partial T$, or $b_0$. Accordingly, the question of whether Progresa will work in NYC asks, loosely, whether the estimated $b_0$ in the NYC sample will be statistically significant and of the same sign and magnitude as in Mexico. We may test this directly with the NYC program because it is randomized. If, instead, everyone got treated in NYC, we would have only ex ante and ex post measures, with no ex post controls. Although not ideal, we could use pretest data to predict outcomes in the absence of treatment, generating a predicted counterfactual. Finally, note that it is possible to have $\partial Y/\partial T > 0$ and yet $\Delta Y = 0$ if some other event acted in the opposite direction (say, a teacher strike). This is why, in the absence of randomized replication, it is crucial to have good predictors of $Y$, as our inferences will depend on the model for the counterfactual being correctly specified.

6. According to her, the poor in New York have access to free education and their children are idle, features that do not rhyme with the Mexican story of binding budget constraints keeping Mexican children at work and away from school.

7. At this stage, for simplicity, we ignore the issue of who the decision-maker is, whether she is a high-level bureaucrat, a politician, a set of voters, some abstract welfare maximizer, or, indeed, a researcher evaluating a theory. We will simply postulate a decision-maker, typically a politician, and leave it at that, as the relevant person may differ between applications. We are aware this renders scientific knowledge somewhat subjective, but then again, a long literature in the philosophy of science questions the possibility of objective science.

8. See Gerber, Green, and Kaplan (2002) for a rare exception in political science, and Berger (1985) and Manski (2008) for a more general discussion. This has three important corollaries, especially for policy-motivated experiments, that often go unappreciated. First, only if the decision-maker is planning to repeat the experiment elsewhere will she care about its conventional statistical significance over and beyond its practical significance. Second, having a well-specified loss function allows the researcher to move away from simple point estimation, relaxing some assumptions and the somewhat exaggerated obsession with bias (see Rosenbaum [2002] for a discussion of sensitivity analysis and Manski [2008] for interval estimation). As such, the unbiased estimation made possible by randomized experiments may be needed only when good enough priors on the bias are lacking, preventing us from adjusting observational estimates to correct for the bias (Gerber, Green, and Kaplan 2002). Third, we need more empirical data on the kind of decision criteria used by policy makers: Bayes, Maximin, or Minimax-regret, say. That is, on the basis of what criteria do policy makers choose policy experiments? Laboratory experiments on high-level officials may help reveal these.

9. This is very much in line with the argument made in Heckman (2008).

10. Experimenters ought to make the experiment as sound as possible to begin with, by stating potential mediators, say, and, whenever the budget allows, testing them. Our point is that in order to help discover whether a design is indeed flawed, the critic needs to specify why he thinks the design is faulty in the first place. Off-the-cuff criticisms of the sort “experiments have no external validity” are, on this account, unhelpful and often dismissive.


11. See, inter alia, McDermott (2002); Druckman et al. (2006); Guala and Mittone (2005); Levitt and List (2007); Lucas (2003); Lynch (1999); Moffitt (2004); Shadish, Cook, and Campbell (2002); Schram (2005).

12. We distinguish between making causal predictions of the type “a change in X will cause a change in Y of D percent,” say, and simply forecasting Y, although good forecasts of Y are often welcome for reducing the variance of our causal predictions.

13. Confusingly, Mahoney and Goertz (2006) interpret “causes of effects” as explaining individual cases, e.g., what caused Joe Doe to get lung cancer. This, in our view, is not the standard interpretation. We adhere to Przeworski’s and Heckman’s interpretation.
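The model in note 5 can be estimated by ordinary least squares. The Python/NumPy sketch below simulates a randomized voucher assignment with hypothetical parameter values (a = 50, b_0 = 5, two covariates with coefficients 8 and -3, error standard deviation 4). It illustrates the role note 5 gives to covariates strongly correlated with y: under randomization both estimates of b_0 are centered on the truth, but adjusting for good predictors of y shrinks the standard error. The function name and all data are illustrative assumptions.

```python
import numpy as np

def estimate_b0(y, T, x=None):
    """OLS estimate and conventional standard error of b0 in
    y_i = a + b0 * T_i + x_i b + e_i (covariates x are optional)."""
    cols = [np.ones(len(y)), T] + ([] if x is None else [x])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # homoskedastic error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])

# Hypothetical test-score data with randomized vouchers T.
rng = np.random.default_rng(5)
N = 2_000
T = rng.integers(0, 2, size=N).astype(float)
x = rng.normal(size=(N, 2))  # covariates predictive of y
y = 50 + 5 * T + x @ np.array([8.0, -3.0]) + rng.normal(0, 4, size=N)

b0_adj, se_adj = estimate_b0(y, T, x)  # covariate-adjusted
b0_raw, se_raw = estimate_b0(y, T)     # equals the difference in means
print(b0_adj, se_adj, b0_raw, se_raw)
```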
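Note 8 lists Bayes, Maximin, and Minimax-regret as candidate decision criteria for policy makers. The plain-Python sketch below applies each criterion to one hypothetical payoff table: the welfare of adopting, piloting, or rejecting a program in a state where it works or fails, with a 50/50 prior for the Bayes rule. Every number is invented, chosen so that the three criteria disagree; the function names are illustrative.

```python
def bayes(payoffs, prior):
    """Action with the highest expected payoff under the prior."""
    return max(payoffs, key=lambda a: sum(p * u for p, u in zip(prior, payoffs[a])))

def maximin(payoffs):
    """Action whose worst-case payoff is largest."""
    return max(payoffs, key=lambda a: min(payoffs[a]))

def minimax_regret(payoffs):
    """Action whose largest regret (shortfall from the best action in
    each state) is smallest."""
    n_states = len(next(iter(payoffs.values())))
    best = [max(u[s] for u in payoffs.values()) for s in range(n_states)]
    return min(payoffs, key=lambda a: max(best[s] - payoffs[a][s] for s in range(n_states)))

# Hypothetical payoffs per state: (program works, program fails).
payoffs = {"adopt": (10, -5), "pilot": (6, -2), "reject": (0, 0)}
prior = (0.5, 0.5)

print(bayes(payoffs, prior))    # adopt  (expected payoffs 2.5, 2.0, 0.0)
print(maximin(payoffs))         # reject (worst cases -5, -2, 0)
print(minimax_regret(payoffs))  # pilot  (largest regrets 5, 4, 10)
```

The same evidence thus supports three different policies depending on the criterion, which is why knowing the decision-maker's criterion matters for designing policy experiments.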

References

Albright, Linda, and Thomas E. Malloy. 2000. Experimental validity: Brunswik, Campbell, Cronbach, and enduring issues. Review of General Psychology 4(4): 337-53.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434): 444-55.

Baron, Reuben M., and David A. Kenny. 1986. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51(6): 1173-82.

Berger, James O. 1985. Statistical decision theory and Bayesian analysis. 2nd ed. New York: Springer.

Bullock, John G., Donald P. Green, and Shang E. Ha. 2008. Experimental approaches to mediation: A new guide for assessing causal pathways. Unpublished manuscript, Yale University. Accessed online at http://www.ipeg.org.uk/events/field_experiments/Documents/A%20Critique%20of%20Conventional%20Mediation%20Analyses%20--%20Bullock%20Green%20and%20Ha.pdf

Castro, Rui M., Charles Kalish, Robert Nowak, Ruichen Qian, Timothy J. Rogers, and Xiaojin Zhu. 2008. Human active learning. Technical report. Columbia University, New York.

Das, Jishnu, Quy-Toan Do, and Berk Ozler. 2005. Reassessing conditional cash transfer programs. World Bank Research Observer 20(1): 57-80.

de Sá e Silva, Michelle Morais. 2008. Opportunity NYC: A performance-based conditional cash transfer programme. A qualitative analysis. Working Paper 49, International Poverty Centre, Brasilia, and Columbia University, New York.

Druckman, James N., Donald P. Green, James H. Kuklinski, and Arthur Lupia. 2006. The growth and development of experimental research in political science. American Political Science Review 100(4): 627-35.

Ekman, Paul, and Wallace V. Friesen. 1971. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology 17: 124-29.

Frangakis, Constantine E., Donald B. Rubin, and Xiao-Hua Zhou. 2002. Clustered encouragement design with individual noncompliance: Bayesian inference and application to advance directive forms. Biostatistics 3: 147-64.

Gelinas, Nicole. 2006. New York isn’t Mexico. City Journal. Accessed from http://www.city-journal.org/html/eon2006-10-20ng.html.

Gerber, Alan S. 2004. Does campaign spending work? Field experiments provide evidence and suggest new theory. American Behavioral Scientist 47(5): 541-74.

Gerber, Alan S., and Donald P. Green. 2000. The effects of canvassing, telephone calls, and direct mail on voter turnout: A field experiment. American Political Science Review 94(3): 653-63.

Gerber, Alan S., Donald P. Green, and Edward H. Kaplan. 2002. The illusion of learning from observational research. Institution for Social and Policy Studies Working Paper, Yale University, New Haven, CT.

Granger, Clive W. J., and Mark J. Machina. 2006. Forecasting and decision theory. In Handbook of economic forecasting, vol. 1, 81-98. Amsterdam: Elsevier.

Green, Donald P., and Alan S. Gerber. 2008. Get out the vote: How to increase voter turnout. 2nd ed. Washington, DC: Brookings Institution.

Guala, Francesco, and Luigi Mittone. 2005. Experiments in economics: External validity and the robustness of phenomena. Journal of Economic Methodology 12(4): 495-515.

Heckman, James J. 2005. The scientific model of causality. Sociological Methodology 35(1): 1-98.


Heckman, James J. 2008. Econometric causality. Social Science Research Network Electronic Library.

Horiuchi, Yusaku, Kosuke Imai, and Naoko Taniguchi. 2007. Designing and analyzing randomized experiments: Application to a Japanese election survey experiment. American Journal of Political Science 51(3): 669-87.

Kraemer, Helena C., G. Terence Wilson, Christopher G. Fairburn, and Stewart W. Agras. 2002. Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry 59(10): 877-83.

Levitt, Steven D., and John A. List. 2007. What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives 21(2): 153-74.

Lucas, Jeffrey W. 2003. Theory-testing, generalization, and the problem of external validity. Sociological Theory 21: 236-53.

Lynch, John G., Jr. 1999. Theory and external validity. Journal of the Academy of Marketing Science 27(3): 367-76.

Mahoney, James, and Gary Goertz. 2006. A tale of two cultures: Contrasting quantitative and qualitative research. Political Analysis 14(3): 227-49.

Manski, Charles F. 2008. Identification for prediction and decision. Cambridge, MA: Harvard University Press.

McDermott, Rose. 2002. Experimental methodology in political science. Political Analysis 10(4): 325-42.

Moffitt, Robert A. 2004. The role of randomized field trials in social science research: A perspective from evaluations of reforms of social welfare programs. American Behavioral Scientist 47(5): 506-40.

Mook, Douglas G. 1983. In defense of external invalidity. American Psychologist 38(4): 379-87.

Nickerson, David W. 2006. Volunteer phone calls can increase turnout: Evidence from eight field experiments. American Politics Research 34(3): 271-92.

Przeworski, Adam. 2007. Is the science of comparative politics possible? In The Oxford handbook of comparative politics (Oxford handbooks of political science), ed. Carles Boix and Susan C. Stokes, 147-71. Oxford, UK: Oxford University Press.

Rawlings, Laura B. 2005. Evaluating the impact of conditional cash transfer programs. World Bank Research Observer 20(1): 29-55.

Rosenbaum, Paul R. 2002. Observational studies. New York: Springer.

Rubin, Donald B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91(434): 473-89.

Schram, Arthur. 2005. Artificiality: The tension between internal and external validity in economic experiments. Journal of Economic Methodology 12: 225-37.

Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. New York: Houghton Mifflin.

Sims, Christopher A. 1980. Macroeconomics and reality. Econometrica 48(1): 1-48.

Wantchekon, Leonard. 2003. Clientelism and voting behavior: Evidence from a field experiment in Benin. World Politics 55: 399-422.

Wantchekon, Leonard. 2008. Expert information, public deliberation and electoral support for good governance: Experimental evidence from Benin. Mimeo. Paper presented at the Freeman Spogli Institute for International Studies on December 2, 2008, in Palo Alto, California. Accessed online at http://fsi.stanford.edu/events/expert_information_public_deliberation_and_electoral_support_for_good_governance_experimental_evidence_from_benin/
