SOCIAL EXPERIMENTATION FOR PUBLIC POLICY 1


1. Policy experiments

The statement, made by the British Conservative

politician Enoch Powell, highlights the fact that

public policy making involves not only the higher arts

of principle, intellect, and persuasion, but also the

play of interests and the pushing and hauling of

partisans for power and control. While the centrality

of interests and prejudices has received a great deal

of attention in both the scholarly and popular media,

it is Powell’s “guesses about the future” and that

“staff of economists” that concern us in this chapter.

Policy inevitably deals with an uncertain future.

Even with the plethora of statistical series and policy

research currently available, policy making has to be

based on some degree of guesswork. Powell’s economists,

who project past trends into the future, now

supplemented by sociologists of several hues, shed

sometimes flickering light on what the effects of

policy interventions will be. It is to get closer to

understanding the likely effects of a prospective

policy that social experimentation was born. The idea

is simple: try out a policy on a small scale and see

what happens.

Since the late 1960s, spending on trials of social

policy proposals in the USA has consumed over a billion

dollars (Burtless 1995). In this chapter we examine the nature

of social experiments that have been conducted in the

past forty years. We review the efforts of many social

scientists and economists to develop systematic empirical

evidence about the likely advantages and disadvantages

of specific policy proposals through the conduct of

social experiments themselves, and we try to project the

current trend line into the hazy future.

2. Definition

Social experiments are randomized field

trials of a social intervention. Within that

rubric, two emphases jostle for primacy (and a

third emphasis tags along). Some authors define

social experiments (SE) by emphasizing the “trial”

in randomized field trial. For them, the hallmark

is that a prospective intervention is being tried

out on a small scale before it is widely adopted.

Not only is it being tried out; it is being

studied in its pilot version. The aim is to find

out whether the intervention achieves its aim. If

so, the assumption is that policy makers should

adopt it on a system-wide basis. There is a sense of

self-conscious intention to influence policy, and

often this intention is accompanied by a sense of

urgency as the policy window opens.

Other authors put the stress on

randomization. It is randomization that allows

experimenters to have confidence that the

intervention was the cause of whatever changes are

observed. In a randomized study, the experimenters

select samples from the same population and assign

one to the intervention, or “experimental,”

condition and the other to a “control” condition.

At the end of the period, the groups are compared.

Inasmuch as they were very much the same at the

start and the only thing that differed over time

was exposure to the intervention, differences between

them can be attributed to the intervention. From a

methodological point of view, randomization gives

experimenters confidence in their estimates of

effects.
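
The logic of random assignment and group comparison can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from any of the experiments discussed in this chapter. The function names, the seed, the number of families, and the outcome figures are all hypothetical.

```python
import random
from statistics import mean


def randomize(units, seed=0):
    """Randomly split units into an experimental and a control group."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, experimental, control):
    """Difference between the mean outcomes of the two groups."""
    return mean(outcomes[u] for u in experimental) - mean(outcomes[u] for u in control)


# Hypothetical illustration: 200 families, an invented baseline outcome,
# and an invented gain of 2.0 for every family exposed to the intervention.
families = list(range(200))
experimental, control = randomize(families, seed=1)
exposed = set(experimental)
outcomes = {f: 10.0 + (f % 7) + (2.0 if f in exposed else 0.0) for f in families}
print(round(estimated_effect(outcomes, experimental, control), 2))
```

Because chance alone decides who is exposed, the two groups should have similar baselines on average, so the difference in means should land near the invented gain rather than reflecting pre-existing differences between the groups.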

The third focus in the definitions of social

experiments, now widely taken for granted, is that

the trial is done in the “field”. Gone is the

comfortable milieu of the laboratory for studying

outcomes. Rather, the social scientist conducts

the studies in the precincts in which the actual

policy will be run. Thus we have randomized field

trials.

If the emphasis on randomization is accepted

as the guiding principle, then any study of

desired outcomes conducted through randomization is

an SE. Such a definition sweeps in large numbers of

evaluations of existing programs. Many evaluations

of social programs are conducted after the programs

are enacted, and some of the evaluations (although

not nearly as many as evaluators would like)

randomize prospective participants into

“experimental” and “control” groups. After a period

of time, the evaluator compares the status of the

two groups on the desired indicators (e.g. health

status, earnings, school graduation). To blanket

such post hoc evaluations into the category of SEs

widens the category substantially.

If we confine ourselves to randomized studies

undertaken on a test basis to guide adoption of

future policy, we have a more focused field of

enquiry. It is the definition we adopt here. Of

course, the distinctions are not hard and fast.

Some evaluations of existing programs are expected

to guide future iterations of the program, i.e. to

lead to modifications and improvements. When a state

program is studied as a possible model for federal

policy (states as “laboratories of democracy”), what is

an evaluation at one level is an SE at another. Still, the

distinction is useful to hold on to. It is

important to consider the main purpose for which

the SE is done as well as its research design.

3. History

With a little difficulty we could probably trace SEs

back to Francis Bacon, but it is sufficiently

historical to go back to Sidney and Beatrice Webb. In

their 1932 book, Methods of Social Study, they argue for

scientifically based social policy in words that have

remarkable resonance for our own times. They advocated

research conducted by social scientists trained in

experimental methods who conduct independent social

investigations and transmit their results to those

making social policy. The actual methods, as Ann Oakley

(1998a) has pointed out, were developed by

educationalists and psychologists in the USA in the

late nineteenth and early twentieth centuries. The

philosopher, Charles S. Peirce, the father of

"pragmatism," introduced the idea of randomization into

psychological experiments in the 1880s. Some of the

early studies dealt with the transferability of memory

skills from one subject to another (Oakley cites

Thorndike and Woodworth 1901 and Winch 1908). These

psychological researchers invented techniques for

randomly assigning subjects to experimental

treatments. R. A. Fisher, who did his research in

agriculture and developed much that has become

commonplace in statistics, is widely known for

championing randomization methods.

With regard to the "field" aspect of policy

experiments, Oakley (1998b) reminds us that two US

sociologists, Stuart Chapin at the University of

Minnesota and Ernest Greenwood at Columbia University,

applied experimental methods to the study of social

problems in the early years of the twentieth century.

Where psychologists tended to work in laboratory

settings, pioneering sociologists took their research

out into the community. Chapin (1947) describes nine

experimental studies that he and others carried out on

topics such as recreation programs for delinquent boys,

social effects of public housing, and effects of

student participation in extracurricular activities.

Where others had stated that randomized experiments

could be done only under antiseptic laboratory

conditions, he was interested in demonstrating that

they could be adapted to community settings as well.

Greenwood provided a theoretical rationale for applying

experimental methods to social issues, described in his

book Experimental Sociology (1945).

In the first half of the twentieth century, most

of the forerunners of current SEs were evaluations of

existing programs. They shared many of the

characteristics of experiments, but dealt with programs

that were already up and running. The intent,

nevertheless, was very similar: to see whether a

program worked and, if it proved successful, to extend

and expand it. One evaluation that gained a great deal

of attention was the Perry Preschool Project, largely

because the preschool participants were followed up

into their late twenties and because their lives turned

out to be significantly more successful than the lives

of kids in the control group (Schweinhart, Barnes, and

Weikart 1993). The data provided much of the

justification for authorization and reauthorizations of

the Head Start program and other early childhood

programs. Among other noteworthy early studies were the

Eight Year Study of progressive high schools, conducted

by Ralph Tyler (unpublished), the Cambridge-Somerville

youth worker program that aimed to prevent juvenile

delinquency (Powers and Witmer 1951), and the Hawthorne

studies of reforms to working conditions in a Western

Electric plant (Roethlisberger and Dickson 1939).

A relatively small number of evaluation studies

used randomization for assigning participants, but some

of them sought to introduce controls in other ways.

Campbell and Stanley (1966) wrote a landmark monograph,

Experimental and Quasi-Experimental Designs for

Research, classifying the designs of studies that had

been reported. In the language of the time,

"experimental" meant that the study had randomly

assigned participants to the program (or several

variants of the program) and to a control group that

did not receive the program. "Quasi-experimental"

designs used other strategies to reduce the threat that

something other than the program was the cause of

whatever differences appeared between the groups.

Although perhaps not its intent, the Campbell and

Stanley book tended to legitimize quasi-experiments for

evaluation purposes. Campbell and his collaborators in

subsequent versions of the book (Cook and Campbell

1979; Shadish, Cook, and Campbell 2002) have sought to

overcome the impression and place randomization back in

priority position.

It wasn't until after the Second World War that

the three main ideas of SE were combined in large-

scale investigations: randomization, study in the field,

and intentional preparation for policy change. With the

War on Poverty in the 1960s, SEs began their modern

history. The first noteworthy SE of the period was the

series of income maintenance experiments. They began in

1968 in four sites in New Jersey and were followed by

parallel studies in a series of urban and rural

locations. The program was an effort to change the

existing welfare system by the provision of a

guaranteed annual income to poor people (Cain and Watts

1973; Kershaw and Fair 1976; Danziger, Haveman, and

Plotnick 1981). The aim of the experiment was to test a

policy innovation prior to enactment.

The income maintenance experiment was followed by

experiments with housing allowances (Carlson and

Heinberg 1978; Friedman and Weinberg 1983; Kennedy

1980), health insurance (Newhouse 1993), performance

contracting in education (Rivlin and Timpane 1975), and

job search (Wolfhagen 1983). Greenberg and Shroder

(1997) provide reports on 143 SEs conducted in the USA,

one in Canada, and one in the Netherlands. All of them

were randomized field trials of prospective new

policies (although the policies studied in the later

experiments generally represented merely incremental

changes in existing programs). Only experiments that

had reported results by 1996 are included in the

inventory. Their appendix lists seventy-five SEs then

still in progress.

To ground the reader in some real examples, Table

39.1 provides information on four SEs which we refer to

in the following discussion.

Income maintenance experiments. Four income

maintenance experiments were run in the 1960s and

1970s at eleven sites to test the impacts of variations

in a negative income tax program for low-income

families. Families were provided with a guaranteed

level of benefits and were allowed to earn additional

income through work. Program benefits were reduced by a

set fraction for each dollar earned. The findings

showed that families reduced the number of hours they

worked but not by significant amounts. Other results

were mixed, with small positive results on many

measures. However, by the time results were reported,

the political climate had changed. Congress was in no

mood to give the poor a blank check. The long and

hugely expensive experiment (Greenberg and Shroder 1997

report the cost as $111.7 million) had little policy

impact.
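
The benefit rule described above (a guaranteed level of benefits, reduced by a set fraction of each dollar earned) can be written out as a short calculation. The Python sketch below is illustrative only and is not the experiments' actual payment formula. The function names, the $4,000 guarantee, and the reduction rate of 0.5 are hypothetical figures chosen to make the arithmetic visible.

```python
def nit_benefit(earnings, guarantee, reduction_rate):
    """Benefit paid: the guarantee less a set fraction of each dollar earned, never below zero."""
    return max(0.0, guarantee - reduction_rate * earnings)


def total_income(earnings, guarantee, reduction_rate):
    """Earnings plus whatever benefit remains after the reduction."""
    return earnings + nit_benefit(earnings, guarantee, reduction_rate)


# Hypothetical plan: a $4,000 guarantee reduced by 50 cents per dollar earned.
for earnings in (0, 2000, 8000):
    print(earnings, nit_benefit(earnings, 4000, 0.5), total_income(earnings, 4000, 0.5))
```

Under these assumed figures the benefit phases out entirely once earnings reach the guarantee divided by the reduction rate. Varying the reduction rate, as the experiments did across plans, changes how quickly that happens.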

The health insurance experiment conducted by the

RAND Corporation tested the effects of varying levels

of cost sharing on the use of health services and

health outcomes. It randomly assigned families to one

of fourteen fee-for-service plans or an HMO. A total of

7,708 individuals were tracked in six sites chosen to

represent the United States over a period of eight

years, making the experiment one of the largest and

most expensive in American history. The findings showed

that overall, cost sharing reduces use of medical

services without substantial negative effects on

health. This proved to be a factor in later acceptance

of cost sharing as a cost containment strategy in both

public programs and private insurance plans.

Welfare-to-work programs. In the 1980s, the

Manpower Demonstration Research Corporation (MDRC) tested

ten specific state programs using random assignment,

measuring the impacts and benefit-costs of state

welfare-to-work programs, as well as studying their

implementation. State and local governments designed,

implemented, and operated the programs that were

evaluated, and the MDRC developed the evaluation design

and conducted the actual evaluation. The findings

showed that the tested programs increased earnings and

reduced the size of the welfare rolls, the benefits to

society as a whole exceeded the social costs of the

programs, and the programs usually resulted in net

savings for taxpayers. However, the effects were

relatively small.

Table 39.1 Four selected social experiments

Experiment: Income maintenance experiments (1968-78)
Tested intervention: Income supplements for welfare recipients with varied tax rate for paid work
Design: Randomly assigned families to varying benefit reduction rates in 11 sites
Result: Payment of income subsidy slightly reduced number of hours worked
Dissemination: Widely published in books, journal articles, and reports

Experiment: RAND health insurance experiment (1974-)
Tested intervention: Varied cost sharing for medical services
Design: Randomly assigned families to different cost-sharing programs in 6 sites
Result: Increases in cost sharing reduced use of health services without significantly affecting health status
Dissemination: Numerous publications, widely disseminated

Experiment: MDRC welfare-to-work experiments (1975-88)
Tested intervention: Provided job training and other employment services to AFDC participants
Design: Randomly assigned AFDC recipients to various employment programs in 10 sites
Result: Consistent small, positive effects on participants' earnings, reductions in welfare rolls and in cost to taxpayers
Dissemination: Widely disseminated during welfare debates

Experiment: Nursing home incentive reimbursement experiment (1980-83)
Tested intervention: Provided reimbursement incentives for nursing homes accepting Medicaid patients
Design: Randomly assigned 36 nursing homes to participate in intervention program or control group
Result: Little effect of reimbursement on health outcomes or discharge of Medicaid patients. Slightly increased admissions of heavy care patients
Dissemination: Not widely disseminated

Nursing home incentive reimbursement experiment.

This experiment, conducted from 1980 to 1983, tested

the effects of incentive payments for proprietary

nursing homes. The aim was to encourage them to accept

more hard-to-care-for Medicaid patients and to discharge

patients to lower-care facilities when they had attained

acceptable health status. The study was conducted with

a total of thirty-six nursing homes in San Diego

County, eighteen of which were in the control group.

Findings showed that in the first year of the

experiment there was no difference between the two

groups of nursing homes in the intensity of care that

admitted patients required, but in the second year the

experimental nursing homes did admit patients in need

of more intensive care. No statistically significant

differences emerged on achievement of patient health

goals or on patient discharges to less expensive

facilities. The small size of the sample and the

shortness of time over which the experiment was run

(thirty months) militated against significant

differences. The findings were not disseminated

widely, and few people heard about the results.
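
A rough calculation suggests why a sample of thirty-six nursing homes makes significant differences hard to find. The Python sketch below rests on assumptions that are not part of the study itself: a simple comparison of two group means, equal group sizes, a common standard deviation, and the conventional multiplier of about 2.8 (a two-sided 5 percent test with 80 percent power). The function names are hypothetical.

```python
import math


def se_difference(sd, n_per_group):
    """Standard error of the difference between two group means (equal n, common sd)."""
    return sd * math.sqrt(2.0 / n_per_group)


def minimum_detectable_effect(sd, n_per_group, multiplier=2.8):
    """Smallest true effect likely to be detected; 2.8 is roughly 1.96 + 0.84."""
    return multiplier * se_difference(sd, n_per_group)


# Effects are expressed in standard deviation units (sd = 1.0).
for n in (18, 180, 1800):
    print(n, round(minimum_detectable_effect(1.0, n), 2))
```

With eighteen homes per group, the detectable effect under these assumptions is a little over nine-tenths of a standard deviation, so only a very large impact would have registered as statistically significant.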

4. Themes

It seems obvious that social experiments (SEs) are

conducted to improve decision making regarding policies

under study. However, a direct relationship between the

results of SEs and policy decisions presumes a rational

policy environment with established pathways for

information from experiments to feed into policy

decisions. The relationship between the conduct of SEs

and the policy environment is more complex than such a

simple statement suggests. SEs are generally lengthy and

results arrive in a changed, sometimes unreceptive policy

space. Experiments arise for a variety of reasons and are

not always set up to answer directly specific policy

questions. And indeed experiments are but one in a

multitude of information sources that policy makers must

consider when making policy decisions.

In this chapter, we explore the relationship of SEs

to policy making. First we look at the advantages of

conducting such experiments. We examine contributions to

policy and contributions to social science. Then we

describe the disadvantages that SEs entail both for the

policy process and for social science. Last, we puzzle

about their future, in a near-sighted attempt to foresee

what use is likely to be made of SEs as political and

economic conditions change.

We admit that our view is largely a United States

view, but that is not totally our doing. The story of SEs

has been largely a US story. The first large experiments

were done in the USA and most of the subsequent work has

been "made in the USA." In recent years, Canada has

jumped on the bandwagon, and the Netherlands has also

conducted a few experiments. But most of the experience

on which the policy world relies is US work.

Running alongside our discussion of advantages and

disadvantages of SEs are three main themes. Hold the

pages sidewise and you will see these ideas: (1) The

policy world is a complex place. Policy making evolves

from ideologies and beliefs, interests, and institutional

norms, as well as from competing information. "Scientific

evidence" alone will almost never determine the direction

of policy making. (2) The research world is no less

complex. Technical issues bedevil the study of complex

policy issues and affect the extent to which social

scientists can derive authoritative evidence. (3) The fit

between the worlds of policy and research is inexact.

Sometimes the answers that SEs provide bear little

resemblance to the questions that decision makers ask. A

major misalignment is timing. An experiment may not be

completed until long after the questions that provoked

the experiment have faded from view. Another issue is

the uneasy pattern of communication between researchers

and policy makers. Nevertheless, despite all the

disabilities that affect SEs, we conclude that a well-

done SE provides important information that illuminates

the policy field and has at least the potential for

influencing policy.

5. Advantages of social experiments

1. Policy advantages

- Provide data on likely outcomes of a policy idea

Social experiments are experimental tests of new policy

ideas. They provide information to people engaged in the

political process of making policy. They advance the

rational component in policy making (Rivlin 1971). Many

policy decisions are made in a relative information

vacuum with little known about the actual effects of the

policies proposed. Data from well-designed tests of

policies under discussion can provide invaluable

information about the realities of the expected effects

of policy adoption, including the potential for

unexpected or negative consequences. In some cases, such

information has counted in decisions to adopt a

particular policy track. For example, the positive

results of the welfare-to-work experiments played a

modest role in the further expansion of work requirements

in state welfare programs. In addition, the success of

state-designed and implemented welfare-to-work programs

may have encouraged later legislation to give states

flexibility to design state-specific welfare programs

(Greenberg, Linksz, and Mandell 2003; Baum 1991).

Some advocates claim that SEs offer objective

information, unsullied by the pull of interests. But

objectivity is relative. Social scientists for over a

generation have acknowledged that every social science

enquiry is inevitably colored by the assumptions, biases,

and blinkers of its investigator. Nevertheless,

experiments appear less prone to dispute than most other

forms of knowledge. They collect information

systematically from a known population according to the

canons of social science. The element of randomization

adds authoritativeness. When there is contention, other

social scientists can reanalyze the data to try to support

their argument. In resolving disputes, SEs rely on the

judgement of the community of social scientists.

(See Howell and Peterson 2004; Krueger and Zhu 2004,

on rival interpretations of school choice experiments.)

On any reasonable scale, experimental information is

credible. In the four experiments that we have cited

here, little important disagreement emerged about the

interpretation of the findings.

- Clarify trade-offs

Social experiments can at times clarify the key trade-

offs in policy decisions and provide information to

debate these trade-offs (Orr 1998). For example, the AFDC

Homemaker Home Health Aide Demonstration found that home

care did not reduce health costs but did improve

clients' sense of well-being. The findings provided

policy makers with information to debate the trade-off

between the costs and benefits of the program.

- Keep a policy idea alive

One aim ascribed to social experiments is keeping alive a

policy idea that cannot muster enough support at the

moment to ensure passage. The income maintenance

experiments were reportedly undertaken because most

members of Congress did not support a negative income tax

for the poor to replace the welfare system. The federal

Office of Economic Opportunity and academic economists

who favored the idea could not carry the day, but they

gained support for an experiment (and then additional

experiments) in the hopes of making a good case. They

might also have hoped that the political winds would

change and members of Congress would come to embrace

their idea for income maintenance for the poor. (Despite

their efforts, the negative income tax was not to be.)

The contrary assumption, that SEs are used to delay

a new policy until the lengthy study is done, does not

receive much empirical support. Once a policy proposal

has acquired political momentum, it is usually enacted

regardless of evidence. Before results were available

from the housing allowance experiments, Congress enacted

one feature that was still being tested. It passed a

bill, known as Section 8, that provided subsidized

payments for the poor in the private housing market.

- Stock a library of information

SEs can create inventories of information for future

policy situations (Feldman 1989). Although their

sponsors, with their eyes focused on current options, do

not intend only to pile up knowledge for the future,

that is one likely result. Even if the findings of the

experiment have little impact on current discussions,

they do provide a stock of information that future

political actors and analysts can draw on (Orr 1998). For

example, the health insurance experiment notably provided

information on elasticities in health care demand that

informed later analysis.

- Help to build consensus

The focus and intensity of a social experiment,

coupled with a general acceptance among researchers of

the quality of impact estimates derived through

experimental designs, may provide the focal point needed

to draw together diverse actors and information sources

to agreement. The health insurance experiment finding

that cost sharing reduced health care use without

harming health led to a fairly broad acceptance among

researchers and policy makers of cost sharing as a

legitimate cost containment strategy. Similarly, the

welfare-to-work experiments broadened acceptance of

mandated work requirements in public assistance programs.

- Legitimize existing preference

If the results of an experiment align with

preferences of decision makers, they can provide

legitimacy to existing policies or preferred

alternatives. They can reaffirm policies after the

policy has been chosen (Greenberg and Mandell 1991). Some

social scientists worry that this kind of after-the-fact

legitimization is a misuse of social science. But if the

findings support a policy that policy actors have already

selected on other grounds, there doesn't seem anything

wrong with giving it a social science seal of approval.

At times, social experiments may provide political

cover for either difficult or highly contested policy

decisions, shifting the onus of decision making onto

"science." They may offer policy makers a set of data-

driven arguments for or against a

particular policy option.

2. Research advantages

- Spur the development of new research

methods

In order to do the challenging work of SEs, social

scientists have had to develop new methods and

techniques. They have also had to develop new statistical

methods to analyze the data. The field environment, the

size of the samples, the rarity of certain groups about

whom data is needed, the need to generalize to a larger

population, the need to measure difficult concepts-all

have contributed to innovations in research methods.

Current textbooks bear witness to the methodological

advances spurred by decades of social experimentation.

- Real-life test for social theories

Another advantage for social science is that SE gives

social scientists the opportunity to test theories in the

crucible of real-world settings. They can subject

theories and practices based on those theories to actual

test. This can help bring abstract theorizing down to a

practical level. For example, theories about the value of

competition in improving the quality of schools are

being tested in a number of SEs that give parents choice

of their children's schools (Howell and Peterson 2004).

Theories about the positive effects of a non-stigmatizing

guaranteed income, implemented through a negative income

tax, were studied in urban and rural areas for extensive

periods of

time.

Many of the pilot ideas that SEs have studied

originated not in social science theories but in

political or practice settings. For example, the MDRC

welfare experiments did not directly test any specific

behavioral theory. Nevertheless, they often derived from

or coincided with theories that were current among social

scientists. The studies therefore supported, refuted, or

failed to provide convincing evidence regarding the

theories to which they were related.

- Provide interesting work to social

scientists

SEs are interesting, frontier studies. They generate

considerable enthusiasm among social scientists,

especially those who work in research institutes that

have the resources to do them well. SEs require skilled

staff and the latest statistical knowhow to do this kind

of demanding work, and only a few organizations have over

time been able to establish and maintain the type of

expertise needed for such work. An analysis of the 143

SEs identified in The Digest of Social Experiments found

that three organizations dominate the conduct of SEs in

the USA: Abt Associates, the Manpower Demonstration

Research Corporation (MDRC), and Mathematica Policy Research

conducted almost half of the experiments reviewed

(Greenberg et al. 1999). In Canada, the Social Research

and Demonstration Corporation does most of the social

experiments.

One of the interesting things about SEs is that

economists are the investigators in most of them.

Economists, who haven't been known for their empirical

fieldwork, in a sense reinvented survey research for the

income maintenance experiments, and developed sampling

and analysis techniques from their tradition. Why

economists? Many of the topics deal with money. They are

testing schemes that expect to reduce government

expenditures. Do welfare-to-work programs reduce the

welfare rolls and welfare costs? Does nursing home

reimbursement increase intake of patients in need of

intensive care so that they do not have to stay in

(very expensive) hospitals? Do job-finding programs reduce

the length of time that unemployed workers receive

unemployment compensation? Another reason for the

frequent presence of economists is that money is easier

to measure than the outcomes that often concern

sociologists and psychologists, such as "functional

ability" or "age-appropriate childhood development."

Policy makers and the public find data on costs and

savings more credible than fuzzier concepts. Economists

have the techniques to study and model data denominated

in dollars.

6. Limitations of SEs

1. Policy limitations

- Effects on decisions

When we review the history of social experiments, we

see that they have not had a decisive, direct effect on

the ensuing decisions. Of our four examples, only the

welfare-to-work experiments were later reflected in

policy. Neither the health insurance experiment, the

nursing home incentive reimbursement experiment, nor the

income maintenance experiments made much of a dent at

all, and the findings were relegated to the great

analytical storehouse. Even in the welfare-to-work

experiments, where experiment results seemed to affect

later policy, the result was at best indirect.

Greenberg, Mandell, and their colleagues did

a telephone interview study of welfare directors in the

states. They found that while most of the state directors

knew something about the findings of the welfare-to-work

experiments (although not the specifics), they didn't

believe the findings had influenced the policies of their

own state. What they did value was the demonstration that

states could administer the program without much problem

and a general sense that work first was better than

training first for former welfare recipients. In their

2003 book, Greenberg et al. conclude:

Ironically, however, even though these experiments

did have important effects on policy, their role was

nonetheless limited. In particular, many

policymakers already viewed the programs tested by

the welfare-to-work experiments as attractive on

other grounds. Findings from the experiments simply

reinforced that view. Consequently, rather than

being pivotal to whether the types of programs they

tested were adopted, they were instead used

persuasively and in designing these programs. In

other words, they aided policymakers in doing what

they already wanted to do. (2003, 308, 310)

Why should the results of SEs be so marginal? Why doesn't

rationality reign?

Social scientists are under no illusions that

"scientific evidence" will displace all other sources of

understanding. Policy making is also based on ideologies

and beliefs, interests, competing information, and

institutional norms (Weiss 1983, 1995). The results of

social experiments can nudge policy only a small

distance, and their influence is dependent in large part

on the interplay with the other factors in the policy

environment. Social scientists know that legislators and

administrative officials have long-standing beliefs and

principles that guide much of their orientation toward

policy. Their ideological orientation exerts powerful

influence over which policy proposals receive even a

hearing. Attitudes toward abortion and gay marriage are

obviously determined by ideology and principles, but it

is not only on such extreme issues that ideology often

prevails. For some policy makers, similarly strong

beliefs affect their views of the enactment of a draft,

the need for standardized performance tests in schools,

mandatory sentences for repeat offenders, and needle

exchange programs for drug addicts.

Interests are always powerful influences on policy.

Drug manufacturers, farmers, radio station owners, state

and city service workers, trial lawyers, charities,

utility companies, universities, hospitals-almost every

organized body in the nation seeks to promote its own

well-being through public policy. The jostling among

organized interests provides much of the drama in the

policy arena. The scene is marked by the formation and

dissolution of temporary coalitions of interests as the

issues on the agenda shift and change.

Nor does social science represent the only form of

legitimate information. The policy world is awash with

information. Lobbyists hawk their own versions of past

events and futures. Media columnists and editorial

writers add to the stew. Many organizations have their

own in-house information resources: databases, research

units, news services. The availability of 24/7 web-based

information in titanic proportions makes getting

information much less difficult than interpreting the

information with a sense of history and context.

Furthermore, each institution in the policy system

has its own set of rules and norms. The US Congress, for

example, proceeds according to a system of committee

appointments, minority/majority representation on

committees, vote taking, reporting to the full body,

closing off debate, reconciling different versions of

bills passed by the two houses, as well as time

schedules, budget limits, pressure group access, and so

on, that have major influence on the nature of policy

that emerges. Ron Haskins (1991) tracked the instances

that the MDRC research was mentioned at various times in

the welfare reform policy process and found fewer and

fewer specific mentions of the MDRC research as the

welfare policy made its way through hearings, bill

writing, and consideration in the House and finally in

the House-Senate Conference. The internal norms and

culture of each institution in the policy system

exercise great pressure on its own activities and on the

activities of other institutions with which it interacts.

These four sets of influences (ideology and beliefs,

interests, other information, and institutional norms) set

limits to what social science can contribute and how

much attention it can mobilize. Social experimentation,

as one small subset of social science research, is even

further constrained by its surroundings.

- Misuse of research findings

The results of SEs can be misused in policy

discussions (Orr 1998). As with any source of

information, policy makers may choose to disregard

results if they are not congruent with their own beliefs

and political agendas. During the congressional welfare

reform debates, the welfare-to-work research was used to

argue that education and training were effective

strategies and that large amounts of federal funding were

needed to produce effects. In fact, education and

training received little attention in the programs

studied, and the experiments showed that relatively

low-cost job search and work experience were effective

(Haskins 1991).

Policy makers may take note of the general public

reaction. If the public is not interested or is skeptical

of certain results, policy makers have little incentive

to push forward any change based on the results. Results

may not even reach the ears of policy makers if the

sponsoring agents of the studies themselves do not like

the results. What goes to publication can be influenced

by the satisfaction (or dissatisfaction) of the agency

that asked and paid for the study in the first place.

Less insidious is a simple lack of dissemination of

experiments' results. In the nursing home incentive

study, the departure of the federal staffer who had

sponsored the study contributed to the lack of

dissemination of the findings. Few people learned of the

results, and little use was made of the findings

(Greenberg et al. 2003). A reanalysis of the data that

showed more positive results from incentives (Norton

1992) went almost totally unnoticed.

Contributing to the risk of misinterpretation or

misuse, policy makers may not have a particularly honed

sense for the quality of research or indeed have the

skills to interpret results correctly when they are

presented with them (they are not alone; it

is difficult for everyone). Policy makers tend to rely

on indirect indicators of quality such as the reputation

of the researchers, how the research community reacts to

the results, and whether the research fits with their

own preconceived notions of what the results should be

(Orr 1998).

- Ability of researchers to work in the policy world

Social experiments take place in the messy world.

The kinds of social scientists who have the requisite

knowledge of research design, sampling, measurement, and

statistical analysis are not always the kinds of social

scientists who communicate well with political actors.

Experimenters in these circumstances have to listen. They

have to be aware of what policy options are feasible.

They should know the history of political battles already

waged on the turf. And still they have to know the

scientific literature and the intricacies of research

design and conduct. Such people can be hard to find. In

their stead come highly skilled researchers who may have

little skill, and often less interest, in aligning their

experiment with the world of politics.

- Heightened scrutiny

The results of social experiments may fare somewhat

better than other research findings as they are less

assailable by opponents. This occurs in part because the

research community tends to support the results of

randomized experiments and thus may present a more

unified front for policy makers trying to understand what

researchers believe. Thus, for example, the health

insurance experiment produced generalized agreement among

the research community that cost sharing could reduce

health care costs without detrimental effects on health, a

question that until then no study had adequately

answered. And yet even some of the best social

experiments are open to methodological critique and

indeed sometimes may be treated to a more rigorous

critique than might be expected due to their high

visibility in both the research and the policy worlds.

The school choice experiments are an example (e.g. Howell

and Peterson 2004; Krueger and Zhu 2004). Because

parental choice of schools is such a politically loaded

issue, studies are scrutinized in meticulous detail.

Research limitation

Social experiments are not easy to bring off. To

be at all persuasive, social experiments require big

slugs of time, lots of money, powerful research

expertise, and enough flexibility to respond to changing

conditions and questions while the experiment is in

process. The impact of social experiments on policy

making is limited not only by the political process but

also by the constraints and limitations of the research

world. Social science methods themselves are not always

ideal for describing and analyzing complex policy issues.

- Design challenges

Researchers are plagued by a series of challenges

when conducting research in the real world. Experiments

pose difficulties all along the way. The first problem is

choice of sites. Even though the policy option that an

experiment is testing is usually intended to apply to all

members of the relevant group in the nation (or the

state), the experiment cannot be implemented among a

random sample chosen throughout the nation. The

intervention can be offered (and studied) in only a few

places. Even the most expensive SEs have had to limit the

intervention to a few sites. How does the researcher

decide what sites are "typical" or "representative"

enough to stand in for the nation as a whole? Researchers

avoid places with obviously unusual features, but much

of the choice depends on which sites agree to cooperate.

Another problem is recruitment. The design demands

enlistment of nursing homes or low-income households,

and the experimenter has to convince the required number

of units to sign on. About half of them have to be told

that they will not receive any new services but will be

required to give periodic information. Locating

participating units, explaining the conditions of the

experiment, and convincing them to participate is no

small task. Then there is the issue of when to tell

participants that they might be in the control group and

receive no service at all. Cook and Shadish (1994)

provide a balanced discussion of the pluses and minuses of

revealing the possibility of control group status at

various points in the recruitment process. It is an

important issue because if people (or organizations)

refuse to participate because they know about the

no-service possibility, the randomness of the assignment is

compromised.

Another problem is being sure that the program is

being implemented as planned. If, say, the state welfare

agency is not delivering the job-search services it is

supposed to be offering, i.e. the intervention is not on

offer, the SE would be testing the effects of a phantom

policy or of an unknown intervention of the agency's

own devising. Results of the SE would be meaningless.

From experience, researchers have learned the importance

of monitoring the implementation of the intervention.

Probably the most basic design issue is implementing

and maintaining randomization. Often researchers do not

do the random assignment themselves. The operating agency

selects participants for its programs and in the process

is expected to assign participants to intervention and

control groups according to the protocols prepared by the

researchers. The actual assignment is "often carried out

by a social worker, nurse, physician, or school district

official" (Cook and Shadish 1994, 550). Sometimes these

people misunderstand what they are expected to do; and

sometimes they are tempted to use their professional

judgement in assignment decisions. Researchers have

learned that they must not only train agency staff but

also maintain an oversight presence to ensure that

assignment is indeed random.

Nor is that the end of the problem. What started as

true randomized assignment may become undone as time goes

on. In some cases the experiment does not enroll enough

participants. Agency staff therefore may raid the control

group to fill slots in the program. People labeled

"controls" may in truth receive the intervention. Or, and

this is inevitable, participants may drop out of the

program and the study. That would be fine if they dropped

out equally from intervention and control groups for

similar reasons. However, it is usually more common for

controls to drop out. They are not receiving services and

they have less reason to persevere. For example, in the

income maintenance experiments, higher drop-out rates

were registered in the control group and in some of the

experimental groups receiving smaller benefits than in

the more generous benefit groups. The effect of

differential drop-out is to compromise the equality of

the groups. A selection bias is reintroduced.
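
The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name attrition_bias, the sample size, and above all the drop-out rule (controls with a low baseline value leave the study, everyone in the intervention group stays) are hypothetical choices made for illustration, not features of any of the experiments discussed in this chapter.

```python
import random
import statistics


def attrition_bias(n=20000, true_effect=0.0, cutoff=-0.5, seed=0):
    """Compare the impact estimate before and after differential drop-out.

    Returns (estimate_full_sample, estimate_after_dropout).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(0.0, 1.0)       # unobserved trait that drives the outcome
        in_program = rng.random() < 0.5      # coin-flip random assignment
        outcome = baseline + rng.gauss(0.0, 1.0)
        if in_program:
            treated.append((baseline, outcome + true_effect))
        else:
            control.append((baseline, outcome))

    treated_mean = statistics.mean(o for _, o in treated)
    full = treated_mean - statistics.mean(o for _, o in control)

    # Hypothetical attrition rule: controls with a low baseline leave the study,
    # while everyone in the intervention group stays.
    stayers = [o for b, o in control if b > cutoff]
    after = treated_mean - statistics.mean(stayers)
    return full, after


full, after = attrition_bias()
print(f"no drop-out: {full:+.2f}   differential drop-out: {after:+.2f}")
```

With a true effect of zero, the full-sample estimate should sit near zero, while the estimate after drop-out should be clearly negative: the remaining controls are no longer comparable to the intervention group, even though the original assignment was random.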

In other cases, the control group may become

contaminated by being inadvertently exposed to the

intervention under study. Teachers receiving an

experimental professional development course may share

some of their new learnings with fellow teachers in their

school, regardless of their official "control" status.

The list of complications goes on and on. As

researchers have become more sophisticated over time and

with experience, they have identified a host of further

threats to the validity of SEs. Manski and Garfinkel

(1992) suggest that some interventions might cause

changes in norms and attitudes in the community, and the

changed community attitudes would influence the success

of the intervention. Heckman (1992) and Heckman and Smith

(1995) have written that people who enlist in SEs may not

be representative of people who would participate in

full-scale programs. Moffitt (1992, 2004), too, has

worried about "entry effects," the conditions of a full-

scale program that would affect participants' behavior

that do not show up in small-scale experiments.

- Time

The worlds of research and policy do not work in

tandem. Social experiments are time consuming, often

taking many years to design, implement, and finally

analyze and report results. The policy process meanwhile

has moved forward and the results of a SE arrive in a

new, changed policy environment. Research results may

have little or no relevance in this changed policy

world. For example, the health insurance experiment began

at a time when the development of a national health care

system was under active consideration, and the impact of

cost sharing had real relevance. By the time the results

of the experiment were known, the health care debate had

petered out and national health care was no longer an

imminent possibility. The relevance of the results was

greatly diminished (Greenberg et al. 2003).

In the past it has often taken four or five years

(or more) before experimental results were ready. The

housing allowance experiment ran much longer. It studied

the effect of giving housing allowances to low-income

people not only on the families involved but also on the

supply of housing. It had to go on long enough for

landlords to increase the number of housing units

available to recipients of allowances. The study ran (in

two cities) for eleven years (Bradbury and Downs 1981).

On the other hand, some experiments are too short to

produce convincing results. The nursing home incentive

study ran for thirty months. Many nursing homes were

evidently not willing to change their practices in

response to the short-term monetary incentives. One of

the sponsoring agency's reports states:

To the participants [nursing homes]... it may seem a

very brief duration and there may be reluctance to make

staffing, policy, and organizational changes which could

affect their environment long after the experiment is

concluded. (Greenberg et al. 2003, 107)

Yet even within that brief time period, the study

was not able to catch the wave. By the time it was

completed, political interest had moved away from

incentives and toward regulation.

Foresight is not a particularly strong point of

social science. Trying to figure out what policy issues

will be lively at some future point is an exercise for a

soothsayer. Knowing how rapidly the political canvas

changes, knowing how volatile the complexion of

government is these days with the country divided almost

equally between Republicans and Democrats, knowing how

policy windows open and shut as the economy changes, can

we ever be confident that we are foreseeing an

appropriate mix of interventions? Many people worry about

issues of causation in experimentation. We worry about the

clouded crystal ball. Fortunately or not, in recent years

SEs have become more modest. As noted in the next

paragraph, they are making do with available data, and

they are taking less time to complete. But they are

testing more modest initiatives.

- Expense

Expense can limit the value that social experiments

can provide to policy making. There is generally a direct

relationship between the complexity of a research design

and its cost. The more policy alternatives, settings, or

types of participants tested, the more expensive the

experiment is likely to be. Thus, cost plays a direct role

in limiting the relevance of the findings of social

experiments to particular policy questions. Over time,

social experiments appear to be becoming simpler and

consequently cost less. Greenberg et al. (1999) suggest

that this is due in part to the increased use of

administrative databases rather than special surveys, an

increase in the likelihood that organizations that would

run the program are the ones involved in the social

experiment (as opposed to developing new programs run by

the research organization), simpler designs with fewer

groups, and shorter tracking periods for participants.

- Limits on how much can be tested

It is a rare experiment that can test all the

variations in a particular policy that may be relevant to

the question under study. Thus, the findings of social

experiments are limited to the specific alternatives

tested. SEs take place in a limited number of sites with

a particular set of participants, and the findings may

not generalize to other settings or participants. The

time horizon is often truncated (although not in the

health insurance experiment). Only a few social

experiments can assess trade-offs among components of the

intervention. Almost none are large enough to examine

differences among multiple subgroups of the client

population (the income maintenance experiments are an

exception). Few examine the behavior of the staff

implementing the program and so have little to say about

practices that are associated with better or worse

outcomes. Costs of the intervention are not always

carefully calculated (for example, in the nursing home

reimbursement experiment, officials were unable to

separate costs of running the program from costs of the

study (Greenberg and Shroder 1997)).

A distinction can be made between "black box"

experiments, which test one or a few treatments, and

"response surface" experiments that test a wide range of

treatments (Greenberg et al. 2003; Burtless 1995).

Examples of the latter are the income maintenance

experiments of the 1960s and 1970s in which income

guarantees and tax rates were varied across the treatment

groups and the health insurance experiment in which cost

sharing was varied across the groups. Greenberg et al.

(2003) conclude that if the particular intervention that

is being tested is still on the policy agenda when the

experiment is concluded, the black box experiment would

be fine. However, that is almost never the case. The

advantage of the "response surface" experiment is that

the design allows for the estimation of elasticities over

a range of treatment options and its results can be used

in later simulation models well into the future.
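
To make the idea of estimating elasticities from a response surface design concrete, here is a minimal sketch. It assumes a constant-elasticity (log-log) relationship between the treatment level and the outcome, which is one common modelling choice and not necessarily the one used in the experiments cited. The function names and all numbers are made up for illustration; they are not results from any experiment.

```python
import math


def loglog_elasticity(levels, outcomes):
    """Least-squares slope of log(outcome) on log(level)."""
    xs = [math.log(v) for v in levels]
    ys = [math.log(v) for v in outcomes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def predict(levels, outcomes, new_level):
    """Estimate the outcome at an untested treatment level from the fitted curve."""
    slope = loglog_elasticity(levels, outcomes)
    mean_log_x = sum(math.log(v) for v in levels) / len(levels)
    mean_log_y = sum(math.log(v) for v in outcomes) / len(outcomes)
    intercept = mean_log_y - slope * mean_log_x   # line passes through the means
    return math.exp(intercept + slope * math.log(new_level))


# Hypothetical data: four treatment arms at different levels of some policy
# parameter, with mean outcomes generated from an exact power law.
levels = [0.10, 0.25, 0.50, 0.95]
outcomes = [100.0 * v ** -0.3 for v in levels]

print(round(loglog_elasticity(levels, outcomes), 3))
print(round(predict(levels, outcomes, 0.40), 1))
```

A black box design with a single treatment arm gives one point and no slope. Several arms spread over a range are what allow a curve to be fitted and then reused to simulate options that were never tested directly.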

- Small effect

Social experiments almost never produce slam-dunk

findings. If a proposed intervention were so obviously

superior, there would probably be little reason to

experiment. Most policy proposals are uncertain. The

results of experimentation are often marginal. There are

small gains in certain circumstances with some

subpopulations. Interpretation becomes critical.

Because experimentation is such a difficult craft,

the results are not always authoritative. Decisions about

the course of the experiment have to be made all along

the way. Compromises are made, sometimes in response to

crises in the environment, sometimes to fit within a

budget, sometimes to suit the skills of the available

staff, sometimes to meet deadlines, sometimes in an

attempt to answer new questions that emerge in the course

of the study. Other researchers will critique the

findings. They may reanalyze the data. They will come up

with new models that they claim better account for the

patterns in the data. The experiment can get captured by

the research experts and become fodder for struggles for

dominance.

- Feasibility of random assignment for

organizational/community intervention

Some innovative policy ideas involve intervening in

neighborhoods or systems or states. Rather than giving

service to individuals one at a time, the proposed policy

is designed to change the practices and culture of a

larger entity. Examples include: changing the attitude of

welfare offices so that staff priority is to place the

client in a job; changing the practices in a neighborhood

so that families, restaurants, and law enforcement

agencies actively work to prevent youngsters from

drinking alcohol; and changing the culture of a school

system so that teachers and administrators actively

welcome parents to participate in their child's

education. To test ideas like these in an SE requires

study not so much of individuals as of the units that are

being altered: welfare offices, neighborhoods, or school

systems. The interest is in the behavior of the

collectivity.

The obvious solution is to randomize the unit. A

certain number of school systems or neighborhoods might

be assigned randomly to the intervention or to a control

group. However, as the size of the unit increases (say,

to counties or states), fewer units can be studied. It

is extraordinarily difficult and expensive to study a

large number of neighborhoods or counties, and few

studies have managed to go beyond ten or twelve. However,

with only a limited number of cases, the laws of

probability do not necessarily work. Any differences

observed between the intervention group and the control

group may be the result of chance. There are too few cases

to even out the lumps of chance. Therefore, randomization

of large units is a partial solution at best. Here is an

issue where research innovations are needed and are

currently being developed.
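
The point about too few cases can be checked with a short simulation. The sketch below assumes, purely for illustration, that each unit has a baseline outcome drawn from a standard normal distribution and that the intervention has no effect at all. It then asks how large a gap between the intervention and control groups chance alone produces. The function name chance_gap_spread and the unit counts are hypothetical.

```python
import random
import statistics


def chance_gap_spread(n_units, n_trials=2000, seed=1):
    """Spread (standard deviation) of the intervention-minus-control gap
    across repeated randomizations when the true effect is zero."""
    rng = random.Random(seed)
    half = n_units // 2
    gaps = []
    for _ in range(n_trials):
        # Each unit (neighborhood, county, school system) has its own baseline level.
        units = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        rng.shuffle(units)                   # random assignment of whole units
        gaps.append(statistics.mean(units[:half]) - statistics.mean(units[half:]))
    return statistics.stdev(gaps)


for k in (10, 50, 200):
    print(k, "units -> typical chance gap", round(chance_gap_spread(k), 2))
```

Under these assumptions the typical chance gap with ten units should be several times larger than with two hundred, so a real but modest effect could easily be swamped or mimicked by the luck of the draw.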

Another reason for the objection to random

assignment is that a city is not a city is not a city,

nor are neighborhoods interchangeable, or health systems

or schools. Each of them has a history. Each has a set of

established traditions. Each has a culture that has

developed over generations. Each has attracted particular

kinds of civic organizations and program staff and

residents. Harlem is not the South Side of Chicago, which

is not Watts. PS 241 in Brooklyn is not the same as the

Condon School in Boston (Towne and Hilton 2004). Even if

a researcher were randomly to assign neighborhoods, they

wouldn't be totally comparable, and differences observed

at the end might be due not so much to the intervention

as to the whole complex of prior history and culture.

For example, an evaluation of a program to promote

nutritious food products randomly assigned supermarkets

in Washington and Baltimore. The intervention group of

markets placed nutritious products in favorable shelf

locations and distributed fliers about nutrition. The

control group did nothing. The measure of success was the

customers' purchase of nutritious foods. Results showed

that there were more differences between the two cities

than between the experimental and control groups.

- Ethics

Ethical issues have dogged experimentation since its

beginning. People have displayed considerable concern

with withholding a social good from one group regardless

of degree of need. Practitioners are often loath to allow

services to be allotted on the basis of chance, without

exercise of their own professional judgement.

Beneficiaries of service object strongly to being placed

in a no-service control group. A host of ethical issues

(withholding services for those eligible, full disclosure

of experimental procedures, right to refuse, harm to

participants) may significantly limit the questions that

social experiments can address.

The rebuttal is that no one really knows whether the

service is a social "good" until it has been studied.

Many experiments find that the intervention is no better

than standard service, or even detrimental. Thus, the

nursing home reimbursement experiment did not show

positive effects from the reimbursement scheme. Bickman's

study of intensive mental health service, which included

all the professionally fashionable bells and whistles,

showed that intensive service did not have better results

than regular service (Bickman 1996).

- Complexity of interventions

Perhaps the most vivid argument against experiments

is that they assume that interventions have a simplicity

that can be captured in a treatment/no-treatment design.

Many interventions are highly complex social

interactions, and simple cause-and-effect patterns may not

be easily detected. The "program" is often implemented

differently by staff, and the desired outcomes are social

processes that cannot be readily measured by simple

metrics. Studying the effects of psychotherapy, for

example, poses all manner of problems because of the

inherently personal ways in which therapists work and

clients respond. No matter what label one affixes to the

"brand" of psychotherapy, or how assiduously one tries to

train therapists to use the same procedures, critics

argue that quantitative randomized studies cannot yield

sensible results.

Similarly, educators often say that interactions

within a classroom, such as the introduction of a new

teaching method, cannot be studied appropriately by

quantitative randomized techniques. The assumption that

all teachers trained in the new teaching method will

implement it consistently, and that children in all

classrooms will react in similar ways, represents a

fundamental misunderstanding of the variability of

teaching and learning. The rejoinder is that despite the

variability, which certainly introduces more error of

measurement, large samples should show the extent to

which mean scores (of social functioning, of math

achievement, of attendance) differ across populations

exposed and unexposed to the intervention. In Cook's

(2001) words: "It is not an argument against random

assignment to claim that some schools are chaotic,

the implementation of a reform is usually highly

variable, and that treatments are not completely

faithful to their underlying theories." There is enough

consistency in human behavior, experimentalists claim, to

allow an experiment to reach valuable conclusions about

whether an innovation is worth adopting.

7. Conclusions

We started this chapter with a description of three

distinctive traits of SEs: research in the field,

conducted through random assignment of samples of

prospective beneficiaries to intervention and control

conditions, in order to test the probable success of a

policy intervention. The first two characteristics are

increasingly accepted as viable and necessary. Research

in the field has now become mainstream practice.

Randomized studies have received considerable support

not only from the research community (although some

researchers, particularly in the field of education,

have lodged vigorous dissents) but also in Congress, for

example for education studies with randomized designs.

It is the third feature that may no longer be as firmly

established: the prospective test of alternative

policies.

SE came into prominence in the late 1960s at a time of

turbulent policy change. It was part of the climate of

innovation and radical reform that was sweeping the

country. In the late 1980s and 1990s, as interest in

fundamental change lessened, the fortunes of

experimentation also shifted. Experiments continued to

be done, more of them in fact, but fewer resources were

devoted to them. The emphasis changed from major

innovations to marginal improvements in existing

programs. In Burtless's words, they were "narrower"

(1995, 63). Now, at a time of budget deficit and fiscal

stringency in the USA and elsewhere, the likelihood of

new domestic initiatives seems low. It is not a time

when large new ideas will be tested, at least with

government funds. The trend is to test minor

modifications, preferably cost-saving modifications, and

shifts of activity to the private sector. If you were

considering investment in large-scale SEs, our advice

would be: hold off. The product is a sound one, with

high potential, but the time is not now, at least in the

USA. But hang in. Some version of SEs will have its day.

We also began our story with an outline of three

themes: the complexity of the policy world, the technical

complexity of the research world, and the alignment or

misalignment between experimental findings and policy

questions. Overall, SEs have shown the possibilities and

the limits of affecting policy through social science

research. They have contributed considerable new

knowledge. Some of their findings have infiltrated the

policy arena and are part of policy-speak (Anderson 2003;

Weiss 1999). Influentials in Congress, federal agencies,

international organizations, interest groups, and the

media learn to be conversant with experimental findings

in order to take an informed part in policy

conversation.

On the other hand, there are no examples of an SE

that led directly to policy change. Results of the

health insurance experiments were so late and so

unfocused on actual legislative proposals that they were

pretty much ignored except by economists, who have used

them to model new proposals. The nursing home

reimbursement experiment results also arrived late, after

the zing had gone out of the incentive idea. Almost

nobody was still interested in incentives for nursing

homes; the action was in the area of regulation. While

widely published, the income maintenance experiments led

to little concrete change in policy. The welfare-to-work

experiments seemed to have policy consequences. The MDRC

study provided support for mandatory work-first

requirements and demonstrated the ability of states to

design and manage their own welfare programs. All three

of these program design aspects ultimately ended up in

the Family Support Act of 1988. Nevertheless, as we have

seen, the experiment merely reinforced what policy makers

were planning to do on other grounds.

Because policy making is such a complicated business,

with so many players pursuing such divergent interests, it

is overly optimistic to expect research information to

carry the day. Even the high-quality information supplied

by SEs cannot overwhelm all the other forces on the

scene. And as we have seen, the timing of SEs is often

off. The policy agenda moves on, while the SE is still

studying last year's proposals.

Yet, totting up advantages and disadvantages, we

come out in favor of further experimentation. The world

is in dire need of greater understanding of the

consequences of government action. Social

experimentation cannot fully satisfy the needs for

knowledge about policy outcomes, partly because of the

intrinsic nature of social science research and partly

because of the limitations imposed by the conditions

under which it is done. Still, it makes headway. Anything

that advances rationality in the messy world of policy is

worth supporting. Not venerated or kowtowed to, but

cheered on.

But we also need to moderate our expectations of the

contributions that SE can make. The notion of basing

policy strictly on experimental evidence is wrong-headed.

SE doesn't tell everything that a polity needs to know

about a pending policy option. Many other considerations

have to go into government action, such as popular

demands, costs, capabilities available for implementing

the policy, competing needs, effects on neighboring

policies, and so on. Resolution comes through politics.

Although the word has fallen on evil times, politics is

the system we have for resolving differences in our

complex societies and reaching decisions that are at

least minimally acceptable to all parties (for a

resounding affirmation of politics, see Crick 1972).

Evidence of policy outcomes cannot and should not

supplant the play of politics as the basis of policy. Of

course, we do not want to see policy developed on the

basis of faulty understanding of the situation or

unrealistic expectations for the effects of action, but

it does seem presumptuous to think that experimental data

alone can point to the best resolution of complex policy

issues. History matters, as do political culture and

institutional practices. What SE can do is illuminate the

understanding of publics and elites and infuse policy

discussion with insight.

Science and politics cohabit in the policy sphere,

but their alliance is an uneasy one. Social scientists,

to put the best face on the relationship, have pointed to

the "value-added" features that social science brings

to the table: an inventory of knowledge for the future to

draw on, general enlightenment of elites and publics in

the present, puncturing of faulty assumptions, and

confirmation of wise instincts for action. But for all

the understanding and insight contributed by the social

sciences, and by SEs in particular, they do not run the

show. There is inevitable tension between science and

politics, and convergence is usually a happy accident.

Social experimentation for public policy

1. Policy experiments

The statement, made by the British Conservative politician Enoch Powell, highlights the fact that public policy making involves not only the higher arts of principle, intellect, and persuasion, but also the play of interests and the pushing and hauling of partisans for power and control. While the centrality of interests and prejudices has received a great deal of attention in both the scholarly and popular media, it is Powell's "guesses about the future" and that "staff of economists" that concern us in this chapter.

Policy inevitably deals with an uncertain future. Even with the plethora of statistical series and policy research currently available, policy making has to be based on some degree of guesswork. Powell's economists who project past trends into the future, now supplemented by sociologists of several hues, shed sometimes flickering light on what the effects of policy interventions will be. It is to get closer to an understanding of those likely effects that social experimentation, the prospective test of a policy, was born. The idea is simple: try out a policy on a small scale and see what happens.

Since the late 1960s, spending on social experiments of policy proposals in the USA has consumed more than one billion dollars (Burtless 1995). In this chapter we consider the nature of the social experiments that have been conducted over the past forty years. We review the efforts of many social scientists and economists to develop systematic empirical evidence about the likely advantages and disadvantages of specific policy proposals through the conduct of social experiments, and we try to project current trend lines into the murky future.

2. Definition

Social experiments are randomized field trials of social interventions. Within that rubric, two emphases jostle for primacy (and a third emphasis tags along). Some authors define social experiments by stressing the trial in randomized field trial. For them, the hallmark is the prospective intervention, tried out on a small scale before it is widely adopted. Not only is it tried; it is studied in its pilot version. The aim is to find out whether the intervention achieves its goals. If so, the assumption is that policy makers should adopt it on a system-wide basis. There is a sense of conscious intent to influence policy, and often this intent is accompanied by a feeling of urgency while the policy window is open.

Other authors place the stress on randomization. It is randomization that gives experimenters confidence that the intervention is the cause of whatever changes are observed. In a randomized study, the experimenter selects samples from the same population and assigns some people to the intervention, or "experimental," condition and others to a "control" condition. At the end of the period, the groups are compared. Inasmuch as they were very much the same at the start, and the only thing that differentiated them over time was exposure to the intervention, any differences between them can be attributed to the intervention. From a methodological point of view, randomization gives experimenters confidence in their estimates of effects.
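The logic of random assignment can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample of 200 people, the made-up outcome scores, the built-in effect of 5 points, and the helper names `assign_randomly` and `difference_in_means`. None of it is drawn from the experiments discussed in this chapter.

```python
import random
from statistics import mean


def assign_randomly(unit_ids, seed=0):
    """Shuffle the units and split them into two equal-sized arms."""
    rng = random.Random(seed)              # fixed seed so the split is reproducible
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]    # (experimental, control)


def difference_in_means(outcomes, experimental, control):
    """Estimated effect: mean outcome of experimentals minus mean of controls."""
    return (mean(outcomes[u] for u in experimental)
            - mean(outcomes[u] for u in control))


# Hypothetical sample of 200 people drawn from the same population.
people = list(range(200))
experimental, control = assign_randomly(people, seed=42)

# Made-up outcomes: a baseline that varies by person, plus an assumed
# true effect of 5 points for those assigned to the intervention.
rng = random.Random(1)
baseline = {p: rng.gauss(50, 10) for p in people}
treated = set(experimental)
outcomes = {p: baseline[p] + (5 if p in treated else 0) for p in people}

estimate = difference_in_means(outcomes, experimental, control)
print(f"estimated effect: {estimate:.2f}")
```

The estimate equals the built-in effect plus whatever baseline gap chance happened to leave between the two arms. With samples of reasonable size that gap tends to be small, which is the source of the experimenter's confidence.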

A third focus in the definition of social experiments, now widely taken for granted, is that the trial is conducted in the field. Gone is the cozy ambience of the laboratory for studying outcomes. Rather, the social scientist conducts the study in the areas where the actual policy would be carried out. Hence we have randomized field trials. If the stress on randomization is accepted as the guiding principle, then any study of desired outcomes conducted through randomization is an SE. Such a definition sweeps in many evaluations of existing programs. Many social program evaluations are undertaken after the program has been enacted, and some evaluations (although not nearly as many as evaluators would like) randomize prospective participants into "experimental" and control groups. After a period of time, evaluators compare the status of the two groups on the desired indicators (e.g. health status, earnings, school graduation). Enveloping such post hoc evaluations in the category of SEs expands the category substantially.

If we restrict ourselves to randomized studies conducted on a trial basis to guide the adoption of future policy, we have a more focused field of enquiry. That is the definition we adopt here. Of course, the distinction is not hard and fast. Some evaluations of existing programs are expected to guide future iterations of the program, that is, to lead to modifications and improvements. Where a program serves as a possible model for federal policy (states as "laboratories of democracy"), what is an evaluation at one level is an SE at another. Still, the distinction is worth maintaining. It is important to consider the main purpose for which an SE is done as well as its research design.

3. History

With a little effort we could probably trace SEs back to Francis Bacon, but it is history enough to go back to Sidney and Beatrice Webb. In their 1932 book, Methods of Social Study, they argued for scientifically based social policy in words that have remarkable resonance for our own times. They advocated research conducted by social scientists trained in experimental methods, who would carry out independent social investigations and transmit the results to those who make social policy. The actual methods, as Ann Oakley (1998a) has shown, were developed by educators and psychologists in the United States in the late nineteenth and early twentieth centuries. The philosopher Charles S. Peirce, father of "pragmatism," introduced the idea of randomization into psychological experiments in the 1880s. Some of the early studies dealt with the transfer of memory skills from one subject to another (Oakley cites Thorndike and Woodworth 1901 and Winch 1908). Psychological researchers devised techniques for randomly assigning subjects to experimental treatments. R. A. Fisher, who conducted research in agriculture and developed much of what has become standard in statistics, is widely known for championing the method of randomization.

- Design challenges

Researchers are beset by a series of challenges when they conduct research in the real world. Experiments pose difficulties all along the way. A first problem is the choice of sites. Although the policy option that the experiment is to test is usually intended to apply to all members of the relevant group in the nation (or the state), the experiment cannot be implemented among a randomly selected sample across the whole nation. The intervention can be offered (and studied) in only a few places. Even the most expensive SEs have had to limit the intervention to a few sites. How do researchers decide which sites are "ordinary" or "representative" enough to stand in for the whole country? Researchers avoid places with obviously atypical features, but much of the choice depends on which sites agree to cooperate.

- Simplistic thinking

The results of SEs can be misused in policy discussions (Orr 1998). As with any source of information, policy makers can choose to ignore results if they are not congruent with their beliefs and political agenda. During the congressional welfare reform debates, the welfare-to-work research was used to argue that education and training were effective strategies and that large amounts of federal funds were needed to produce effects. In fact, education and training received little attention in the programs studied, and the experiments showed that relatively low-cost job search and work experience were effective (Haskins 1991).

Policy makers may take note of the reaction of the general public. If the public is uninterested in, or skeptical of, particular results, policy makers have little incentive to push for change based on those results. Results may not even reach the ears of policy makers if the agency sponsoring the study does not itself like the results. What goes forward to publication can be influenced by the satisfaction (or dissatisfaction) of the agency that requested and paid for the study in the first place. Less sinister is the simple lack of dissemination of experimental results. In the nursing home incentive study, the departure of the federal official who sponsored the study contributed to the lack of dissemination of the findings. Few people learned of the results, and little use was made of the findings (Greenberg et al. 2003). A reanalysis of the data that showed more positive results of the incentives (Norton 1992) was almost completely invisible.

Contributing to the risks of misunderstanding or misuse, policy makers may not have a particularly honed sense of research quality, or indeed the expertise to interpret results correctly when they are confronted with them (they are not alone; it is difficult for everyone). Policy makers tend to rely on indirect indicators of quality, such as the reputation of the researchers, how the research community reacts to the results, and whether the research fits their own preconceptions about what the results should be (Orr 1998).

- Capacity of researchers to work in the policy world

Social experiments take place in a messy world. The kinds of social scientists who have the requisite knowledge of research design, sampling, measurement, and statistical analysis are not always the kinds of social scientists who communicate with political actors. Experimenters in such situations have to listen. They have to know which policy options are feasible. They have to know the history of the political battles that have been waged over the terrain. And they still have to know the social science literature and how to design and conduct research. Such people can be hard to find. In their stead come skilled researchers who may have little skill, and often less interest, in aligning the experiment with the political world.

- Increased scrutiny

The results of social experiments may fare somewhat better than other research in that their findings are less assailable by opponents. This occurs, in part, because the research community tends to support the results of randomized experiments, and so there may be a more unified front for policy makers trying to understand what researchers believe. Thus, for example, the health insurance experiment produced agreement among the research community that cost sharing can reduce the use of health care without damaging effects on health, a question that until then studies had not adequately answered. However, even some of the best social experiments are open to methodological criticism, and indeed they may sometimes be subjected to more stringent critique than might be expected because of their high visibility in both the research and policy worlds. The school choice experiments are an example (e.g. Howell and Peterson 2004; Krueger and Zhu 2004). Because parental choice of schools is such a politically loaded issue, the studies have been scrutinized in minute detail.

Another problem is ensuring that the program is implemented as planned. If, say, the state welfare agency does not provide the job-search services that were supposed to be offered, i.e. the intervention is not delivered, the SE will be testing the impact of a phantom policy or of an unknown intervention that the agency itself devised. The results of the SE will be meaningless. From experience, researchers have learned the importance of monitoring the implementation of the intervention.

Perhaps the most basic design problem is implementing and maintaining randomization. Researchers often do not carry out random assignment themselves. The operating agency selects participants for programs and in the process is expected to assign participants to intervention and control groups according to the protocol supplied by the researchers. The actual assignment is "often carried out by social workers, nurses, physicians, or school district officials" (Cook and Shadish 1994, 550). Sometimes these people misunderstand what they are expected to do, and sometimes they are tempted to use their professional judgment in making assignment decisions. Researchers have learned that they not only have to train agency staff but also have to maintain a monitoring presence to ensure that assignment is indeed random.

That is not the end of the problem. What starts out as true random assignment may come undone as time goes on. In some cases the experiment does not enroll enough participants. Agency staff may raid the control group to fill slots in the program. People labeled "controls" may in truth receive the intervention. Or, and this is inevitable, participants may drop out of the program and the study. That would be fine if they dropped out equally from the intervention and control groups, and for similar reasons. However, it is usually more common for controls to drop out. They are not receiving services and they have less reason to persevere. For example, in the income maintenance experiments, higher dropout rates were registered in the control group, and in some experimental groups receiving smaller benefits, than in the groups with more generous benefits. The effect of differential dropout is to compromise the equivalence of the groups. Selection bias is reintroduced.
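How differential dropout undoes the equivalence of the groups can be seen in a small Python sketch. The numbers are made up, and the dropout pattern (the 30 lowest-scoring controls leave, no experimentals leave) is an assumption chosen only to make the mechanism visible; it does not describe any actual experiment.

```python
from statistics import mean

# Hypothetical follow-up outcomes (made-up numbers). Both arms are built to be
# identical, so the intervention has no true effect in this example.
experimental = list(range(100))   # outcomes 0..99 for 100 experimentals
control = list(range(100))        # the same 100 outcomes for the controls

gap_without_dropout = mean(experimental) - mean(control)

# Assumed dropout pattern: the 30 controls with the lowest outcomes leave the
# study, while every experimental stays. Who actually drops out is an empirical
# question; this pattern is purely illustrative.
remaining_control = sorted(control)[30:]
gap_with_dropout = mean(experimental) - mean(remaining_control)

print(f"gap with full follow-up:        {gap_without_dropout:.1f}")
print(f"gap after differential dropout: {gap_with_dropout:.1f}")
```

The gap that appears after dropout arises purely from who remains in the comparison, not from the intervention. Under a different assumed dropout pattern the bias could run in the other direction.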

In other cases, the control group may become contaminated by inadvertent exposure to the intervention under study. Teachers receiving an experimental course of professional development may share some of their new learning with fellow teachers in their school, regardless of the latter's official "control" status.

The list of complications could go on and on. As researchers have become more sophisticated over time and with experience, they have identified a host of further threats to the validity of SEs. Manski and Garfinkel (1992) suggest that some interventions can cause changes in norms and attitudes in the community, and that the changes in community attitudes will affect the success of the intervention. Heckman (1992) and Heckman and Smith (1995) have written that the people who enroll in SEs may not be representative of the people who would participate in full-scale programs. Moffitt (1992, 2004) has also worried about "entry effects," conditions of a full-scale program that would affect the behavior of participants but that do not appear in small-scale experiments.

- Feasibility of random assignment for organization/community policy interventions

Some innovative ideas involve intervention in neighborhoods or systems or states. Rather than delivering services to individuals one at a time, these policy proposals are designed to change the practices and culture of larger entities. Examples include: changing the attitudes of welfare office staff so that their first priority is to place clients in jobs; changing practices in a neighborhood so that families, restaurants, and law enforcement officers actively work to prevent young drivers from drinking alcohol; and changing the culture of a school system so that teachers and administrators actively welcome parents' participation in their children's education. To test ideas like these in an SE requires study not so much of individuals as of the units being changed: welfare offices, neighborhoods, or school systems. The interest is in the behavior of the collectivity.

The obvious solution is to randomize units. Some schools or neighborhoods or systems can be randomly assigned to the intervention and others to a control group. However, as the size of the unit increases (say, to counties or states), fewer units can be studied. It is very difficult, and expensive, to study large numbers of neighborhoods or states, and few studies have attempted to go beyond ten or twelve. Yet with only a limited number of cases, the laws of probability do not necessarily work. Whatever differences are observed between intervention and control groups may have occurred by chance. There are too few cases to even out the lumps of chance. Therefore, randomization of large units is a partial solution at best. This is an issue on which the needed research and innovation are currently being developed.
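A rough sense of why ten or twelve units are too few can be had from the following Python sketch. It enumerates every way of assigning five of ten units to the intervention when the intervention does nothing at all. The ten outcome values are invented for illustration and stand for hypothetical neighborhoods, not real ones.

```python
from itertools import combinations
from statistics import mean

# Hypothetical baseline outcomes for ten neighborhoods (made-up numbers).
# No intervention effect is built in, so any gap between arms is pure chance.
outcomes = [42, 55, 47, 61, 38, 50, 66, 44, 58, 49]
units = range(len(outcomes))

gaps = []
for treated in combinations(units, 5):      # every way to pick 5 of the 10 units
    control = [u for u in units if u not in treated]
    gap = (mean(outcomes[u] for u in treated)
           - mean(outcomes[u] for u in control))
    gaps.append(gap)

largest = max(abs(g) for g in gaps)
share_big = sum(abs(g) >= 5 for g in gaps) / len(gaps)

print(f"{len(gaps)} possible assignments")
print(f"largest chance gap between arms: {largest:.1f}")
print(f"share of assignments with a gap of 5 or more: {share_big:.2f}")
```

With so few units, a sizable share of perfectly random assignments produces a gap between the arms even though nothing was done to either one. An observed difference of that size could not be distinguished from the luck of the draw.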

Another reason to object to random assignment is that no two cities are interchangeable, nor are neighborhoods, health systems, or school systems. Each has a history. Each has a set of established traditions. Each has a culture that has evolved over generations. Each has attracted particular types of civic organizations, program staff, and residents. Harlem is not the South Side of Chicago, which is not Watts. PS 241 in Brooklyn is not the same as the Condon School in Boston (Towne and Hilton 2004). Even if a researcher were to randomly assign neighborhoods, they would not be fully comparable, and the differences observed at the end might be due not so much to the intervention as to the whole complex of prior history and culture. For example, an evaluation of a program to promote nutritious food products randomly assigned supermarkets in Washington and Baltimore. Markets in the intervention group placed nutritious products in advantageous locations and distributed flyers about nutrition. The control group did not. The measure of success was customers' purchases of nutritious foods. Results showed that there was more difference between the two cities than between the experimental and control groups.

Costs can limit the value that social experiments can provide to policy making. In general there is a direct relationship between the complexity of the research design and cost. The more policy alternatives, settings, or types of participants are tested, the more expensive the experiment is likely to be. Thus, cost plays a direct role in limiting the relevance of findings from social experiments to particular policy questions. Over time, social experiments appear to have become simpler and consequently less costly. Greenberg et al. (1999) suggest that this is due to the increased use of administrative databases rather than special surveys, an increase in the likelihood that the organizations that will run the program are the ones involved in the social experiment (as opposed to new programs developed and run by research organizations), simpler designs with fewer groups, and shorter tracking periods for participants.

- Limits on how much can be tested

It is a rare experiment that can test all the variations in a given policy that might be relevant to the question under study. Thus, findings from social experiments are limited to the specific alternatives tested. SEs take place in a limited number of sites with particular sets of participants, and the findings may not generalize to other settings or participants. The time horizon is often truncated (although not in the health insurance experiment). Only a few social experiments can assess trade-offs among components of the intervention. Almost none is large enough to examine differences among subgroups of the client population (the income maintenance experiments are an exception). Few examine the behavior of the staff implementing the program, and so they have little to say about the practices associated with better or worse outcomes. The costs of the intervention are not always carefully calculated (for example, in the nursing home reimbursement experiment, officials could not separate the costs of the program from those of the study (Greenberg and Shroder 1997)).

One can make a distinction between "black box" experiments, which test one or a few treatments, and "response surface" experiments, which test a range of treatments (Greenberg et al. 2003; Burtless 1995). Examples of the latter are the income maintenance experiments of the 1960s and 1970s, in which income guarantees and tax rates were varied across treatment groups, and the health insurance experiment, in which cost sharing was varied across groups. Greenberg et al. (2003) conclude that if the particular intervention being tested is still on the policy agenda when the experiment concludes, a black box experiment will do fine. However, that almost never happens. The advantage of a "response surface" experiment is that the design allows for estimation of elasticities across a range of treatment options, and the results can be used in later simulation models well into the future.

- Timing

The worlds of policy and research do not work in lockstep. Social experiments are time consuming, often taking years to design, implement, and finally analyze and report. The policy process meanwhile moves ahead, and the results of the SE arrive in a new, changed policy environment. Research results may have little or no bearing on this changed policy world. For example, the health insurance experiment began at a time when a national health care system was under active consideration, and the relevance of the impact of cost sharing was evident. By the time the results of the experiment were known, the health care debate had petered out and national health care was no longer an imminent possibility. The relevance of the results was greatly reduced (Greenberg et al. 2003).

In the past it has often taken four or five years (or more) before experimental results were ready. The housing allowance experiment ran even longer. It studied the effects of giving housing allowances to low-income residents not only on the families involved but also on the supply of housing. It had to go on long enough for landlords to increase the number of housing units available to recipients of the allowance. The study ran (in two cities) for eleven years (Bradbury and Downs 1981).

On the other hand, some experiments are too short to produce convincing results. The nursing home incentive study ran for thirty months. Many nursing homes proved unwilling to change their practices in response to short-term monetary incentives. A report from one of the sponsoring agencies stated:

To the participants [nursing homes] ... it may have seemed a very short time period, and it is not unlikely that they were reluctant to make staffing, policy, and organizational changes that could affect their environment long after the experiment concluded. (Greenberg et al. 2003, 107)

Yet even with so short a period, the study could not keep pace with events. By the time it was completed, political interest had shifted away from incentives and toward regulation.

Foresight is not a particularly strong point of social science. Trying to know which policy issues will be alive at some future point is an exercise for a psychic. Knowing how rapidly the political canvas changes, knowing how volatile government is these days, with the country divided almost equally between Republicans and Democrats, knowing how policy windows open and close as the economy changes, can we ever feel confident that we are forecasting the appropriate mix of interventions? Many people worry about issues of causality in experiments. We worry about the clouded crystal ball. Fortunately or not, in recent years SEs have become more modest. As noted in the following paragraphs, they make do with existing data, and they take less time to complete. But they are testing more modest initiatives.

- Small effects

Social experiments almost never yield slam-dunk findings. If the proposed intervention were clearly superior, there would be little reason for an experiment. Most policy proposals are uncertain. Experimental results are often marginal. There is a small gain under particular circumstances with some subpopulations. Interpretation becomes critical.

Because experimentation is such a difficult craft, results are not always authoritative. Decisions about the course of the experiment have to be made along the way. Compromises are made, sometimes in response to crises in the environment, sometimes to fit within the budget, sometimes to suit the skills of available staff, sometimes to meet deadlines, sometimes in an attempt to answer new questions that arise during the study. Other researchers will critique the findings. They may reanalyze the data. They will come up with new models that they claim better account for the patterns in the data. Experiments can get caught up among research experts and become fodder in struggles for dominance.

- Ethics

Ethical issues have dogged experiments from the beginning. People have shown considerable concern about withholding a social good from one group regardless of level of need. Practitioners are often reluctant to allow services to be given on the basis of chance, without the exercise of their own professional judgment. Would-be beneficiaries of services strongly object to being placed in a no-service control group. A host of ethical issues (denial of services to those who are eligible, full disclosure of experimental procedures, the right to refuse, harm to participants) may significantly limit the questions that social experiments can address.

The rejoinder is that no one knows whether a social service is "good" until it has been studied. Many experiments find that the intervention is no better than standard services, or is even harmful. Thus, the nursing home reimbursement experiment showed no positive effects of the reimbursement scheme. Bickman's study of intensive mental health services, which included all the professionally fashionable bells and whistles, showed that intensive services did not have better outcomes than regular services (Bickman 1996).

- Complexity of interventions

Perhaps the most telling argument against experiments is that they assume that interventions are simple things that can be captured in a treatment/no-treatment design. Many interventions are highly complex social interactions, and simple cause-and-effect patterns may not be readily detected. "Programs" are often implemented in different ways by staff, and the desired outcomes are social processes that cannot be easily measured with simple metrics. Studying the impact of psychotherapy, for example, poses all sorts of problems because of the essentially personal way in which therapists work and clients respond. No matter what label one affixes to a "brand" of psychotherapy, or how hard one tries to train therapists to use the same procedures, critics argue that quantitative randomized studies cannot yield sensible results.

Similarly, educators often claim that interactions in the classroom, such as the introduction of a new teaching method, cannot be studied appropriately by quantitative randomized techniques. The assumption that all teachers trained in the new method will implement it consistently, and that children in all classes will react in similar ways, represents a fundamental misunderstanding of the variability of learning. The rejoinder is that despite this variation, which certainly introduces more measurement error, large samples should show the extent to which mean scores (social functioning, math achievement, attendance) differ across the populations exposed and not exposed to the intervention. In Cook's (2001) words: "It is not an argument against random assignment to claim that some schools are chaotic, that implementation of a reform is usually highly variable, and that treatments are not completely faithful to their underlying theories." There is enough consistency in human behavior, experimentalists claim, to allow an experiment to reach valuable conclusions about whether an innovation is worth adopting.
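The experimentalists' rejoinder is at bottom a statistical claim, and a small simulation can illustrate it. The sketch below is only an illustration under stated assumptions: the baseline score, the average effect, and the two spread parameters are invented for the example and do not come from any of the studies discussed here. Each treated participant receives an effect that varies from person to person, standing in for inconsistent implementation, and the code reports the difference in mean scores along with its standard error.

```python
import random
import statistics


def simulate_trial(n_per_arm, true_effect=2.0, implementation_sd=3.0,
                   outcome_sd=10.0, seed=0):
    """Simulate one randomized trial; all parameter values are hypothetical.

    Returns (difference in mean scores, standard error of that difference).
    """
    rng = random.Random(seed)
    baseline = 50.0  # assumed average score without the intervention

    # Control arm: ordinary person-to-person variation only.
    control = [rng.gauss(baseline, outcome_sd) for _ in range(n_per_arm)]

    # Treated arm: the effect each person receives varies around the
    # average effect, mimicking uneven implementation by different staff.
    treated = []
    for _ in range(n_per_arm):
        received_effect = rng.gauss(true_effect, implementation_sd)
        treated.append(rng.gauss(baseline + received_effect, outcome_sd))

    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / n_per_arm
          + statistics.variance(control) / n_per_arm) ** 0.5
    return diff, se


if __name__ == "__main__":
    for n in (50, 500, 20000):
        diff, se = simulate_trial(n)
        print(f"n per arm = {n:6d}  mean difference = {diff:6.2f}  "
              f"standard error = {se:5.2f}")
```

With small samples the standard error is large relative to the assumed effect, so the estimate is unreliable. With large samples the standard error shrinks and the mean difference settles near the assumed average effect, even though no two participants received exactly the same treatment. The sketch says nothing about the critics' deeper objection, that the outcomes of interest may not be measurable with simple metrics at all.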

7. Conclusion

We began this chapter with a description of three characteristic features of SEs: research in the field, conducted through random assignment of a sample of prospective beneficiaries to intervention and control conditions, to test the likely success of policy interventions. The first two characteristics are increasingly accepted as realistic and necessary. Research in the field has now become mainstream practice. Randomized studies have received considerable support not only from the research community (although some researchers, particularly in education, have remained persistent in their dissent) but also in Congress. For example, randomized designs have been favored for education studies. It is the third feature that may be in decline: SEs as prospective tests of policy alternatives.

SE came to prominence in the late 1960s, at a time of ferment in policy change. It was part of the climate of innovation and radical reform that swept the country. In the late 1980s and the 1990s, as interest in fundamental change waned, the state of experimentation changed as well. Experiments continued to be done, more of them in fact, but fewer resources were devoted to them. The emphasis shifted from major innovation to marginal improvements in existing programs. In Burtless's words, they were "narrower" (1995, 63). Now, at a time of budget deficits and fiscal stringency in the United States and elsewhere, the likelihood of new domestic initiatives seems low. This is not a time when big new ideas will be tested, at least not with government funds. The prevailing trend is to test minor modifications, preferably cost-saving changes, and to shift activities to the private sector. If you are considering investment in large-scale SEs, our advice would be: hold. The product is a sound one, with high potential, but the time is not now, at least not in the United States. But hang on. Some version of SEs will have its day.

We also began our story with an outline of three themes: the complexity of the policy world, the technical complexity of the research world, and the alignment or misalignment between experimental findings and policy questions. On the whole, SEs have shown both the possibilities and the limits of influencing policy through social science research. They have contributed a great deal of new knowledge. Some of their findings have percolated into the policy arena and become part of policy talk (Anderson 2003; Weiss 1999). Influentials in Congress, federal agencies, international organizations, interest groups, and the media become acquainted with experimental findings in order to take part in knowledgeable policy conversation.

On the other hand, there is no example of an SE that led directly to policy change. The results of the health insurance experiment came so late, and bore so little on the live legislative proposals, that they were all but ignored, except by economists, who have used them to model new proposals. The results of the nursing home reimbursement experiment also came late, after the zing had gone out of the incentive idea. Almost no one was still interested in incentives for nursing homes; the action was in the area of regulation. Although much was published, the income maintenance experiments led to little change in concrete policy. The welfare-to-work experiments do appear to have had policy consequences. The MDRC studies gave support to mandatory, work-first requirements and demonstrated the capacity of the states to design and manage their own welfare programs. All three of these aspects of program design eventually wound up in the Family Support Act of 1988. But as we have seen, the experiments only reinforced what policy makers were already planning to do on other grounds.

Because policy making is such a complicated business, with so many players pursuing divergent interests, it is overly optimistic to expect research information to carry the day. Even the high-quality information provided by SEs cannot override all the other forces on the scene. And as we have seen, the timing of SEs is often off. The policy agenda moves on while the SE is still studying last year's proposal.

Nevertheless, totting up the advantages and disadvantages, we come out in favor of further experimentation. The world is better served by greater understanding of the consequences of government action. Social experimentation cannot fully satisfy the need for knowledge about policy outcomes, partly because of the intrinsic nature of social science research and partly because of the constraints imposed by the conditions under which it is done. It still opens a path. Whatever advances rationality in the fraught world of policy is worth supporting. Not revered or kowtowed to, but encouraged.

But we also need to moderate our expectations of the contribution that SE can make. The notion of basing policy strictly on experimental evidence is misguided. SE does not tell everything that a polity needs to know about pending policy choices. Many other considerations have to enter into government action, such as popular demands, costs, the capacity available to implement the policy, competing needs, effects on neighboring policies, and so on. Resolution comes through politics. Although the word has fallen on evil times, politics is the system we have for resolving differences in our complex societies and for reaching decisions that are at least minimally acceptable to all parties (for an outstanding defense of politics, see Crick 1972).

Experimental evidence cannot and should not supplant the play of politics as the basis of policy. Of course, we do not want to see policy developed on the basis of a faulty understanding of the situation or unrealistic expectations about the effects of action, but it seems arrogant to think that experimental data alone can point to the best resolution of complex policy problems. History matters, as do political culture and institutional practices. What SE can do is illuminate the understanding of the public and of elites and infuse policy discussion with insight. Science and politics share the policy environment, but their partnership is an uneasy one. Social scientists, to put the best face on the relationship, have pointed to the "value-added" features that social science brings to the table: an inventory of knowledge for the future to draw on, general enlightenment of elites and the public in the present, the puncturing of mistaken assumptions, and the confirmation of wise instincts for action. But for all the understanding and insight contributed by the social sciences, and by SEs in particular, they do not run the show. There is an unavoidable tension between science and politics, and convergence is usually a happy accident.