SOCIAL EXPERIMENTATION FOR PUBLIC POLICY 1
1. Policy experiments
The statement, made by the British Conservative
politician Enoch Powell, highlights the fact that
public policy making involves not only the higher arts
of principle, intellect, and persuasion, but also the
play of interests and the pushing and hauling of
partisans for power and control. While the centrality
of interests and prejudices has received a great deal
of attention in both the scholarly and popular media,
it is Powell’s “guesses about the future” and that
“staff of economists” that concern us in this chapter.
Policy inevitably deals with an uncertain future.
Even with the plethora of statistical series and policy
research currently available, policy making has to be
based on some degree of guesswork. Powell’s economists
who project past trends into the future, now
supplemented by sociologists of several hues, shed
sometimes flickering light on what the effects of
policy interventions will be. It is to get closer to
understanding the likely effects of a prospective
policy that social experimentation was born. The idea
is simple: try out a policy on a small scale and see
what happens.
Since the late 1960s, spending on trials of social
policy proposals in the USA has consumed over a billion
dollars (Burtless 1995). In this chapter we examine the
nature of social experiments that have been conducted
in the past forty years. We review the efforts of many
social scientists and economists to develop systematic
empirical evidence about the likely advantages and
disadvantages of specific policy proposals through the
conduct of social experiments themselves, and try to
project the current trend line into the hazy future.
2. Definition
Social experiments are randomized field
trials of a social intervention. Within that
rubric, two emphases jostle for primacy (and a
third emphasis tags along). Some authors define
social experiments (SE) by emphasizing the “trial”
in randomized field trial. For them, the hallmark
is that a prospective intervention is being tried
out on a small scale before it is widely adopted.
Not only is it being tried out; it is being
studied in its pilot version. The aim is to find
out whether the intervention achieves its aim. If
so, the assumption is that policy makers should
adopt it on a system-wide basis. There is a sense of
self-conscious intention to influence policy, and
often this intention is accompanied by a sense of
urgency as the policy window opens.
Other authors put the stress on
randomization. It is randomization that allows
experimenters to have confidence that the
intervention was the cause of whatever changes are
observed. In a randomized study, the experimenters
select samples from the same population and assign
one to the intervention, or “experimental,”
condition and the other to a “control” condition.
At the end of the period, the groups are compared.
Inasmuch as they were very much the same at the
start and the only thing that differed over time
was exposure to the intervention, differences
between them can be attributed to the intervention.
From a methodological point of view, randomization
gives experimenters confidence in their estimates
of effects.
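
The logic of random assignment can be sketched in a few
lines of Python. This is a minimal illustration, not a
procedure taken from any of the experiments discussed
here: the function names, the fixed seed, and the even
split into two groups are all assumptions made for the
example.

```python
import random
from statistics import mean


def assign_groups(units, seed=0):
    """Randomly split units into an experimental and a control group.

    The seed is fixed only so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance alone decides who lands where
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, experimental, control):
    """Difference between the mean outcomes of the two groups."""
    exp_mean = mean(outcomes[u] for u in experimental)
    ctl_mean = mean(outcomes[u] for u in control)
    return exp_mean - ctl_mean
```

Because chance alone decides group membership, the two
groups should resemble each other at the start, so the
difference in mean outcomes at the end serves as the
estimate of the intervention's effect.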
The third focus in the definitions of social
experiments, now widely taken for granted, is that
the trial is done in the “field”. Gone is the
comfortable milieu of the laboratory for studying
outcomes. Rather, the social scientist conducts
the studies in the precincts in which the actual
policy will be run. Thus we have randomized field
trials.
If the emphasis on randomization is accepted
as the guiding principle, then any study of
desired outcomes conducted through randomization is
an SE. Such a definition sweeps in large numbers of
evaluations of existing programs. Many evaluations
of social programs are conducted after the programs
are enacted, and some of the evaluations (although
not nearly as many as evaluators would like)
randomize prospective participants into
“experimental” and “control” groups. After a period
of time, the evaluator compares the status of the
two groups on the desired indicators (e.g. health
status, earnings, school graduation). To blanket
such post hoc evaluations into the category of SEs
widens the category substantially.
If we confine ourselves to randomized studies
undertaken on a test basis to guide adoption of
future policy, we have a more focused field of
enquiry. It is the definition we adopt here. Of
course, the distinctions are not hard and fast.
Some evaluations of existing programs are expected
to guide future iterations of the program, i.e. to
lead to modifications and improvements. Where an
existing program is treated as a possible model for
federal policy (states as “laboratories of
democracy”), what is an evaluation at one level is
an SE at another. Still, the distinction is useful
to hold on to. It is important to consider the main
purpose for which the SE is done as well as its
research design.
3. History
With a little difficulty we could probably trace SEs
back to Francis Bacon, but it is sufficiently
historical to go back to Sidney and Beatrice Webb. In
their 1932 book, Methods of Social Study, they argued for
scientifically based social policy in words that have
remarkable resonance for our own times. They advocated
research conducted by social scientists trained in
experimental methods who conduct independent social
investigations and transmit their results to those
making social policy. The actual methods, as Ann Oakley
(1998a) has pointed out, were developed by
educationalists and psychologists in the USA in the
late nineteenth and early twentieth centuries. The
philosopher, Charles S. Peirce, the father of
"pragmatism," introduced the idea of randomization into
psychological experiments in the 1880s. Some of the
early studies dealt with the transferability of memory
skills from one subject to another (Oakley cites
Thorndike and Woodworth 1901 and Winch 1908). These
psychological researchers invented techniques for
randomly assigning subjects to experimental
treatments. R. A. Fisher, who did his research in
agriculture and developed much that has become
commonplace in statistics, is widely known for
championing randomization methods.
With regard to the "field" aspect of policy
experiments, Oakley (1998b) reminds us that two US
sociologists, Stuart Chapin at the University of
Minnesota and Ernest Greenwood at Columbia University,
applied experimental methods to the study of social
problems in the early years of the twentieth century.
Where psychologists tended to work in laboratory
settings, pioneering sociologists took their research
out into the community. Chapin (1947) describes nine
experimental studies that he and others carried out on
topics such as recreation programs for delinquent boys,
social effects of public housing, and effects of
student participation in extracurricular activities.
Where others had stated that randomized experiments
could be done only under antiseptic laboratory
conditions, he was interested in demonstrating that
they could be adapted to community settings as well.
Greenwood provided a theoretical rationale for applying
experimental methods to social issues, described in his
book Experimental Sociology (1945).
In the first half of the twentieth century, most
of the forerunners of current SEs were evaluations of
existing programs. They shared many of the
characteristics of experiments, but dealt with programs
that were already up and running. The intent,
nevertheless, was very similar: to see whether a
program worked and, if it proved successful, to extend
and expand it. One evaluation that gained a great deal
of attention was the Perry Preschool Project, largely
because the preschool participants were followed up
into their late twenties and because their lives turned
out to be significantly more successful than the lives
of kids in the control group (Schweinhart, Barnes, and
Weikart 1993). The data provided much of the
justification for authorization and reauthorizations of
the Head Start program and other early childhood
programs. Among other noteworthy early studies were the
Eight Year Study of progressive high schools, conducted
by Ralph Tyler (unpublished), the Cambridge-Somerville
youth worker program that aimed to prevent juvenile
delinquency (Powers and Witmer 1951), and the Hawthorne
studies of reforms to working conditions in a Western
Electric plant (Roethlisberger and Dickson 1939).
A relatively small number of evaluation studies
used randomization for assigning participants, but some
of them sought to introduce controls in other ways.
Campbell and Stanley (1966) wrote a landmark monograph,
Experimental and Quasi-Experimental Designs for
Research, classifying the designs of studies that had
been reported. In the language of the time,
"experimental" meant that the study had randomly
assigned participants to the program (or several
variants of the program) and to a control group that
did not receive the program. "Quasi-experimental"
designs used other strategies to reduce the threat that
something other than the program was the cause of
whatever differences appeared between the groups.
Although perhaps not its intent, the Campbell and
Stanley book tended to legitimize quasi-experiments for
evaluation purposes. Campbell and his collaborators in
subsequent versions of the book (Cook and Campbell
1979; Shadish, Cook, and Campbell 2002) have sought to
overcome the impression and place randomization back in
priority position.
It wasn't until after the Second World War that
the three main ideas of SE were combined in large-
scale investigations: randomization, study in the field,
and intentional preparation for policy change. With the
War on Poverty in the 1960s, SEs began their modern
history. The first noteworthy SE of the period was the
series of income maintenance experiments. They began in
1968 in four sites in New Jersey and were followed by
parallel studies in a series of urban and rural
locations. The program was an effort to change the
existing welfare system by the provision of a
guaranteed annual income to poor people (Cain and Watts
1973; Kershaw and Fair 1976; Danziger, Haveman, and
Plotnick 1981). The aim of the experiment was to test a
policy innovation prior to enactment.
The income maintenance experiment was followed by
experiments with housing allowances (Carlson and
Heinberg 1978; Friedman and Weinberg 1983; Kennedy
1980), health insurance (Newhouse 1993), performance
contracting in education (Rivlin and Timpane 1975), and
job search (Wolfhagen 1983). Greenberg and Shroder
(1997) provide reports on 143 SEs conducted in the USA,
one in Canada, and one in the Netherlands. All of them
were randomized field trials of prospective new
policies (although the policies studied in the later
experiments generally represented merely incremental
changes in existing programs). Only experiments that
had reported results by 1996 are included in the
inventory. Their appendix lists seventy-five SEs then
still in progress.
To ground the reader in some real examples, Table
39.1 provides information on four SEs which we refer to
in the following discussion.
Income maintenance experiments. Four income
maintenance experiments were run in the 1960s and
1970s at eleven sites to test the impacts of variations
in a negative income tax program for low-income
families. Families were provided with a guaranteed
level of benefits and were allowed to earn additional
income through work. Program benefits were reduced by a
set fraction for each dollar earned. The findings
showed that families reduced the number of hours they
worked but not by significant amounts. Other results
were mixed, with small positive results on many
measures. However, by the time results were reported,
the political climate had changed. Congress was in no
mood to give the poor a blank check. The long and
hugely expensive experiment (Greenberg and Shroder 1997
report the cost as $111.7 million) had little policy
impact.
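
The benefit rule described above can be written out as a
short Python sketch. The dollar figures, the function
names, and the floor of zero on benefits are assumptions
made for illustration only; the text states just that a
guaranteed benefit was reduced by a set fraction of each
dollar earned.

```python
def nit_benefit(earnings, guarantee, reduction_rate):
    """Benefit paid: the guarantee minus a set fraction of earnings.

    The floor at zero is an assumption of this sketch.
    """
    return max(0.0, guarantee - reduction_rate * earnings)


def total_income(earnings, guarantee, reduction_rate):
    """Earnings plus whatever benefit remains after the reduction."""
    return earnings + nit_benefit(earnings, guarantee, reduction_rate)


def break_even_earnings(guarantee, reduction_rate):
    """Earnings level at which the benefit is fully phased out."""
    return guarantee / reduction_rate


# Hypothetical figures: a 4,000 guarantee reduced by 0.50 per dollar earned.
print(nit_benefit(2000, 4000, 0.5))      # 3000.0
print(total_income(2000, 4000, 0.5))     # 5000.0
print(break_even_earnings(4000, 0.5))    # 8000.0
```

Varying the guarantee and the reduction rate across
families is what allows an experiment of this kind to
estimate how hours worked respond to each parameter.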
The health insurance experiment conducted by the
RAND Corporation tested the effects of varying levels
of cost sharing on the use of health services and
health outcomes. It randomly assigned families to one
of fourteen fee-for-service plans or an HMO. A total of
7,708 individuals were tracked in six sites chosen to
represent the United States over a period of eight
years, making the experiment one of the largest and
most expensive in American history. The findings showed
that overall, cost sharing reduces use of medical
services without substantial negative effects on
health. This proved to be a factor in later acceptance
of cost sharing as a cost containment strategy in both
public programs and private insurance plans.
Welfare-to-work programs. In the 1980s, the
Manpower Demonstration Research Corporation (MDRC) tested
ten specific state programs using random assignment,
measuring the impacts and benefit-costs of state
welfare-to-work programs, as well as studying their
implementation. State and local governments designed,
implemented, and operated the programs that were
evaluated, and the MDRC developed the evaluation design
and conducted the actual evaluation. The findings
showed that the tested programs increased earnings and
reduced the size of the welfare rolls, that the benefits
to society as a whole exceeded the social costs of the
programs, and that the programs usually resulted in net
savings for taxpayers. However, the effects were
relatively small.
Table 39.1 Four selected social experiments

Income maintenance experiments (1968-78)
  Tested intervention: Income supplements for welfare
    recipients with varied tax rate for paid work
  Design: Randomly assigned families to varying benefit
    reduction rates in 11 sites
  Result: Payment of income subsidy slightly reduced
    number of hours worked
  Dissemination: Widely published in books, journal
    articles, and reports

RAND health insurance experiment (from 1974)
  Tested intervention: Varied cost sharing for medical
    services
  Design: Randomly assigned families to different
    cost-sharing programs in 6 sites
  Result: Increases in cost sharing reduced use of health
    services without significantly affecting health status
  Dissemination: Numerous publications, widely
    disseminated

MDRC welfare-to-work experiments (1975-88)
  Tested intervention: Provided job training and other
    employment services to AFDC participants
  Design: Randomly assigned AFDC recipients to various
    employment programs in 10 sites
  Result: Consistent small, positive effects on
    participants' earnings, reductions in welfare rolls
    and in cost to taxpayers
  Dissemination: Widely disseminated during welfare
    debates

Nursing home incentive reimbursement experiment (1980-83)
  Tested intervention: Provided reimbursement incentives
    for nursing homes accepting Medicaid patients
  Design: Randomly assigned 36 nursing homes to
    participate in intervention program or control group
  Result: Little effect of reimbursement on health
    outcomes or discharge of Medicaid patients. Slightly
    increased admissions of heavy care patients
  Dissemination: Not widely disseminated

Nursing home incentive reimbursement experiment.
This experiment, conducted from 1980 to 1983, tested
the effects of incentive payments for proprietary
nursing homes.
The aim was to encourage them to accept more hard-
to-care-for Medicaid patients and to discharge patients
to lower-care facilities when they had attained
acceptable health status. The study was conducted with
a total of thirty-six nursing homes in San Diego
County, eighteen of which were in the control group.
Findings showed that in the first year of the
experiment there was no difference between the two
groups of nursing homes in the intensity of care that
admitted patients required, but in the second year the
experimental nursing homes did admit patients in need
of more intensive care. No statistically significant
differences emerged on achievement on patient health
goals or on patient discharges to less expensive
facilities. The small size of the sample and the
shortness of time over which the experiment was run
(thirty months) militated against significant
differences. The findings were not disseminated
widely, and few people heard about the results.
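
To see how a comparison of two small groups, such as
eighteen experimental and eighteen control homes, might be
tested for significance, consider the permutation test
sketched below in Python. The text does not say which
statistical test the evaluators used, so this is an
illustrative assumption, and the function name and
parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are reshuffled many times; the p-value is the share
    of reshuffles whose difference is at least as large as observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)
```

With only eighteen units per group, modest true
differences are easily matched by chance reshuffles, which
is one way to understand why a small sample works against
statistically significant findings.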
4. Themes
It seems obvious that social experiments (SEs) are
conducted to improve decision making regarding policies
under study. However, a direct relationship between the
results of SEs and policy decisions presumes a rational
policy environment with established pathways for
information from experiments to feed into policy
decisions. The relationship between the conduct of SEs
and the policy environment is more complex than such a
simple statement suggests. SEs are generally lengthy, and
results arrive in a changed, sometimes unreceptive policy
space. Experiments arise for a variety of reasons and are
not always set up to answer directly specific policy
questions. And indeed experiments are but one in a
multitude of information sources that policy makers must
consider when making policy decisions.
In this chapter, we explore the relationship of SEs
to policy making. First we look at the advantages of
conducting such experiments. We examine contributions to
policy and contributions to social science. Then we
describe the disadvantages that SEs entail both for the
policy process and for social science. Last, we puzzle
about their future, in a near-sighted attempt to foresee
what use is likely to be made of SEs as political and
economic conditions change.
We admit that our view is largely a United States
view, but that is not totally our doing. The story of SEs
has been largely a US story. The first large experiments
were done in the USA and most of the subsequent work has
been "made in the USA." In recent years, Canada has
jumped on the bandwagon, and the Netherlands has also
conducted a few experiments. But most of the experience
on which the policy world relies is US work.
Running alongside our discussion of advantages and
disadvantages of SEs are three main themes. Hold the
pages sidewise and you will see these ideas: (1) The
policy world is a complex place. Policy making evolves
from ideologies and beliefs, interests, and institutional
norms, as well as from competing information. "Scientific
evidence" alone will almost never determine the direction
of policy making. (2) The research world is no less
complex. Technical issues bedevil the study of complex
policy issues and affect the extent to which social
scientists can derive authoritative evidence. (3) The fit
between the worlds of policy and research is inexact.
Sometimes the answers that SEs provide bear little
resemblance to the questions that decision makers ask. A
major misalignment is timing. An experiment may not be
completed until long after the questions that provoked
the experiment have faded from view. Another issue is
the uneasy pattern of communication between researchers
and policy makers. Nevertheless, despite all the
disabilities that affect SEs, we conclude that a well-
done SE provides important information that illuminates
the policy field and has at least the potential for
influencing policy.
5. Advantages of social experiments
1. Policy advantages
- Provide data on likely outcomes of a policy idea
Social experiments are experimental tests of new policy
ideas. They provide information to people engaged in the
political process of making policy. They advance the
rational component in policy making (Rivlin 1971). Many
policy decisions are made in a relative information
vacuum with little known about the actual effects of the
policies proposed. Data from well-designed tests of
policies under discussion can provide invaluable
information about the realities of the expected effects
of policy adoption, including the potential for
unexpected or negative consequences. In some cases, such
information has counted in decisions to adopt a
particular policy track. For example, the positive
results of the welfare-to-work experiments played a
modest role in the further expansion of work requirements
in state welfare programs. In addition, the success of
state-designed and implemented welfare-to-work programs
may have encouraged later legislation to give states
flexibility to design state-specific welfare programs
(Greenberg, Linksz, and Mandell 2003; Baum 1991).
Some advocates claim that SEs offer objective
information, unsullied by the pull of interests. But
objectivity is relative. Social scientists for over a
generation have acknowledged that every social science
enquiry is inevitably colored by the assumptions, biases,
and blinkers of its investigator. Nevertheless,
experiments appear less prone to dispute than most other
forms of knowledge. They collect information
systematically from a known population according to the
canons of social science. The element of randomization
adds authoritativeness. When there is contention, other
social scientists can reanalyze the data to try to support
their argument. In resolving disputes, SEs rely on the
judgement of the community of social scientists.
(See Howell and Peterson 2004; Krueger and Zhu 2004,
on rival interpretations of school choice experiments.)
On any reasonable scale, experimental information is
credible. In the four experiments that we have cited
here, little important disagreement emerged about the
interpretation of the findings.
- Clarify trade-offs
Social experiments can at times clarify the key trade-
offs in policy decisions and provide information to
debate these trade-offs (Orr 1998). For example, the AFDC
Homemaker Home Health Aide Demonstration found that home
care did not reduce health costs but did improve
clients' sense of well-being. The findings provided
policy makers with information to debate the trade-off
between the costs and benefits of the program.
- Keep a policy idea alive
One aim ascribed to social experiments is keeping alive a
policy idea that cannot muster enough support at the
moment to ensure passage. The income maintenance
experiments were reportedly undertaken because most
members of Congress did not support a negative income tax
for the poor to replace the welfare system. The federal
Office of Economic Opportunity and academic economists
who favored the idea could not carry the day, but they
gained support for an experiment (and then additional
experiments) in the hopes of making a good case. They
might also have hoped that the political winds would
change and members of Congress would come to embrace
their idea for income maintenance for the poor. (Despite
their efforts, the negative income tax was not to be.)
The contrary assumption, that SEs are used to delay
a new policy until the lengthy study is done, does not
receive much empirical support. Once a policy proposal
has acquired political momentum, it is usually enacted
regardless of evidence. Before results were available
from the housing allowance experiments, Congress enacted
one feature that was still being tested. They passed a
bill, known as Section 8, that provided subsidized
payments for the poor in the private housing market.
- Stock a library of information
SEs can create inventories of information for future
policy situations (Feldman 1989). Although their
sponsors, with their eyes focused on current options, do
not intend only to pile up knowledge for the future,
that is one likely result. Even if the findings of the
experiment have little impact on current discussions,
they do provide a stock of information that future
political actors and analysts can draw on (Orr 1998). For
example, the health insurance experiment notably provided
information on elasticities in health care demand that
informed later analysis.
- Help to build consensus
The focus and intensity of a social experiment,
coupled with a general acceptance among researchers of
the quality of impact estimates derived through
experimental designs, may provide the focal point needed
to draw together diverse actors and information sources
to agreement. The health insurance experiment finding
that cost sharing reduced health care use without
harming health led to a fairly broad acceptance among
researchers and policy makers of cost sharing as a
legitimate cost containment strategy. Similarly, the
welfare-to-work experiments broadened acceptance of
mandated work requirements in public assistance programs.
- Legitimize existing preference
If the results of an experiment align with
preferences of decision makers, they can provide
legitimacy to existing policies or preferred
alternatives. They can reaffirm policies after the
policy has been chosen (Greenberg and Mandell 1991). Some
social scientists worry that this kind of after-the-fact
legitimization is a misuse of social science. But if the
findings support a policy that policy actors have already
selected on other grounds, there doesn't seem anything
wrong with giving it a social science seal of approval.
At times, social experiments may provide political
cover for either difficult or highly contested policy
decisions, shifting the onus of decision making onto
"science." They may offer policy makers a set of data-
driven arguments for or against a
particular policy option.
2. Research advantages
- Spur the development of new research
methods
In order to do the challenging work of SEs, social
scientists have had to develop new methods and
techniques. They have also had to develop new statistical
methods to analyze the data. The field environment, the
size of the samples, the rarity of certain groups about
whom data is needed, the need to generalize to a larger
population, the need to measure difficult concepts: all
have contributed to innovations in research methods.
Current textbooks bear witness to the methodological
advances spurred by decades of social experimentation.
- Real-life test for social theories
Another advantage for social science is that SE gives
social scientists the opportunity to test theories in the
crucible of real-world settings. They can subject
theories and practices based on those theories to actual
test. This can help bring abstract theorizing down to a
practical level. For example, theories about the value of
competition in improving the quality of schools are
being tested in a number of SEs that give parents choice
of their children's schools (Howell and Peterson 2004).
Theories about the positive effects of a non-stigmatizing
guaranteed income, implemented through a negative income
tax, were studied in urban and rural areas for extensive
periods of
time.
Many of the pilot ideas that SEs have studied
originated not in social science theories but in
political or practice settings. For example, the MDRC
welfare experiments did not directly test any specific
behavioral theory. Nevertheless, they often derived from
or coincided with theories that were current among social
scientists. The studies therefore supported, refuted, or
failed to provide convincing evidence regarding the
theories to which they were related.
- Provide interesting work to social
scientists
SEs are interesting, frontier studies. They generate
considerable enthusiasm among social scientists,
especially those who work in research institutes that
have the resources to do them well. SEs require skilled
staff and the latest statistical knowhow to do this kind
of demanding work, and only a few organizations have over
time been able to establish and maintain the type of
expertise needed for such work. An analysis of the 143
SEs identified in The Digest of Social Experiments found
that three organizations dominate the conduct of SEs in
the USA: Abt Associates, the Manpower Demonstration
Research Corporation (MDRC), and Mathematica Policy Research
conducted almost half of the experiments reviewed
(Greenberg et al. 1999). In Canada, the Social Research
and Demonstration Corporation does most of the social
experiments.
One of the interesting things about SEs is that
economists are the investigators in most of them.
Economists, who haven't been known for their empirical
fieldwork, in a sense reinvented survey research for the
income maintenance experiments, and developed sampling
and analysis techniques from their tradition. Why
economists? Many of the topics deal with money. They are
testing schemes that expect to reduce government
expenditures. Do welfare-to-work programs reduce the
welfare rolls and welfare costs? Does nursing home
reimbursement increase intake of patients in need of
intensive care so that they do not have to stay in
(very expensive) hospitals? Do job-finding programs reduce
the length of time that unemployed workers receive
unemployment compensation? Another reason for the
frequent presence of economists is that money is easier
to measure than the outcomes that often concern
sociologists and psychologists, such as "functional
ability" or "age-appropriate childhood development."
Policy makers and the public find data on costs and
savings more credible than fuzzier concepts. Economists
have the techniques to study and model data denominated
in dollars.
6. Limitations of SEs
1. Policy limitations
- Effects on decisions
When we review the history of social experiments, we
see that they have not had a decisive, direct effect on
the ensuing decisions. Of our four examples, only the
welfare-to-work experiments were later reflected in
policy. Neither the health insurance experiment, the
nursing home incentive reimbursement experiment, nor the
income maintenance experiments made much of a dent at
all, and the findings were relegated to the great
analytical storehouse. Even in the welfare-to-work
experiments, where experiment results seemed to affect
later policy, the result was at best indirect.
Greenberg, Mandell, and their colleagues did
a telephone interview study of welfare directors in the
states. They found that while most of the state directors
knew something about the findings of the welfare-to-work
experiments (although not the specifics), they didn't
believe the findings had influenced the policies of their
own state. What they did value was the demonstration that
states could administer the program without much problem
and a general sense that work first was better than
training first for former welfare recipients. In their
2003 book, Greenberg et al. conclude:
Ironically, however, even though these experiments
did have important effects on policy, their role was
nonetheless limited. In particular, many
policymakers already viewed the programs tested by
the welfare-to-work experiments as attractive on
other grounds. Findings from the experiments simply
reinforced that view. Consequently, rather than
being pivotal to whether the types of programs they
tested were adopted, they were instead used
persuasively and in designing these programs. In
other words, they aided policymakers in doing what
they already wanted to do. (2003, 308, 310)
Why should the results of SEs be so marginal? Why doesn't
rationality reign?
Social scientists are under no illusions that
"scientific evidence" will displace all other sources of
understanding. Policy making is also based on ideologies
and beliefs, interests, competing information, and
institutional norms (Weiss 1983, 1995). The results of
social experiments can nudge policy only a small
distance, and their influence is dependent in large part
on the interplay with the other factors in the policy
environment. Social scientists know that legislators and
administrative officials have long-standing beliefs and
principles that guide much of their orientation toward
policy. Their ideological orientation exerts powerful
influence over which policy proposals receive even a
hearing. Attitudes toward abortion and gay marriage are
obviously determined by ideology and principles, but it
is not only on such extreme issues that ideology often
prevails. For some policy makers, similarly strong
beliefs affect their views of the enactment of a draft,
the need for standardized performance tests in schools,
mandatory sentences for repeat offenders, and needle
exchange programs for drug addicts.
Interests are always powerful influences on policy.
Drug manufacturers, farmers, radio station owners, state
and city service workers, trial lawyers, charities,
utility companies, universities, hospitals-almost every
organized body in the nation seeks to promote its own
well-being through public policy. The jostling among
organized interests provides much of the drama in the
policy arena. The scene is marked by the formation and
dissolution of temporary coalitions of interests as the
issues on, the agenda shift and change.
Nor does social science represent the only form of
legitimate information. The policy world is awash with
information. Lobbyists hawk their own version of past
events and futures. Media columnists and editorial
writers add to the stew. Many organizations have their
own in-house information resources--databases, research
units, news services. The availability of 24/7 web-based
information in titanic proportions makes getting
information much less difficult than interpreting the
information with a sense of history and context.
Furthermore, each institution in the policy system
has its own set of rules and norms. The US Congress, for
example, proceeds according to a system of committee
appointments, minority/majority representation on
committees, vote taking, reporting to the full body,
closing off debate, reconciling different versions of
bills passed by the two houses, as well as time
schedules, budget limits, pressure group access, and so
on, that have major influence on the nature of policy
that emerges. Ron Haskins (1991) tracked the instances
in which the MDRC research was mentioned at various times
in the welfare reform policy process and found fewer and
fewer specific mentions of the MDRC research as the
welfare policy made its way through hearings, bill
writing, and consideration in the House and finally in
the House-Senate Conference. The internal norms and
culture of each institution in the policy system
exercise great pressure on its own activities and on the
activities of other institutions with which it interacts.
These four sets of influences--ideology and beliefs,
interests, other information, and institutional norms--set
limits to what social science can contribute and how
much attention it can mobilize. Social experimentation,
as one small subset of social science research, is even
further constrained by the surround.
- Misuse of research findings
The results of SEs can be misused in policy
discussions (Orr 1998). As with any source of
information, policy makers may choose to disregard
results if they are not congruent with their own beliefs
and political agendas. During the congressional welfare
reform debates, the welfare-to-work research was used to
argue that education and training were effective
strategies and that large amounts of federal funding were
needed to produce effects. In fact, education and
training received little attention in the programs
studied, and the experiments showed that relatively
low-cost job search and work experience were effective
(Haskins 1991).
Policy makers may take note of the general public
reaction. If the public is not interested or is skeptical
of certain results, policy makers have little incentive
to push forward any change based on the results. Results
may not even reach the ears of policy makers if the
sponsoring agents of the studies themselves do not like
the results. What goes to publication can be influenced
by the satisfaction (or dissatisfaction) of the agency
that asked and paid for the study in the first place.
Less insidious is a simple lack of dissemination of
experiments' results. In the nursing home incentive
study, the departure of the federal staffer who had
sponsored the study contributed to the lack of
dissemination of the findings. Few people learned of the
results, and little use was made of the findings
(Greenberg et al. 2003). A reanalysis of the data that
showed more positive results from incentives (Norton
1992) went almost totally unnoticed.
Contributing to the risk of misinterpretation or
misuse, policy makers may not have a particularly honed
sense for the quality of research or indeed have the
skills to interpret results correctly when they are
presented with them (they are not alone; it
is difficult for everyone). Policy makers tend to rely
on indirect indicators of quality such as the reputation
of the researchers, how the research community reacts to
the results, and whether the research fits with their
own preconceived notions of what the results should be
(Orr 1998).
- Simplistic thinking
- Ability of research to work in the policy
world
Social experiments take place in the messy world.
The kinds of social scientists who have the requisite
knowledge of research design, sampling, measurement, and
statistical analysis are not always the kinds of social
scientists who communicate well with political actors.
Experimenters in these circumstances have to listen. They
have to be aware of what policy options are feasible.
They should know the history of political battles already
waged on the turf. And still they have to know the
scientific literature and the intricacies of research
design and conduct. Such people can be hard to find. In
their stead come highly skilled researchers who may have
little skill, and often less interest, in aligning their
experiment with the world of politics.
- Heightened scrutiny
The results of social experiments may fare somewhat
better than other research findings as they are less
assailable by opponents. This occurs in part because the
research community tends to support the results of
randomized experiments and thus may present a more
unified front for policy makers trying to understand what
researchers believe. Thus, for example, the health
insurance experiment produced generalized agreement among
the research community that cost sharing could reduce
health care costs without detrimental effects on health--a
question that until then no study had adequately
answered. And yet, even some of the best social
experiments are open to methodological critique and
indeed sometimes may be treated to a more rigorous
critique than might be expected due to their high
visibility in both the research and the policy worlds.
The school choice experiments are an example (e.g. Howell
and Peterson 2004; Krueger and Zhu 2004). Because
parental choice of schools is such a politically loaded
issue, studies are scrutinized in meticulous detail.
Research limitation
Social experiments are not easy to bring off. To
be at all persuasive, social experiments require big
slugs of time, lots of money, powerful research
expertise, and enough flexibility to respond to changing
conditions and questions while the experiment is in
process. The impact of social experiments on policy
making is limited not only by the political process but
also by the constraints and limitations of the research
world. Social science methods themselves are not always
ideal for describing and analyzing complex policy issues.
- Design challenges
Researchers are plagued by a series of challenges
when conducting research in the real world. Experiments
pose difficulties all along the way. The first problem is
choice of sites. Even though the policy option that an
experiment is testing is usually intended to apply to all
members of the relevant group in the nation (or the
state), the experiment cannot be implemented among a
random sample chosen throughout the nation. The
intervention can be offered (and studied) in only a few
places. Even the most expensive SEs have had to limit the
intervention to a few sites. How does the researcher
decide what sites are "typical" or "representative"
enough to stand in for the nation as a whole? Researchers
avoid places with obviously unusual features, but much
of the choice depends on which sites agree to cooperate.
Another problem is recruitment. The design demands
enlistment of nursing homes or low-income households,
and the experimenter has to convince the required number
of units to sign on. About half of them have to be told
that they will not receive any new services but will be
required to give periodic information. Locating
participating units, explaining the conditions of the
experiment, and convincing them to participate is no
small task. Then there is the issue of when to tell
participants that they might be in the control group and
receive no service at all. Cook and Shadish (1994)
provide a balanced discussion of the pluses and minuses of
revealing the possibility of control group status at
various points in the recruitment process. It is an
important issue because if people (or organizations)
refuse to participate because they know about the
no-service possibility, the randomness of the assignment is
compromised.
Another problem is being sure that the program is
being implemented as planned. If, say, the state welfare
agency is not delivering the job-search services it is
supposed to be offering, i.e. the intervention is not on
offer, the SE would be testing the effects of a phantom
policy or of an unknown intervention of the agency's
own devising. Results of the SE would be meaningless.
From experience, researchers have learned the importance
of monitoring the implementation of the intervention.
Probably the most basic design issue is implementing
and maintaining randomization. Often researchers do not
do the random assignment themselves. The operating agency
selects participants for its programs and in the process
is expected to assign participants to intervention and
control groups according to the protocols prepared by the
researchers. The actual assignment is "often carried out
by a social worker, nurse, physician, or school district
official" (Cook and Shadish 1994, 550). Sometimes these
people misunderstand what they are expected to do, and
sometimes they are tempted to use their professional
judgement in assignment decisions. Researchers have
learned that they must not only train agency staff but
also maintain an oversight presence to ensure that
assignment is indeed random.
Nor is that the end of the problem. What started as
true randomized assignment may become undone as time goes
on. In some cases the experiment does not enroll enough
participants. Agency staff therefore may raid the control
group to fill slots in the program. People labeled
"controls" may in truth receive the intervention. Or, and
this is inevitable, participants may drop out of the
program and the study. That would be fine if they dropped
out equally from intervention and control groups for
similar reasons. However, it is usually more common for
controls to drop out. They are not receiving services and
they have less reason to persevere. For example, in the
income maintenance experiments, higher drop-out rates
were registered in the control group and in some of the
experimental groups receiving smaller benefits than in
the more generous benefit groups. The effect of
differential drop-out is to compromise the equality of
the groups. A selection bias is reintroduced.
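The mechanism of differential drop-out can be illustrated with a small simulation. This is a minimal sketch, not drawn from any of the experiments discussed: the sample size, the latent "motivation" trait, and the 50 percent drop-out rate among less motivated controls are all hypothetical assumptions chosen for illustration. The true program effect is set to zero, so any nonzero estimate among those who remain is selection bias.

```python
import random
from statistics import fmean


def simulate_attrition(n_per_group=20000, control_dropout=0.5, seed=0):
    """Return (estimate_all, estimate_stayers) for a program with zero true effect.

    All parameters are hypothetical and chosen only to illustrate the mechanism.
    """
    rng = random.Random(seed)
    treated_all, control_all, control_stayers = [], [], []
    for _ in range(n_per_group):
        # Intervention group: nobody drops out in this sketch.
        motivation = rng.gauss(0, 1)
        treated_all.append(motivation + rng.gauss(0, 1))  # true effect = 0
    for _ in range(n_per_group):
        motivation = rng.gauss(0, 1)
        outcome = motivation + rng.gauss(0, 1)
        control_all.append(outcome)
        # Assumption: less motivated controls leave the study more often.
        leaves = motivation < 0 and rng.random() < control_dropout
        if not leaves:
            control_stayers.append(outcome)
    estimate_all = fmean(treated_all) - fmean(control_all)
    estimate_stayers = fmean(treated_all) - fmean(control_stayers)
    return estimate_all, estimate_stayers


estimate_all, estimate_stayers = simulate_attrition()
print(f"estimate with everyone followed up:   {estimate_all:+.3f}")
print(f"estimate using only those who stayed: {estimate_stayers:+.3f}")
```

Under these assumptions the first estimate sits near zero while the second is clearly negative: the controls who remain are a more motivated group than the intervention group they are compared with, so a program with no effect looks harmful. A different drop-out pattern would push the estimate in the other direction.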
In other cases, the control group may become
contaminated by being inadvertently exposed to the
intervention under study. Teachers receiving an
experimental professional development course may share
some of their new learnings with fellow teachers in their
school, regardless of their official "control" status.
The list of complications goes on and on. As
researchers have become more sophisticated over time and
with experience, they have identified a host of further
threats to the validity of SEs. Manski and Garfinkel
(1992) suggest that some interventions might cause
changes in norms and attitudes in the community, and the
changed community attitudes would influence the success
of the intervention. Heckman (1992) and Heckman and Smith
(1995) have written that people who enlist in SEs may not
be representative of people who would participate in
full-scale programs. Moffitt (1992, 2004), too, has
worried about "entry effects," the conditions of a
full-scale program that would affect participants' behavior
but do not show up in small-scale experiments.
- Time
The worlds of research and policy do not work in
tandem. Social experiments are time consuming, often
taking many years to design, implement, and finally
analyze and report results. The policy process meanwhile
has moved forward and the results of an SE arrive in a
new, changed policy environment. Research results may
have little or no relevance in this changed policy
world. For example, the health insurance experiment began
at a time when the development of a national health care
system was under active consideration, and the impact of
cost sharing had real relevance. By the time the results
of the experiment were known, the health care debate had
petered out and national health care was no longer an
imminent possibility. The relevance of the results was
greatly diminished (Greenberg et al. 2003).
In the past it has often taken four or five years
(or more) before experimental results were ready. The
housing allowance experiment ran much longer. It studied
the effect of giving housing allowances to low-income
people not only on the families involved but also on the
supply of housing. It had to go on long enough for
landlords to increase the number of housing units
available to recipients of allowances. The study ran (in
two cities) for eleven years (Bradbury and Downs 1981).
On the other hand, some experiments are too short to
produce convincing results. The nursing home incentive
study ran for thirty months. Many nursing homes were
evidently not willing to change their practices in
response to the short-term monetary incentives. One of
the sponsoring agency's reports states:
To the participants [nursing homes]... it may seem a
very brief duration and there may be reluctance to make
staffing, policy, and organizational changes which could
affect their environment long after the experiment is
concluded. (Greenberg et al. 2003, 107)
Yet even within that brief time period, the study
was not able to catch the wave. By the time it was
completed, political interest had moved away from
incentives and toward regulation.
Foresight is not a particularly strong point of
social science. Trying to figure out what policy issues
will be lively at some future point is an exercise for a
soothsayer. Knowing how rapidly the political canvas
changes, knowing how volatile the complexion of
government is these days with the country divided almost
equally between Republicans and Democrats, knowing how
policy windows open and shut as the economy changes, can
we ever be confident that we are foreseeing an
appropriate mix of interventions? Many people worry about
issues of causation in experimentation. We worry about the
clouded crystal ball. Fortunately or not, in recent years
SEs have become more modest. As noted in the next
section, they are making do with available data, and
they are taking less time to complete. But they are
testing more modest initiatives.
- Expense
Expense can limit the value that social experiments
can provide to policy making. There is generally a direct
relationship between the complexity of a research design
and its cost. The more policy alternatives, settings, or
types of participants tested, the more expensive the
experiment is likely to be. Thus, cost plays a direct role
in limiting the relevance of the findings of social
experiments to particular policy questions. Over time,
social experiments appear to be becoming simpler and
consequently less costly. Greenberg et al. (1999) suggest
that this is due in part to the increased use of
administrative databases rather than special surveys, an
increase in the likelihood that organizations that would
run the program are the ones involved in the social
experiment (as opposed to developing new programs run by
the research organization), simpler designs with fewer
groups, and shorter tracking periods for participants.
- Limits on how much can be tested
It is a rare experiment that can test all the
variations in a particular policy that may be relevant to
the question under study. Thus, the findings of social
experiments are limited to the specific alternatives
tested. SEs take place in a limited number of sites with
a particular set of participants, and the findings may
not generalize to other settings or participants. The
time horizon is often truncated (although not in the
health insurance experiment). Only a few social
experiments can assess trade-offs among components of the
intervention. Almost none are large enough to examine
differences among multiple subgroups of the client
population (the income maintenance experiments are an
exception). Few examine the behavior of the staff
implementing the program and so have little to say about
practices that are associated with better or worse
outcomes. Costs of the intervention are not always
carefully calculated (for example, in the nursing home
reimbursement experiment, officials were unable to
separate costs of running the program from costs of the
study (Greenberg and Shroder 1997)).
A distinction can be made between "black box"
experiments, which test one or a few treatments, and
"response surface" experiments that test a wide range of
treatments (Greenberg et al. 2003; Burtless 1995).
Examples of the latter are the income maintenance
experiments of the 1960s and 1970s in which income
guarantees and tax rates were varied across the treatment
groups and the health insurance experiment in which cost
sharing was varied across the groups. Greenberg et al.
(2003) conclude that if the particular intervention that
is being tested is still on the policy agenda when the
experiment is concluded, the black box experiment would
be fine. However, that is almost never the case. The
advantage of the "response surface" experiment is that
the design allows for the estimation of elasticities over
a range of treatment options and its results can be used
in later simulation models well into the future.
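How a response surface design supports an elasticity estimate can be sketched with a log-log fit across treatment arms. The numbers below are synthetic: the three cost-sharing rates and the elasticity of -0.2 used to generate the data are hypothetical values, not results from the health insurance experiment, and `fit_elasticity` and `predict_use` are illustrative helpers.

```python
import math


def fit_elasticity(prices, quantities):
    """Ordinary least squares slope of log(quantity) on log(price)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical treatment arms: share of cost paid by the participant.
cost_shares = [0.25, 0.50, 0.95]
# Synthetic mean use per arm, generated from an assumed elasticity of -0.2.
assumed_elasticity = -0.2
mean_use = [100.0 * share ** assumed_elasticity for share in cost_shares]

elasticity = fit_elasticity(cost_shares, mean_use)
print(round(elasticity, 3))


def predict_use(share):
    """Predict use at a cost share that was never tested, using the fitted slope."""
    return mean_use[0] * (share / cost_shares[0]) ** elasticity
```

A black box design with a single treatment arm yields only one point and cannot support this kind of fit. The fitted curve is what lets later simulation models evaluate options that were never tested directly.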
- Small effect
Social experiments almost never produce slam-dunk
findings. If a proposed intervention were so obviously
superior, there would probably be little reason to
experiment. Most policy proposals are uncertain. The
results of experimentation are often marginal. There are
small gains in certain circumstances with some
subpopulations. Interpretation becomes critical.
Because experimentation is such a difficult craft,
the results are not always authoritative. Decisions about
the course of the experiment have to be made all along
the way. Compromises are made, sometimes in response to
crises in the environment, sometimes to fit within a
budget, sometimes to suit the skills of the available
staff, sometimes to meet deadlines, sometimes in an
attempt to answer new questions that emerge in the course
of the study. Other researchers will critique the
findings. They may reanalyze the data. They will come up
with new models that they claim better account for the
patterns in the data. The experiment can get captured by
the research experts and become fodder for struggles for
dominance.
- Feasibility of random assignment for
organizational/community intervention
Some innovative policy ideas involve intervening in
neighborhoods or systems or states. Rather than giving
service to individuals one at a time, the proposed policy
is designed to change the practices and culture of a
larger entity. Examples include: changing the attitude of
welfare offices so that staff priority is to place the
client in a job; changing the practices in a neighborhood
so that families, restaurants, and law enforcement
agencies actively work to prevent youngsters from
drinking alcohol; and changing the culture of a school
system so that teachers and administrators actively
welcome parents to participate in their child's
education. To test ideas like these in an SE requires
study not so much of individuals as of the units that are
being altered: welfare offices, neighborhoods, or school
systems. The interest is in the behavior of the
collectivity.
The obvious solution is to randomize the unit. A
certain number of school systems or neighborhoods might
be assigned randomly to the intervention or to a control
group. However, as the size of the unit increases (say,
to counties or states), fewer units can be studied. It
is extraordinarily difficult and expensive to study a
large number of neighborhoods or counties, and few
studies have managed to go beyond ten or twelve. However,
with only a limited number of cases, the laws of
probability do not necessarily work. Any differences
observed between the intervention group and the control
group may be the result of chance. There are too few cases
to even out the lumps of chance. Therefore, randomization
of large units is a partial solution at best. Here is an
issue where research innovations are needed and are
currently being developed.
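Why a dozen units cannot "even out the lumps of chance" can be seen in a small simulation. This is an illustrative sketch under hypothetical assumptions: each unit (say, a neighborhood) has a baseline trait drawn from a standard normal distribution, half the units are assigned at random to each group, and no intervention is applied at all.

```python
import random
from statistics import fmean


def average_imbalance(n_units, n_trials=300, seed=1):
    """Average absolute gap in a baseline trait between randomly formed groups.

    No intervention is applied, so any gap is produced by chance alone.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        baseline = [rng.gauss(0, 1) for _ in range(n_units)]
        rng.shuffle(baseline)  # random assignment: first half vs second half
        half = n_units // 2
        gaps.append(abs(fmean(baseline[:half]) - fmean(baseline[half:])))
    return fmean(gaps)


for n in (10, 100, 1000):
    print(n, round(average_imbalance(n), 3))
```

With ten units the two groups typically differ by a sizable fraction of a standard deviation before anything has been done to them, so a true program effect of similar size would be hard to tell apart from chance. With a thousand units the gap shrinks toward zero.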
Another reason for the objection to random
assignment is that a city is not a city is not a city,
nor are neighborhoods interchangeable, or health systems
or schools. Each of them has a history. Each has a set of
established traditions. Each has a culture that has
developed over generations. Each has attracted particular
kinds of civic organizations and program staff and
residents. Harlem is not the South Side of Chicago, which
is not Watts. PS 241 in Brooklyn is not the same as the
Condon School in Boston (Towne and Hilton 2004). Even if
a researcher were randomly to assign neighborhoods, they
wouldn't be totally comparable, and differences observed
at the end might be due not so much to the intervention
as to the whole complex of prior history and culture.
For example, an evaluation of a program to promote
nutritious food products randomly assigned supermarkets
in Washington and Baltimore. The intervention group of
markets placed nutritious products in favorable shelf
locations and distributed fliers about nutrition. The
control group did nothing. The measure of success was the
customers' purchase of nutritious foods. Results showed
that there were more differences between the two cities
than between the experimental and control groups.
- Ethics
Ethical issues have dogged experimentation since its
beginning. People have displayed considerable concern
with withholding a social good from one group regardless
of degree of need. Practitioners are often loath to allow
services to be allotted on the basis of chance, without
exercise of their own professional judgement.
Beneficiaries of service object strongly to being placed
in a no-service control group. A host of ethical issues
(withholding services for those eligible, full disclosure
of experimental procedures, right to refuse, harm to
participants) may significantly limit the questions that
social experiments can address.
The rebuttal is that no one really knows whether the
service is a social "good" until it has been studied.
Many experiments find that the intervention is no better
than standard service--or even detrimental. Thus, the
nursing home reimbursement experiment did not show
positive effects from the reimbursement scheme. Bickman's
study of intensive mental health service, which included
all the professionally fashionable bells and whistles,
showed that intensive service did not have better results
than regular service (Bickman 1996).
- Complexity of interventions
Perhaps the most vivid argument against experiments
is that they assume that interventions have a simplicity
that can be captured in a treatment/no-treatment design.
Many interventions are highly complex social
interactions, and simple cause-and-effect patterns may not
be easily detected. The "program" is often implemented
differently by staff, and the desired outcomes are social
processes that cannot be readily measured by simple
metrics. Studying the effects of psychotherapy, for
example, poses all manner of problems because of the
inherently personal ways in which therapists work and
clients respond. No matter what label one affixes to the
"brand" of psychotherapy, or how assiduously one tries to
train therapists to use the same procedures, critics
argue that quantitative randomized studies cannot yield
sensible results.
Similarly, educators often say that interactions
within a classroom, such as the introduction of a new
teaching method, cannot be studied appropriately by
quantitative randomized techniques. The assumption that
all teachers trained in the new teaching method will
implement it consistently, and that children in all
classrooms will react in similar ways, represents a
fundamental misunderstanding of the variability of
teaching and learning. The rejoinder is that despite the
variability, which certainly introduces more error of
measurement, large samples should show the extent to
which mean scores (of social functioning, of math
achievement, of attendance) differ across populations
exposed and unexposed to the intervention. In Cook's
(2001) words: "It is not an argument against random
assignment to claim that some schools are chaotic,
the implementation of a reform is usually highly
variable, and that treatments are not completely
faithful to their underlying theories." There is enough
consistency in human behavior, experimentalists claim, to
allow an experiment to reach valuable conclusions about
whether an innovation is worth adopting.
7. Conclusions
We started this chapter with a
description of three distinctive traits of
SEs: research in the field, conducted through
random assignment of samples of prospective
beneficiaries to intervention and control
conditions, in order to test the probable
success of a policy intervention. The first two
characteristics are increasingly accepted as
viable and necessary. Research in the field has
now become mainstream practice. Randomized
studies have received considerable support not
only from the research community (although
some researchers, particularly in the
field of education, have lodged vigorous
dissents) but also in Congress, for example
for education studies with randomized designs.
It is the third feature that may no longer be
as firmly established: the prospective test of
alternative policies.
SE came into prominence in the late 1960s
at a time of turbulent policy change. It was
part of the climate of innovation and radical
reform that was sweeping the country. In the
late 1980s and 1990s, as interest in
fundamental change lessened, the fortunes of
experimentation also shifted. Experiments
continued to be done, more of them in fact, but
fewer resources were devoted to them. The
emphasis changed from major innovations to
marginal improvements in existing programs. In
Burtless’s words, they were “narrower”
(1995, 63). Now, at a time of budget deficit and
fiscal stringency in the USA and elsewhere, the
likelihood of new domestic initiatives seems
low. It is not a time when large new ideas will
be tested, at least with government funds. The
trend is to test minor modifications,
preferably cost-saving modifications, and
shifts of activity to the private sector. If
you were considering investment in large-scale
SEs, our advice would be: hold off. The product
is a sound one, with high potential, but the
time is not now--at least in the USA. But hang
in. Some version of SEs will have its day.
We also began our story with an outline of three
themes: the complexity of the policy world, the technical
complexity of the research world, and the alignment or
misalignment between experimental findings and policy
questions. Overall, SEs have shown the possibilities and
the limits of affecting policy through social science
research. They have contributed considerable new
knowledge. Some of their findings have infiltrated the
policy arena and are part of policy-speak (Anderson 2003;
Weiss 1999). Influentials in Congress, federal agencies,
international organizations, interest groups, and the
media learn to be conversant with experimental findings
in order to take an informed part in policy
conversation.
On the other hand, there are no examples of an SE
that led directly to policy change. Results of the
health insurance experiments were so late and so
unfocused on actual legislative proposals that they were
pretty much ignored except by economists, who have used
them to model new proposals. The nursing home
reimbursement experiment results also arrived late, after
the zing had gone out of the incentive idea. Almost
nobody was still interested in incentives for nursing
homes; the action was in the area of regulation. While
widely published, the income maintenance experiments led
to little concrete change in policy. The welfare-to-work
experiments seemed to have policy consequences. The MDRC
study provided support for mandatory work-first
requirements and demonstrated the ability of states to
design and manage their own welfare programs. All three
of these program design aspects ultimately ended up in
the Family Support Act of 1988. Nevertheless, as we have
seen, the experiment merely reinforced what policy makers
were planning to do on other grounds.
Because policy making is such a complicated business,
with so many players pursuing such divergent interests, it
is overly optimistic to expect research information to
carry the day. Even the high-quality information supplied
by SEs cannot overwhelm all the other forces on the
scene. And as we have seen, the timing of SEs is often
off. The policy agenda moves on, while the SE is still
studying last year's proposals.
Yet, totting up advantages and disadvantages, we
come out in favor of further experimentation. The world
is in dire need of greater understanding of the
consequences of government action. Social
experimentation cannot fully satisfy the needs for
knowledge about policy outcomes, partly because of the
intrinsic nature of social science research and partly
because of the limitations imposed by the conditions
under which it is done. Still, it makes headway. Anything
that advances rationality in the messy world of policy is
worth supporting. Not venerated or kowtowed to, but
cheered on.
But we also need to moderate our expectations of the
contributions that SE can make. The notion of basing
policy strictly on experimental evidence is wrong-headed.
SE doesn't tell everything that a polity needs to know
about a pending policy option. Many other considerations
have to go into government action, such as popular
demands, costs, capabilities available for implementing
the policy, competing needs, effects on neighboring
policies, and so on. Resolution comes through politics.
Although the word has fallen on evil times, politics is
the system we have for resolving differences in our
complex societies and reaching decisions that are at
least minimally acceptable to all parties (for a
resounding affirmation of politics, see Crick 1972).
Evidence of policy outcomes cannot and should not
supplant the play of politics as the basis of policy. Of
course, we do not want to see policy developed on the
basis of faulty understanding of the situation or
unrealistic expectations for the effects of action, but
it does seem presumptuous to think that experimental data
alone can point to the best resolution of complex policy
issues. History matters, as do political culture and
institutional practices. What SE can do is illuminate the
understanding of publics and elites and infuse policy
discussion with insight.
Science and politics cohabit in the policy sphere,
but their alliance is an uneasy one. Social scientists,
to put the best face on the relationship, have pointed to
the "value-added" features that social science brings
to the table: an inventory of knowledge for the future to
draw on, general enlightenment of elites and publics in
the present, puncturing of faulty assumptions, and
confirmation of wise instincts for action. But for all
the understanding and insight contributed by the social
sciences, and by SEs in particular, they do not run the
show. There is inevitable tension between science and
politics, and convergence is usually a happy accident.
Social experimentation for public policy
1. Policy experiments
The statement, made by the British Conservative politician Enoch Powell, highlights the fact that public policy making involves not only the higher arts of principle, intellect, and persuasion, but also the play of interests and the pushing and hauling of partisans for power and control. While the centrality of interests and prejudices has received a great deal of attention in both the scholarly and popular media, it is Powell's "guesses about the future" and that "staff of economists" that concern us in this chapter.
Policy inevitably deals with an uncertain future. Even with the plethora of statistical series and policy research currently available, policy making has to be based on some degree of guesswork. Powell's economists who project past trends into the future, now supplemented by sociologists of several hues, shed sometimes flickering light on what the effects of policy interventions will be. It is to get closer to understanding the likely effects of a prospective policy that social experimentation was born. The idea is simple: try out a policy on a small scale and see what happens.
Since the late 1960s, spending on trials of social policy proposals in the USA has consumed over a billion dollars (Burtless 1995). In this chapter we examine the nature of social experiments that have been conducted in the past forty years. We review the efforts of many social scientists and economists to develop systematic empirical evidence about the likely advantages and disadvantages of specific policy proposals through the conduct of social experiments themselves, and try to project the current trend line into the hazy future.
2. Definition
Social experiments are randomized field trials of a social intervention. Within that rubric, two emphases jostle for primacy (and a third emphasis tags along). Some authors define social experiments by emphasizing the "trial" in randomized field trial. For them, the hallmark is a prospective intervention tried out on a small scale before it is widely adopted. Not only is it tried out; it is studied in its pilot version. The aim is to find out whether the intervention achieves its goals. If so, the assumption is that policy makers should adopt it on a system-wide basis. There is a sense of conscious intent to influence policy, and often this intent is accompanied by a feeling of urgency while the policy window is open.
Other authors place the stress on randomization. It is randomization that gives experimenters confidence that the intervention is the cause of whatever changes are observed. In a randomized study, the experimenter selects samples from the same population, assigning some people to the intervention, or "experimental" condition, and others to a "control" condition. At the end of the period, the groups are compared. Inasmuch as they were very much the same at the start, the only thing that differed over time was exposure to the intervention. From a methodological standpoint, randomization gives experimenters confidence in their estimates of effects.
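The logic of random assignment can be illustrated with a small simulation. This is only a minimal sketch, not a procedure taken from any of the experiments discussed in this chapter: the function names, the simulated baseline outcomes, and the assumed true effect of 5 are all hypothetical.

```python
import random
from statistics import mean


def randomize(units, seed=0):
    """Randomly split units into an experimental and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def simulate_experiment(n=10_000, true_effect=5.0, seed=0):
    """Estimate an intervention effect as a difference in group means.

    Baseline outcomes and the effect size are invented for illustration.
    """
    rng = random.Random(seed)
    baseline = {unit: rng.gauss(100, 10) for unit in range(n)}
    experimental, control = randomize(baseline, seed=seed + 1)
    treated = set(experimental)
    # Only the experimental group is exposed to the intervention.
    outcome = {u: b + (true_effect if u in treated else 0.0)
               for u, b in baseline.items()}
    return (mean(outcome[u] for u in experimental)
            - mean(outcome[u] for u in control))


if __name__ == "__main__":
    print(round(simulate_experiment(), 2))
```

Because chance alone decides who lands in which group, the two groups start out alike on average, and the difference in mean outcomes should come out close to the assumed effect.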
The third focus in the definition of social experiments, now widely taken for granted, is that the trial is conducted in the field. Gone is the cozy atmosphere of the laboratory for studying outcomes. Rather, social scientists conduct the study in the locations where the actual policy will be carried out. Hence we have randomized field trials. If the stress on randomization is accepted as the guiding principle, then any study of desired outcomes conducted through randomization is an SE. Such a definition sweeps in evaluations of many existing programs. Many evaluations of social programs are undertaken after the program has been enacted, and some evaluations (although not nearly as many as evaluators would like) randomize prospective participants into "experimental" and control groups. After a period of time, the evaluator compares the status of the two groups on the desired indicators (e.g. health status, income, school graduation). Sweeping such post hoc evaluations into the category of SEs expands that category substantially.
If we limit ourselves to randomized studies conducted in order to provide a basis for guiding future policy adoption, we have a more focused field of enquiry. It is the definition we adopt here. Of course, the distinction is not clear-cut. Some evaluations of existing programs are expected to guide future iterations of the program, that is, to lead to modification and improvement, or to serve as a possible model for federal policy (the states as "laboratories of democracy"). What is an evaluation at one level is an SE at another. Still, the distinction remains useful. It is important to consider the main purpose for which the SE is conducted as well as its research design.
3. History
With a little effort we could probably trace SEs back to Francis Bacon, but it is history enough to go back to Sidney and Beatrice Webb. In their 1932 book, Methods of Social Study, they argued for scientifically based social policy in words that have remarkable resonance for our own time. They advocated research conducted by social scientists trained in experimental methods who would carry out independent social investigations and transmit the results to those who make social policy. The actual methods, as Ann Oakley (1998a) has shown, were developed by educators and psychologists in the United States in the late nineteenth and early twentieth centuries. The philosopher Charles S. Peirce, the father of "pragmatism," introduced the idea of randomization into psychological experiments in the 1880s. Some of the early studies dealt with the transfer of memory skills from one subject to another (Oakley cites Thorndike and Woodworth 1901 and Winch 1908). Psychological researchers devised techniques for randomly assigning subjects to experimental treatments. R. A. Fisher, who did his research in agriculture and developed much of what has become standard in statistics, is widely known for championing the method of randomization.
- Design challenges
Researchers are beset by a series of challenges when they conduct research in the real world. Experiments pose difficulties all along the way. The first problem is the choice of sites. Although the policy option that the experiment is testing is usually intended to apply to all members of the relevant group within the nation (or the state), the experiment cannot be implemented among a randomly selected sample of the whole nation. The intervention can be offered (and studied) in only a few places. Even the most expensive SEs have had to limit the intervention to a few sites. How do researchers decide which sites are "ordinary" or "representative" enough to stand in for the whole country? Researchers avoid places with obviously atypical features, but much of the choice depends on which sites agree to cooperate.
- Simplistic thinking
The results of SEs can be misused in policy discussions (Orr 1998). As with any source of information, policy makers can choose to ignore results if they are not congruent with their beliefs and political agendas. During the congressional welfare reform debates, the welfare-to-work research was used to argue that education and training were effective strategies and that large amounts of federal funds were needed to produce effects. In fact, education and training received little attention in the programs studied, and the experiments showed that relatively low-cost job search and work experience were effective (Haskins 1991).
Policy makers may take note of the reactions of the general public. If the public is uninterested in or skeptical about particular results, policy makers have little incentive to push for change based on those results. Results may not even reach the ears of policy makers if the agency sponsoring the study does not itself like the results. What gets released for publication can be influenced by the satisfaction (or dissatisfaction) of the agency that requested and paid for the study in the first place. Less sinister is a simple lack of dissemination of experimental results. In the nursing home incentive study, the departure of the federal officials who had sponsored the study contributed to the lack of dissemination of the findings. Few people learned of the results, and little use was made of the findings (Greenberg et al. 2003). A reanalysis of the data that showed more positive results from the incentives (Norton 1992) went almost completely unnoticed.
Contributing to the risks of misunderstanding or misuse, policy makers may not have a particularly honed sense for the quality of research, or indeed the expertise to interpret results correctly when they are confronted with them (they are not alone; it is difficult for everyone). Policy makers tend to rely on indirect indicators of quality, such as the reputation of the researchers, how the research community reacts to the results, and whether the research fits their own preconceptions about what the results should be (Orr 1998).
- Ability of researchers to work in the policy world
Social experiments take place in a messy world. The kinds of social scientists who have the requisite knowledge of research design, sampling, measurement, and statistical analysis are not always the kinds of social scientists who communicate well with political actors. Experimenters in these situations have to listen. They have to know what policy options are feasible. They have to know the political history of the battles that have been waged over the territory. And they still have to know the social science literature and be skilled in designing and conducting research. Such people can be hard to find. In their stead come skilled researchers who may have few of these political skills, and often less interest in aligning the experiment with the political world.
- Increased scrutiny
The results of social experiments may fare somewhat better than other research because their findings are less assailable by opponents. This happens in part because the research community tends to support the results of randomized experiments, and so there may be a more unified front for policy makers trying to understand what researchers believe. Thus, for example, the health insurance experiment produced agreement among the research community that cost sharing can reduce the use of health care without harmful effects on health, a question that until then no study had adequately answered. However, even some of the best social experiments are open to methodological criticism, and indeed may sometimes be subjected to more rigorous critique than might be expected because of their high visibility in both the research and policy worlds. The school choice experiments are an example (e.g. Howell and Peterson 2004; Krueger and Zhu 2004). Because parental choice of schools is such a politically loaded issue, the studies were scrutinized in close detail.
Another problem is ensuring that the program is implemented as planned. If, say, the state welfare agency does not provide the job-search services that were supposed to be delivered, i.e. the intervention is not offered, the SE will be testing the effects of a phantom policy or of some unknown intervention that the agency itself devised. The results of the SE will be meaningless. From experience, researchers have learned the importance of monitoring the implementation of the intervention.
Perhaps the most basic design problem is implementing and maintaining randomization. Researchers often do not do the random assignment themselves. Operating agencies select the participants for programs and in the process are expected to assign participants to intervention and control groups according to protocols provided by the researchers. The actual assignment is "often carried out by social workers, nurses, physicians, or school district officials" (Cook and Shadish 1994, 550). Sometimes these people misunderstand what they are expected to do, and sometimes they are tempted to use their professional judgment in making assignment decisions. Researchers have learned that they not only have to train agency staff but also maintain an oversight presence to ensure that assignment is indeed random.
That is not the end of the problem. What begins as a true randomized assignment may become undone as time goes on. In some cases experiments do not enroll enough participants. Agency staff may raid the control group to fill slots in the program. People labeled "controls" may in truth receive the intervention. Or, and this is unavoidable, participants may drop out of the program and the study. That would be fine if they dropped out equally from intervention and control groups and for similar reasons. However, it is usually more common for controls to drop out. They are not receiving services and they have less reason to persevere. For example, in the income maintenance experiments, higher dropout rates were registered in the control groups and in the experimental groups receiving smaller benefits than in the groups with more generous benefits. The effect of differential dropout is to compromise the equivalence of the groups. Selection bias is reintroduced.
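The way differential dropout reintroduces selection bias can be sketched with a simulation. The setup is an assumption made purely for illustration: the intervention has no true effect, and controls with below-average outcomes leave the study with probability 0.5 while no one leaves the experimental group. Nothing here reproduces the actual dropout patterns of the income maintenance experiments, and the function name is hypothetical.

```python
import random
from statistics import mean


def attrition_bias(n=20_000, drop_prob=0.5, seed=0):
    """Return (estimate with everyone retained, estimate after dropout).

    The intervention has no true effect. Controls whose outcome is below
    zero drop out with probability drop_prob; nobody leaves the
    experimental group. All numbers are hypothetical.
    """
    rng = random.Random(seed)
    outcomes = [rng.gauss(0, 1) for _ in range(n)]
    # The draws are independent, so splitting by position is a random split.
    experimental = outcomes[: n // 2]
    control = outcomes[n // 2:]
    full = mean(experimental) - mean(control)
    # Keep every control at or above zero; keep the rest with prob 1 - drop_prob.
    retained = [y for y in control if y >= 0 or rng.random() >= drop_prob]
    after_dropout = mean(experimental) - mean(retained)
    return full, after_dropout


if __name__ == "__main__":
    full, after_dropout = attrition_bias()
    print(round(full, 3), round(after_dropout, 3))
```

With everyone retained the estimate sits near zero, as it should. After selective dropout the remaining controls look better off than the experimentals, so an intervention that does nothing appears to do harm.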
In other cases, the control group may become contaminated by being inadvertently exposed to the intervention under study. Teachers receiving an experimental professional development course may share some of their new learning with fellow teachers in the school, regardless of their official "control" status.
The list of complications goes on and on. As researchers have become more sophisticated over time and with experience, they have identified a host of further threats to the validity of SEs. Manski and Garfinkel (1992) suggest that some interventions can cause changes in norms and attitudes in the community, and the changed community attitudes will affect the success of the intervention. Heckman (1992) and Heckman and Smith (1995) have written that the people who enroll in SEs may not be representative of the people who would participate in full-scale programs. Moffitt (1992, 2004) has also worried about "entry effects," conditions of a full-scale program that would affect the behavior of participants but that do not appear in small-scale experiments.
- Feasibility of random assignment for organization/community policy interventions
Some innovative ideas involve interventions in neighborhoods or systems or states. Rather than delivering services to individuals one at a time, these policy proposals are designed to change the practices and culture of larger entities. Examples include: changing the attitudes of welfare offices so that the staff's top priority is placing clients in jobs, changing practices in a neighborhood so that families, restaurants, and law enforcement actively work to prevent young drivers from drinking alcohol, and changing the culture of a school system so that teachers and administrators actively welcome parents to participate in their children's education. To test ideas like these in an SE requires studying not so much individuals as the units that are being changed: welfare offices, neighborhoods, or school systems. The interest is in the behavior of the collectivity.
The obvious solution is to randomize units. A number of schools or neighborhoods or systems can be randomly assigned to the intervention or to a control group. However, as the size of the unit increases (say, to counties or states), fewer units can be studied. It is very difficult and expensive to study a large number of neighborhoods or states, and few studies have tried to go beyond ten or twelve. Yet with only a limited number of cases, the laws of probability do not necessarily work. Any observed difference between intervention and control groups may occur by chance. There are too few cases to even out the lumps of chance. Therefore, randomization of large units is a partial solution at best. Here is a problem on which the needed research and innovation are currently being developed.
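Why a dozen randomized units cannot "even out the lumps of chance" can be shown with a short simulation. This is a hedged sketch under simple assumptions: unit outcomes are drawn from a standard normal distribution, the intervention has no effect at all, and the function name and the unit counts of 12 and 1,200 are chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def chance_spread(n_units, trials=500, seed=0):
    """Spread of intervention-minus-control differences when the
    intervention does nothing, for a given number of randomized units."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        outcomes = [rng.gauss(0, 1) for _ in range(n_units)]
        rng.shuffle(outcomes)  # random assignment of units to the two groups
        half = n_units // 2
        diffs.append(mean(outcomes[:half]) - mean(outcomes[half:]))
    return pstdev(diffs)


if __name__ == "__main__":
    print(round(chance_spread(12), 3), round(chance_spread(1200), 3))
```

Under these assumptions the spread of purely chance differences shrinks with the square root of the number of units, so with twelve units it is roughly ten times as wide as with 1,200. A sizeable gap between intervention and control neighborhoods can therefore arise with no real effect behind it.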
Another reason for objecting to random assignment is that no cities or neighborhoods are interchangeable, nor are health systems or schools. Each has a history. Each has a set of established traditions. Each has a culture that has evolved over generations. Each has attracted particular kinds of civic organizations, program staff, and residents. Harlem is not the South Side of Chicago, which is not Watts. PS 241 in Brooklyn is not the same as the Condon School in Boston (Towne and Hilton 2004). Even if a researcher were to randomly assign neighborhoods, they would not be fully comparable, and differences observed at the end might be due not so much to the intervention as to the whole complex of prior history and culture. For example, an evaluation of a program to promote nutritious food products randomly assigned supermarkets in Washington and Baltimore. The intervention-group markets placed nutritious products in advantageous shelf locations and distributed flyers about nutrition. The control group did not. The measure of success was customers' purchases of nutritious foods. Results showed that there were more differences between the two cities than between the experimental and control groups.
Cost can limit the value that social experiments can provide for policy making. In general there is a direct relationship between the complexity of the research design and cost. The more policy alternatives, settings, or types of participants tested, the more expensive the experiment is likely to be. Thus, cost plays a direct role in limiting the relevance of findings from social experiments to particular policy questions. Over time, social experiments appear to be becoming simpler and consequently less costly. Greenberg et al. (1999) suggest that this is because of increased use of administrative databases rather than special surveys, an increase in the likelihood that the organizations that will run the program are the ones involved in the social experiment (as opposed to developing new programs run by research organizations), simpler designs with fewer groups, and shorter tracking periods for participants.
- Limits on how much can be tested
It is a rare experiment that can test all the variations in a particular policy that may be relevant to the question under study. Thus, the findings from social experiments are limited to the particular alternatives tested. SEs take place in a limited number of sites with a particular set of participants, and findings may not generalize to other settings or participants. The time horizon is often truncated (although not in the health insurance experiment). Only a few social experiments can assess trade-offs among components of the intervention. Almost none are large enough to examine differences among subgroups of the client population (the income maintenance experiments were an exception). Few examine the behavior of staff implementing the program, and they have little to say about the practices associated with better or worse outcomes. The costs of the intervention are not always carefully calculated (for example, in the nursing home reimbursement experiment, officials could not separate the costs of the program from those of the study (Greenberg and Shroder 1997)).
One can draw a distinction between "black box" experiments, which test one or a few treatments, and "response surface" experiments, which test a range of treatments (Greenberg et al. 2003; Burtless 1995). Examples of the latter are the income maintenance experiments of the 1960s and 1970s, in which income guarantees and tax rates were varied across treatment groups, and the health insurance experiment, in which cost sharing was varied across groups. Greenberg et al. (2003) conclude that if the particular intervention being tested is still on the policy agenda when the experiment concludes, a black box experiment will do well. However, that almost never happens. The advantage of "response surface" experiments is that the design allows for estimation of elasticities across a range of treatment options, and the results can be used in later simulation models well into the future.
- Timing
The worlds of policy and research do not work in lockstep. Social experiments are time-consuming, often taking years to design, implement, and finally analyze and report results. The policy process meanwhile has moved ahead, and the results of the SE arrive in a new, changed policy environment. The research results may have little or no bearing on this changed policy world. For example, the health insurance experiment began at a time when a national health care system was under active consideration, and the relevance of the effects of cost sharing was evident. By the time the results of the experiment were known, the health care debate had petered out and national health care was no longer a looming possibility. The relevance of the results was greatly diminished (Greenberg et al. 2003).
In the past it has often taken four or five years (or more) before experimental results were ready. The housing allowance experiment ran even longer. It studied the effects of giving housing allowances to low-income residents not only on the families involved but also on the supply of housing. It had to go on long enough for landlords to increase the number of housing units available to recipients of the allowance. The study ran (in two cities) for eleven years (Bradbury and Downs 1981).
On the other hand, some experiments are too short to produce convincing results. The nursing home incentive study ran for thirty months. Many nursing homes proved unwilling to change their practices in response to short-term monetary incentives. One report of the sponsoring agency stated:
To the participants [nursing homes] ... it may seem a very short period of time, and there may be a reluctance to make staffing, policy, and organizational changes that may affect their environment long after the experiment has concluded. (Greenberg et al. 2003, 107)
Yet even with its short time span, the study could not catch the policy moment. By the time it was completed, political interest had moved away from incentives and toward regulation.
Timing is not a particularly strong point of social science. Trying to figure out which policy issues will be alive at some future point is an exercise for a psychic. Knowing how rapidly the political canvas changes, knowing how volatile the face of government is these days with the country divided almost equally between Republicans and Democrats, knowing how policy windows open and close as the economy changes, can we ever feel confident that we are forecasting the appropriate mix of interventions? Many people worry about problems of causality in experiments. We worry about the cloudy crystal ball. Fortunately or not, in recent years SEs have become simpler. As noted in a later paragraph, they make do with existing data, and they take less time to complete. But they are testing more modest initiatives.
- Small effects
Social experiments almost never produce slam-dunk findings. If the proposed intervention were clearly superior, there would be little reason to experiment. Most policy proposals are uncertain. The results of experiments are often marginal. There are small gains under particular circumstances with some subpopulations. Interpretation becomes critical.
Because experimentation is such a difficult craft, the results are not always authoritative. Decisions about the course of the experiment have to be made along the way. Compromises are made, sometimes in response to crises in the environment, sometimes to fit within the budget, sometimes to suit the skills of available staff, sometimes to meet deadlines, sometimes in an effort to answer new questions that arise during the study. Other researchers will critique the findings. They may reanalyze the data. They will come up with new models that they claim better account for the patterns in the data. The experiment can get caught up among research experts and become fodder for their struggles for dominance.
- Ethics
Ethical issues have dogged experimentation from the start. People have shown considerable concern about withholding a social good from one group regardless of level of need. Practitioners are often reluctant to allow services to be given out on the basis of chance, without the exercise of their own professional judgment. Potential beneficiaries of services object strongly to being placed in a no-service control group. A host of ethical issues (withholding services from those who are eligible, full disclosure of experimental procedures, the right to refuse, harm to participants) may significantly limit the questions that social experiments can address.
The rejoinder is that no one knows whether a social service is "good" until it has been studied. Many experiments find that the intervention is no better than standard services, or is even harmful. Thus, the nursing home reimbursement experiment did not show a positive impact of the reimbursement scheme. Bickman's study of intensive mental health services, which included all the professionally fashionable bells and whistles, showed that intensive services had no better outcomes than regular services (Bickman 1996).
- Complexity of interventions
Perhaps the most telling argument against experiments is that they assume that interventions are simple things that can be captured in a treatment/no-treatment design. Many interventions are highly complex social interactions, and simple cause-and-effect patterns may not be easily detected. "Programs" are often implemented in different ways by different staff, and the desired outcomes are social processes that cannot be readily measured with simple metrics. Studying the effects of psychotherapy, for example, poses all kinds of problems because of the essentially personal way in which the therapist works and the client responds. No matter what label one affixes to the "brand" of psychotherapy, or how hard one tries to train therapists to use the same procedures, critics contend that quantitative randomized studies cannot yield sensible results.
Likewise, educators often say that interactions in the classroom, such as the introduction of a new teaching method, cannot be properly studied by quantitative randomized techniques. The assumption that all teachers trained in the new method will implement it consistently, and that children in all classes will react in similar ways, represents a fundamental misunderstanding of the variability of learning. The rejoinder is that despite variation, which certainly introduces more measurement error, large samples should show the extent to which mean scores (social functioning, math achievement, attendance) differ across the populations exposed and unexposed to the intervention. In Cook's (2001) words: "It is not an argument against random assignment to claim that some schools are chaotic, that implementation of a reform is usually highly variable, and that treatments are not completely faithful to their underlying theory." There is enough consistency in human behavior, experimentalists claim, to permit an experiment to reach valuable conclusions about whether it is worthwhile to adopt the innovation.
7. Conclusion
We began this chapter with a description of three distinguishing features of SEs: research in the field, conducted through random assignment of a sample of prospective beneficiaries to intervention and control conditions, to test the likely success of a policy intervention. The first two characteristics are increasingly accepted as realistic and necessary. Research in the field has now become mainstream practice. Randomized studies have received considerable support not only from the research community (although some researchers, particularly in education, have remained staunch dissenters) but also in Congress. One example is education studies designed with randomization. It is the third feature that may be on the wane: SEs as prospective tests of policy alternatives.
SEs became prominent in the late 1960s, at a time of turbulent policy change. They were part of the climate of innovation and radical reform that swept the country. In the late 1980s and 1990s, as interest in fundamental change waned, the state of experimentation changed too. Experiments continued to be done, more of them in fact, but fewer resources were devoted to them. The emphasis shifted from major innovation to marginal improvements in existing programs. In Burtless's words, they became "narrower" (1995, 63). Now, at a time of budget deficits and fiscal stringency in the United States and elsewhere, the likelihood of new domestic initiatives seems low. This is not a time when big new ideas will be tested, at least not with government funds. The prevailing trend is to test minor modifications, preferably cost-saving changes, and to shift activities to the private sector. If you are considering an investment in large-scale SEs, our advice would be: hold. The product is a sound one, with high potential, but the time is not now, at least not in the United States. But hang on. Some version of SEs will have its day.
We also began our account with an outline of three themes: the complexity of the policy world, the technical complexity of the research world, and the alignment or misalignment between experimental findings and policy questions. Overall, SEs have demonstrated both the possibilities and the limits of influencing policy through social science research. They have contributed a great deal of new knowledge. Some of their findings have seeped into the policy arena and become part of policy talk (Anderson 2003; Weiss 1999). Influentials in Congress, federal agencies, international organizations, interest groups, and the media become acquainted with experimental findings in order to take part in informed policy conversation.
On the other hand, there is no example of an SE that led directly to policy change. The results of the health insurance experiment came so late, and bore so little on the legislative proposals actually alive at the time, that they were all but ignored, except by economists, who have used them to model new proposals. The results of the nursing home reimbursement experiment also came too late, after the zing had gone out of the incentive idea. Almost no one was still interested in incentives for nursing homes; the action was in the area of regulation. Although widely published, the income maintenance experiments led to little change in concrete policy. The welfare-to-work experiments do appear to have had policy consequences. The MDRC studies gave support to mandatory and work-first requirements and demonstrated the capacity of states to design and manage their own welfare programs. All three aspects of program design eventually ended up in the Family Support Act of 1988. But as we have seen, the experiments only reinforced what policy makers were already planning to do on other grounds.
Because policy making is such a complicated business, with so many players pursuing divergent interests, it is overly optimistic to expect research information to carry the day. Even the high-quality information provided by SEs cannot override all the other forces on the scene. And as we have seen, the timing of SEs is often off. The policy agenda moves on while the SE is still studying last year's proposal.
Nevertheless, totting up the advantages and disadvantages, we come out in favor of further experimentation. The world needs greater understanding of the consequences of government action. Social experimentation cannot fully satisfy the need for knowledge about policy outcomes, partly because of the intrinsic nature of social science research and partly because of the restrictions imposed by the conditions under which it is done. It still opens a path. Whatever advances rationality in the sorely troubled world of policy is worth supporting. Not revered or kowtowed to, but encouraged.
But we also need to moderate our expectations of the contribution that SEs can make. The notion of basing policy strictly on experimental evidence is misguided. An SE does not tell everything that a political unit needs to know about a pending policy choice. Many other considerations have to enter into government action, such as popular demand, cost, the capacity available to implement the policy, competing needs, effects on neighboring policies, and so on. Resolution comes through politics. Although the word has fallen on evil times, politics is the system we have for resolving differences in our complex society and for reaching decisions that are at least minimally acceptable to all parties (for an outstanding defense of politics, see Crick 1972).
Evidence from outside the political arena cannot and should not supplant the play of politics as the basis for policy. Of course, we do not want to see policy developed on the basis of mistaken understandings of the situation or unrealistic expectations about the effects of action, but it seems arrogant to think that experimental data alone can point to the best resolution of complex policy problems. History matters, as do political culture and institutional practice. What SEs can do is illuminate the understanding of the public and of elites and infuse policy discussion with insight. Science and politics both inhabit the policy environment, but their alliance is an uneasy one. Social scientists, putting the best face on the relationship, have pointed to the "value-added" features that social science brings to the table: an inventory of knowledge for the future to draw on, general enlightenment of elites and the public in the present, the puncturing of assumptions, and confirmation of sensible instincts for action. But for all the understanding and insight contributed by the social sciences, and by SEs in particular, they do not run the show. There are unavoidable tensions between science and politics, and convergence is usually a happy accident.