
Journal of Economic Literature Vol. XXXV (December 1997), pp. 1809-1855

Evaluating Government Training Programs for the Economically Disadvantaged

DANIEL FRIEDLANDER, MDRC

DAVID H. GREENBERG, University of Maryland, Baltimore County

and

PHILIP K. ROBINS, University of Miami

This paper has benefitted greatly from helpful comments on an earlier draft by Stephen Bell, Howard Bloom, David Card, Judith Gueron, Robert LaLonde, Robert Moffitt, Ernst Stromsdorfer, and three anonymous referees. For support in the preparation and production of this paper, the authors gratefully acknowledge the funders of MDRC's Public Policy Outreach project: the Ford Foundation, the Ambrose Monell Foundation, the Alcoa Foundation, and the James Irvine Foundation. The findings and conclusions presented in this article do not necessarily represent the official positions or policies of the funders.

I. Introduction

IN AN EFFORT to increase the earnings of low-income individuals - that is, poor or near poor persons - who have ended their formal education, federal and state governments fund a number of training programs. Since the 1960s, these programs have been seen as instruments for combating poverty. Interest in them has heightened recently, fed by concerns about rising income inequality and falling real earnings among workers with limited skills. Such programs are also viewed as integral to recent efforts to reform a welfare system that is widely perceived as discouraging work.1

Evaluations of the effectiveness of government training programs for the economically disadvantaged have been accumulating for more than three decades. Additionally, in the past decade, a rapidly expanding literature has focused on the methodology of training program evaluation. The time is ripe for collecting

1 It is also sometimes argued that by better matching workers to job vacancies, government training programs can improve macroeconomic tradeoffs between inflation and unemployment (Malcom Cohen 1969; Daniel Hamermesh 1972; and Lawrence Katz 1994).


and interpreting the empirical findings from this literature and for assessing the status of evaluation methodology.

The broadest generalization about the current knowledge of government training programs for the disadvantaged is that they have produced modest positive effects on employment and earnings for adult men and women that are roughly commensurate with the modest amounts of resources expended on them. The positive effects for adults are not large enough to produce major aggregate effects on employment and earnings among low-income target groups, and the programs have not made substantial inroads in reducing poverty, income inequality, or welfare use. Moreover, they have failed to produce positive effects for youth.

In this article, we investigate the methodological foundations and empirical support for this view, suggest possible modifications of it, and identify potentially fruitful areas for future research by economists. We argue that, despite a large number of evaluations of government training programs and the development of a variety of sophisticated evaluation methods, considerable uncertainty remains about the kinds of training that work best, the effectiveness of training for certain demographic groups, and the appropriate policies for increasing aggregate program effects.

The remainder of this article is organized as follows. Section II describes the activities sponsored by government training programs and the rationale behind government funding for them. A brief sketch of government training programs for the economically disadvantaged in the United States is presented in Section III, followed by a discussion of the theory of training program evaluation in Section IV. The next three sections constitute the core of our essay: Sections V and VI cover methodology and findings for voluntary and mandatory training programs, respectively, and Section VII examines the broader implications of the findings for society. We conclude in Section VIII with our assessment of what we do and do not know from previous training program evaluations, and offer an agenda for future research.

II. Training Program Activities and Economic Rationale

Although the mix of activities in government training programs changes over time and differs from one program to the next, all training programs include one or more of the following: remedial education in reading and math, vocational training in specific occupational skills, subsidies paid to private sector employers to hire program participants for a specified period of time in order to provide them with on-the-job training, short-term subsidized "work experience" positions (paid or unpaid) at government or nonprofit agencies to give participants an opportunity to build an employment record and acquire general work skills, and job search assistance (including training in resume preparation and interviewing, help in job finding, and direct job placement). In addition, financial support, child care, personal and career counseling, and expense reimbursements during training are sometimes provided. (Burt Barnow, 1989, provides a more detailed discussion of the different kinds of training and support services.)

Space constraints prohibit us from covering all the programs that provide the services listed above. As already indicated, we focus more narrowly on those training programs that are primarily targeted at economically disadvantaged persons who are no longer in school. We do not examine evaluation


findings for programs targeted mainly at the elderly, persons with disabilities, persons still in school, and dislocated workers. Notwithstanding the exclusion of these groups, the methodological discussion in this article applies to evaluations of most government training programs, regardless of their target population. We do not cover job creation programs in this article because their main objective is not to increase unsubsidized employment.2 We also exclude policies that affect earnings or hours of work through nontraining mechanisms (e.g., wage subsidies and tax credits).

Why is it desirable for the government to provide training for the economically disadvantaged? The answer is not obvious because many training opportunities are available elsewhere. For example, private sector employers are a major source of on-the-job training, even in the absence of government subsidies. In addition, classroom training of several kinds - Adult Basic Education, General Educational Development (GED) preparation, and vocational education - is available through community colleges and adult education schools, and low-income persons can obtain some financing for certain activities through federal Pell grants without assistance from training programs. In fact, training programs often provide classroom instruction by sending program enrollees to the same schools and the same courses used by persons not enrolled in training programs. Similarly, job search assistance is available outside training programs for those who seek it.

Given the wide availability of alternative sources of services similar to those provided by training programs, the major economic rationale for funding training programs revolves around assertions of market or institutional failure. Low-income people may not have the capital resources to invest in certain kinds of training, such as classroom vocational training, that are not usually subsidized outside government training programs. Their access to private financing may be limited by a lack of collateral and a high risk of default. Public training programs may also be justified as compensating for inadequacies in the public education system or as providing a second chance to those who prematurely terminate formal schooling because of imperfect foresight or a high subjective rate of time preference. The economic rationale for these programs may also hinge partly on the existence of imperfect information about available training opportunities and their likely returns. That is, government programs may serve as a training "broker," guiding program enrollees into activities that yield the highest payoff for them. Given the presence of market or institutional failure, it is possible for the social rate of return to government investment in training for low-income individuals to be quite high. Even if the social rate of return is below the market rate of return, however, using public funds to support training programs for the economically disadvantaged may still be more efficient than using such funds to provide direct transfers to the poor or to operate alternative programs intended to decrease poverty. Nonetheless, as discussed below, program

2 Job creation programs are substitutes for regular employment. Examples include the Works Projects Administration during the 1930s and the Public Service Employment component of the Comprehensive Employment and Training Act during the 1970s. The work experience activities used in training programs differ from job creation because they are of limited duration, often do not pay wages, are for the stated purpose of allowing the participant to build an employment record and general work skills that will be of value in finding regular, unsubsidized employment, and are not viewed by training program administrators as a substitute for regular employment.


operators cannot easily monitor the effectiveness of their efforts, and it is therefore possible for the returns to public funds expended on government training programs to be negligible.

Two additional goals of government training programs for low-income individuals are to reduce government welfare expenditures and to increase the time that welfare recipients spend working. To achieve these objectives, increasingly stringent requirements to participate in training programs have been imposed on welfare recipients in recent years. Still, many "welfare-to-work" program administrators see participation requirements as a way of securing participation in programs aimed primarily at increasing income and reducing poverty rather than reducing welfare payments and increasing work. Regardless of the ultimate objective, the rationale for participation requirements is obvious - namely, to secure cooperation by welfare recipients who may not see training or employment as being in their immediate best economic interests.

III. Training Programs for the Economically Disadvantaged

Government training programs may be broadly classified into two basic categories: voluntary and mandatory. Voluntary programs provide training for individuals who apply for them and meet certain criteria of need, such as having income below a certain level or lacking a high school diploma. The first major post-World War II national voluntary training program for the disadvantaged in the United States was funded in 1962 under the Manpower Development and Training Act (MDTA). Although initially enacted to retrain technologically dislocated workers, MDTA soon shifted resources toward serving economically disadvantaged persons, reflecting new priorities established by the 1964 Economic Opportunity Act. Also in 1964, the Job Corps was created. The Job Corps, which still operates today, provides training for disadvantaged youth at 110 urban and rural residential centers throughout the United States. Since its inception, the Job Corps has served more than 1.7 million youth.

In 1973, MDTA was replaced by new legislation, the Comprehensive Employment and Training Act (CETA). CETA differed from MDTA in two important respects. First, states and local governments were given authority to operate training programs using grants from the federal government. Second, CETA had a job creation component, "public service employment" (PSE), that grew quite large during the Carter administration.

As a result of charges that PSE was corrupt and mismanaged, along with a desire to have the private sector play a bigger role in the operation of training programs, CETA was replaced during the early years of the Reagan administration by the Job Training Partnership Act (JTPA), passed in 1982. JTPA eliminated the PSE component of CETA, but enhanced its decentralized administrative structure. JTPA, which currently serves close to one million economically disadvantaged persons annually, remains the principal voluntary national training program today for the disadvantaged. Like MDTA and CETA, JTPA also provides separate funding for training persons who are not classified as economically disadvantaged.

Mandatory training programs are directed at public assistance recipients. In this article, we examine mandatory programs directed at welfare recipients, including recipients of the former Aid to Families with Dependent Children


(AFDC) and Food Stamps.3 A program's mandatory nature stems from its statutory authority to penalize or "sanction" recipients who do not cooperate by reducing (or in some cases terminating) their welfare payments. Mandatory training programs for welfare recipients were first established in 1967 under the Work Incentive (WIN) Program. Under WIN, participation could be required of heads (mostly female) of single-parent AFDC families without preschool-age children and by the much smaller number of heads (mostly male) of two-parent AFDC-U families. In practice, WIN was never given enough funding to establish an effective mandate for more than a small minority of the targeted population. The 1981 Omnibus Budget Reconciliation Act (OBRA) allowed states additional options and flexibility in designing mandatory training and work programs for welfare recipients. These programs became known as "welfare-to-work programs." OBRA and subsequent legislation, together with a growing political momentum toward welfare reform, stimulated a number of states to experiment with the design of welfare-to-work programs and to strengthen the requirements to participate in them. In 1988, the Family Support Act (FSA) was passed, replacing the WIN program with the Job Opportunities and Basic Skills Training (JOBS) Program. JOBS expanded the mandatory population to include single-parent AFDC recipients with preschool-age children (down to age three or, at state option, to age one), established minimum-participation-rate targets for states, increased the grant reduction penalties for nonparticipation, and, for the first time, committed

federal funds to education in welfare-to-work programs. A modestly funded Food Stamp Employment and Training Program was authorized in 1985 and became fully operational in 1987.

The distinction between voluntary and mandatory training programs is not considered meaningful by all policy analysts. Most of the program activities in the two kinds of programs are similar and the institutions providing the training can be the same. Enforcement among mandatory programs is often downplayed by local program administrators, making participation seem voluntary. Additionally, the target populations partially overlap: for example, a significant proportion of JTPA participants are welfare recipients, and some of them are there in fulfillment of a JOBS program participation obligation. Nonetheless, despite the similarities, we believe the differences are sufficient to warrant separate treatment in this article. For one, the evidence suggests that pressure to participate or to work has been increasing in recent years for enrollees in mandatory programs in many states. In addition, only mandatory programs purport to have a direct effect on some nonparticipants, whether through financial sanctions or simply by prompting some people to find a job in order to avoid what may be perceived as an onerous participation requirement. As discussed later, the possibility of direct program effects on nonparticipants under mandatory programs imposes restrictions on the way results from studies of these programs can be interpreted.

According to the United States General Accounting Office (1995), in fiscal year 1995 just over $3.8 billion was appropriated for services to the economically disadvantaged under JTPA Title IIA (disadvantaged adults, $947 million), JTPA Title IIC (disadvantaged

3 AFDC was replaced in August 1996 by legislation that created welfare block grants to states (Temporary Assistance to Needy Families, or TANF).


youth, $260 million), JOBS ($1.3 billion), Job Corps ($1.1 billion), and the Food Stamp Employment and Training Program ($165 million). An additional $10.4 billion was appropriated for services under a variety of other "training" programs, although only a portion of this amount was for the disadvantaged and for services traditionally defined as out-of-school training. In total, the $14 billion of training expenditures (broadly defined) constituted less than 0.2 percent of Gross Domestic Product. The states also spend modest amounts on training programs and, like the federal government, state programs serve other groups in addition to the disadvantaged. Thus, as a fraction of national income, funds expended on training the disadvantaged are very small.
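
As a rough arithmetic check on these figures (the 1995 nominal GDP value of roughly $7.6 trillion is our assumption; it is not given in the text):

$$\$3.8\ \text{billion} + \$10.4\ \text{billion} \approx \$14\ \text{billion}, \qquad \frac{\$14\ \text{billion}}{\$7{,}600\ \text{billion}} \approx 0.0018 \approx 0.18\% < 0.2\%\ \text{of GDP}.$$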

IV. Theory of Training Program Evaluation

A complete theory of program evaluation would specify the information required and the appropriate decision rule to apply in using that information to advise policy makers on the desirability of increasing or decreasing the scale of a particular training program. Economists have done surprisingly little work toward developing a complete theory of training program evaluation.4 Moreover, the information provided by current training program evaluations is quite limited. Nearly all training program evaluations are "black box," indicating only whether a particular program "works," on average, for a particular sample under a particular set of circumstances (including labor market conditions and service delivery systems). Such information, although useful, may not be readily generalizable to other programs, circumstances, or populations. Indeed, a major recent criticism of the evaluations of the past 30 years is that they have failed to contribute to the accumulation of knowledge because, it is alleged, they have not systematically gathered empirical information under the guidance of a broad theoretical framework (James Heckman 1992; Charles Manski 1995; Heckman and Jeffrey Smith 1995). Ideally, if the parameters of an underlying structural model could be estimated, policy makers would be able to identify the most effective policy option under differing circumstances and would be able to predict its outcome (see Manski and Irwin Garfinkel 1992a; Heckman 1992; and Greenberg, Robert Meyer, and Michael Wiseman 1993, 1994). Such estimation, however, has never been accomplished.5

In this article, we distinguish the effects of training programs on three successively inclusive groups: program participants (those who actually receive the training services), the broader target population eligible for the program (participants and nonparticipants), and

4 The principal contribution has been to apply a Bayesian decision-theoretic framework, with a loss function derived by positing a prior distribution of belief about the effectiveness of a training program (Frank Stafford 1979; Gary Burtless and Larry Orr 1986). One formal model implies that evaluation information is most valuable when (a) the aggregate program benefits are expected to be large, (b) aggregate program benefits minus aggregate costs are close to zero, and (c) prior opinion about the magnitude of the benefit minus cost difference is uncertain. Seemingly minor changes in assumptions can, however, yield diametrically opposite conclusions about what kind of evaluation research is most valuable. Moreover, potentially important extensions of this approach, such as incorporating sequential evaluation and decision making and accounting for the value of long-run increases in basic knowledge (John Conlisk 1979), have not been well explored in the training program literature.

5 One reason for this is that a very large number of evaluation sites would be required to provide sufficient variation in program designs, client characteristics, and local environmental conditions to estimate the parameters of a structural model with precision (Greenberg, Meyer, and Wiseman 1993).


society (participants, nonparticipants, and ineligible individuals). Our discussion of voluntary programs focuses on effects on participants. In the voluntary program evaluation literature, analysis centers on participants because nonparticipants are, presumably, unaffected by the program. We bring in effects on program-eligible nonparticipants in our review of mandatory programs. In mandatory programs, nonparticipants may experience program effects resulting from penalties for nonparticipation or from changes in behavior to avoid participating in the training program (a "deterrent" effect). Finally, at the societal level, we discuss training program effects on persons ineligible for the program. The behavior of some ineligibles is affected if they attempt to become eligible for the training program (for example, by lowering their income). This is termed an "entry" effect (Moffitt 1992, 1996). Others could lose jobs in competition with training program graduates. This is termed a "displacement" effect (Hamermesh 1972; George Johnson 1979). Because the literature on social effects is scant and mostly theoretical, we only briefly summarize it.

V. Evaluating Voluntary Training Programs

A. Estimating the Effects of Participating

The behavior of participants in a voluntary training program can be depicted by the following model:

$$Y_{it} = c_t X_i + b_t P_{i0} + u_{it}, \qquad t > 0, \qquad (1a)$$

$$P_{i0} = a_0 Z_i + e_{i0}. \qquad (1b)$$

In this model, $Y_{it}$ is the outcome of interest (say, earnings) for the $i$th person in period $t$, where $t = 0$ is the period in which the training occurs; $X_i$ and $Z_i$ are sets of (perhaps overlapping) exogenous factors and personal characteristics for individual $i$ (usually measured before the program begins);6 $P_{i0}$ is a binary variable, with zero indicating no participation in training program activities and unity indicating participation;7 and $u_{it}$ and $e_{i0}$ are random error terms. In this formulation, the mean effect of training program participation in period $t$ is $b_t$, which may vary over time.8 Equation (1b) is sometimes called an index function (Heckman and Richard Robb 1985) or a propensity score (Heckman and V. Joseph Hotz 1989),9 to denote that the decision to participate in training program activities may be made by a program administrator, a prospective trainee, or both.

The fundamental strategy for estimating $b_t$ is to compare a sample of persons who receive services from a training program that is being evaluated with a sample of persons who do not. This comparison between participants and nonparticipants is made in two basic ways: a nonexperimental approach and an experimental approach. Under certain conditions, the nonexperimental approach is adequate to yield an unbiased estimate of $b_t$. When such conditions do not hold, the experimental approach is an alternative.

6 In principle, $X_i$ and $Z_i$ can be measured during or after the program, but if this is done, they may be affected by participation and, hence, be endogenous. It is assumed that $X_i$ and $Z_i$ are uncorrelated with $u_{it}$ and $e_{i0}$, respectively.

7 Ideally, it is desirable to differentiate among several kinds of training activities, by defining $P_{i0}$ as a vector, and to account for the level and intensity of participation by defining $P_{i0}$ as a continuous variable. For one recent attempt to do the latter, see Louis Jacobson et al. (1994).

8 In practice, $b_t$ is also often allowed to vary over certain kinds of individuals (subgroup analysis). In addition, evaluators have recently become interested in examining effects on the entire distribution (as opposed to the mean) of the outcomes. See, for example, Anders Bjorklund and Moffitt (1987), Nancy Clements, Heckman, and Smith (1994), Manski (1995), and Friedlander and Robins (1997).

9 The term "propensity score" comes from the statistics literature (Donald Rubin 1973; Paul Rosenbaum and Rubin 1983).


1. Nonexperimental Evaluations. Nonexperimental evaluations usually involve the selection of a comparison group by the evaluator. The comparison group is intended to provide a counterfactual for the program participant group. Comparison groups for estimating training program effects for participants have been variously drawn from among applicants who dropped out or were turned away without receiving program services, target group members who did not apply, individuals outside the geographic area covered by the program, and nonparticipants drawn from national microdata sets. Training program participants have also served as their own comparison group in periods prior to participating; that is, their pre- and post-program behavior is compared.10

In a nonexperimental evaluation, if $E(P_{i0} u_{it}) = 0$, an unbiased estimate of $b_t$ can be obtained by regressing $Y_{it}$ on $X_i$ and $P_{i0}$. There is no guarantee, however, that this condition will hold in practice. In fact, if $Y_{it}$ is earnings, it is quite possible that $P_{i0}$ and $u_{it}$ are positively correlated. This would occur, for example, if more motivated individuals chose to participate in the training program and motivation was not captured in the $X$ variables.

Correlation between $P_{i0}$ and $u_{it}$ can arise in two ways, through $Z_i$ or through $e_{i0}$. If $E(Z_i u_{it}) \neq 0$, but $E(u_{it} e_{i0}) = 0$, there is selection on observables (Heckman and Robb 1985; Heckman and Hotz 1989). What this generally means in practice is that program administrators are selecting applicants for a program on the basis of a set of known characteristics. For example, persons might be admitted into a program if they have dropped out of high school, or if they are unemployed, or if they satisfy a ranking based on a set of observable characteristics. In the case of a simple linear specification of equation set (1), Barnow, Glen Cain, and Arthur Goldberger (1980) show that when there is selection only on observables, consistent effects of the program can be obtained by including the selection variables, $Z_i$, as regressors. Linearity may not be a very good assumption, however, and, as described below, a number of fairly sophisticated methods have been proposed to deal with the general problem of selection on observables.
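
A minimal sketch of the Barnow, Cain, and Goldberger point, under the assumption of a linear model with selection on a single observed variable; the data are simulated and all names and values are hypothetical.

```python
import numpy as np

# Sketch of selection on observables: participation depends on an observed
# variable z that also affects the outcome. Values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 20000
z = rng.normal(size=n)                                    # observed selection variable Z_i
p0 = (z + rng.normal(size=n) > 0).astype(float)           # administrators admit high-z applicants
y = 0.5 * z + 2.0 * p0 + rng.normal(size=n)               # true program effect b_t = 2

def ols(design, outcome):
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

# Omitting z: P_i0 proxies for z, so the estimated effect is overstated.
print("b_t without z:", round(ols(np.column_stack([np.ones(n), p0]), y)[1], 3))
# Including the selection variable as a regressor restores consistency.
print("b_t with z:   ", round(ols(np.column_stack([np.ones(n), z, p0]), y)[2], 3))
```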

If $E(Z_i u_{it}) = 0$, but $E(u_{it} e_{i0}) \neq 0$, there is selection on unobservables. This is a much more serious problem than selection on observables because solutions require strong (largely untestable) behavioral assumptions, complex nonlinear estimation models, or difficult-to-obtain data. Selection on unobservables can occur when individuals are prompted to participate in program activities by some underlying factor, such as motivation, that is difficult to measure. Selection on unobservables can also occur if program administrators use subjective or objective criteria to select program participants, and their ratings of individuals are not recorded.11

a. Addressing the Problem of Selection on Observables. Early nonexperimental evaluations of voluntary training programs implicitly assumed that selection into the program was based on observables (see Goldberger 1972; and Cain 1975, for discussions). One

10 An overview of nonexperimental evaluation methods is given in Moffitt (1991) and Bell et al. (1995).

11 To a degree, making the distinction between selection on observables and selection on unobservables is artificial (and perhaps even misleading), because unobservables may represent mainly factors that are difficult, but not necessarily impossible, to measure. Heckman and Smith (1995) argue that the collection of better data can minimize (and, perhaps, even eliminate) problems caused by selection on unobservables, thereby reducing the evaluation problem to one of finding an appropriate method for controlling for selection on observables.


approach taken in several early nonexperimental studies was to use an "internal" comparison group (for example, nonparticipating training program applicants) to draw inferences about the effects of a program (Michael Borus 1964; Cain 1968; Ernst Stromsdorfer 1968; Thomas Cooley, Timothy McGuire, and Edward Prescott 1979). It was thought that internal comparison groups were appropriate because these individuals possessed many of the same characteristics as participants.

The use of internal comparison groups never achieved great popularity because it was quickly recognized that nonparticipants are likely to be quite different from participants by virtue of the fact that they have chosen not to participate or have been excluded by program staff. Recently, Bell et al. (1995) have proposed using a variant of this approach, based on the "regression discontinuity" model, to evaluate a training program for welfare recipients.12

They argue that "screened out" applicants - those excluded because of decisions made by an intake staff - by definition differ from participants only on factors (both objective and subjective) observable to staff. Their regression discontinuity approach attempts to control fully for these differences using intake workers' ratings of applicant potential.

Whether the Bell et al. study will stimulate more nonexperimental evaluations utilizing internal comparison groups remains to be seen. One alternative that has proven quite popular for a number of years utilizes "external" comparison groups, consisting of a sample of individuals whose observed characteristics resemble those of program

participants, but are drawn from a different source (often a national data base, such as the Current Population Survey or the Panel Study of Income Dynamics, or special samples from geographic areas that have not implemented the program). The use of external comparison groups became prevalent in evaluating the effects of the MDTA and CETA programs in the 1970s and 1980s (Orley Ashenfelter 1978; Barnow 1987), and such groups are still being used in the 1990s (Sharon Long and Douglas Wissoker 1995; Heckman, Smith, and Christopher Taber 1994; and Rajeev Dehejia and Sadek Wahba 1995).13

The use of external comparison groups sometimes involves searching for a group of individuals who are matched statistically to members of the program group. One procedure, known as "cell matching," was used by Edward Bryant and Kalman Rupp (1987), among others, to evaluate the CETA program. Under this procedure, subgroups of individuals are created based on certain observed characteristics (such as age, education, and race) and are then matched to other individuals with the same characteristics. Another procedure, known as "distance function matching," matches individuals based on a weighted function of observed characteristics. The first application of distance function matching in the training program evaluation literature was by Katherine Dickinson, Terry Johnson, and Richard West (1986, 1987a) in their evaluation of the CETA program. Their application was based on Rubin's (1979) "nearest neighbor" technique. Under

12 The "regression discontinuity" model was first proposed as an evaluation model in the field of education by Donald Thistlethwaite and Donald Campbell (1960), but has received scant attention in the training program evaluation literature.

13 Some evaluators claim to have solved the problem of selection on unobservables by using external comparison groups. External comparison groups are usually chosen on the basis of a set of observed characteristics, however, and, hence, technically their use addresses only the problem of selection on observables.


this technique, the Mahalanobis distance is calculated between a training program participant and each potential comparison group member,14 and then a match is accepted for the pair with the smallest distance between them.15

Recently, a variation on statistical matching has been proposed by Dehejia and Wahba (1995) and Heckman et al. (forthcoming). Based on the methodology developed by Rosenbaum and Rubin (1983), these authors use the propensity score as the matching variable. The propensity score summarizes the information in a set of observable variables into a single index function. Treatment group and comparison group observations with similar propensity scores (similar predictions of being in the treatment group) are considered good matches for each other.16 Dehejia and Wahba (1995) argue that the propensity score method can serve as a good approximation to a wide variety of linear and nonlinear econometric response functions when there is selection on observables.
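
A stylized sketch of propensity-score matching on simulated data, assuming scikit-learn is available; this is our illustration of the general idea, not the procedure used in the studies cited (Mahalanobis nearest-neighbor matching would differ mainly in the distance used).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stylized propensity-score matching with selection on observables only.
# Covariates, coefficients, and sample sizes are illustrative assumptions.
rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=(n, 3))                                   # observed covariates
p_true = 1 / (1 + np.exp(-(x @ np.array([0.8, -0.5, 0.3]))))  # true participation probability
d = rng.binomial(1, p_true)                                   # participation indicator
y = x @ np.array([1.0, 0.5, -0.2]) + 2.0 * d + rng.normal(size=n)   # true effect = 2

# Step 1: estimate the propensity score from the observables.
score = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# Step 2: nearest-neighbor match each participant to a comparison member on the
# score, with replacement (as in Dehejia and Wahba's reuse of good matches).
t_idx = np.flatnonzero(d == 1)
c_idx = np.flatnonzero(d == 0)
nearest = c_idx[np.abs(score[c_idx][None, :] - score[t_idx][:, None]).argmin(axis=1)]

# Step 3: the matched mean difference estimates the effect of training on participants.
print("matched estimate of b_t:", round(float((y[t_idx] - y[nearest]).mean()), 3))
```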

Statistical matching is to be distinguished from econometric matching. Econometric matching denotes the standard behavioral modeling techniques and specification tests that use observed characteristics as regressors, without necessarily restricting the composition of the estimation sample through the use of matching. As noted by Friedlander and Robins (1995), statistical matching and econometric matching with the same data set can produce very similar estimates of program effects. Essentially, both methods adjust estimates of program effects for the influence of a given set of observable covariates. Thus, they differ only in the way they specify the functional relationship between the observed characteristics and the relevant outcome variable.

b. Addressing the Problem of Selection on Unobservables. A number of early training program evaluation studies proposed methods for dealing with selection on unobservables. All of these studies make certain assumptions about the nature of the dependence between $u_{it}$ and $P_{i0}$ (or $u_{it}$ and $e_{i0}$). One of the first to propose an econometric solution was Ashenfelter (1978), who hypothesized an "autoregressive" model of the earnings generation process based on a simple model of human capital investment. In Ashenfelter's model, preprogram earnings histories play a crucial role in estimation.17 The key assumption in Ashenfelter's model is that earnings contains an unobserved "fixed effect" that can be accounted for in estimation by making appropriate transformations of the earnings outcome data.18

14 The Mahalanobis distance is given by $(X_1 - X_2)' S^{-1} (X_1 - X_2)$, where $X_1$ and $X_2$ are column vectors of the matching variables for two observations and $S$ is the covariance matrix of the matching variables.

15 In the typical application of the nearest neighbor technique, comparison group members are sampled without replacement; that is, once an observation is selected it is not used again. Thus, the results produced by the technique are not invariant to the order in which the data are sorted for matching.

16 Dehejia and Wahba (1995) propose using "good" matches more than once (in effect, matching them to more than one program group member).

17 Ashenfelter noted that trainees tend to suffer a sharp decline in earnings just prior to program entry. This "preprogram dip" undoubtedly reflects adverse economic circumstances that are, at least, partly responsible for the individual's decision to enter training. Knowing whether the preprogram level of earnings is transitory or permanent is critical for developing an appropriate statistical model to account for selection on unobservables.

18 For example, in the case of a simple fixed effect, where $u_{it}$ and $e_{i0}$ share a common individual-specific component $\mu_i$ (i.e., the correlation between $u_{it}$ and $P_{i0}$ arises from that common component), an unbiased estimate of the program effect can be obtained by estimating a first difference model for earnings (see Barnow 1987, for details). A transitory "earnings dip," however, will make the results sensitive to the base year used to construct the first difference model.
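
The footnote's first-difference logic can be illustrated with a small simulation in which an unobserved fixed component both lowers earnings and raises the chance of entering training; the numbers are assumptions chosen only to show the direction of the bias.

```python
import numpy as np

# Illustration of the first-difference logic: an unobserved fixed effect mu_i
# drives both earnings and the decision to train. All values are assumptions.
rng = np.random.default_rng(3)
n = 10000
mu = rng.normal(size=n)                               # individual fixed effect (unobserved)
d = (mu + rng.normal(size=n) < 0).astype(float)       # low-mu people select into training

y_pre = mu + rng.normal(size=n)                       # preprogram earnings
y_post = mu + 2.0 * d + rng.normal(size=n)            # post-program earnings, true effect = 2

# A post-program comparison of levels is biased downward: trainees have low mu.
print("levels estimate:     ", round(y_post[d == 1].mean() - y_post[d == 0].mean(), 3))
# First-differencing removes the fixed effect and recovers the program effect.
dy = y_post - y_pre
print("difference estimate: ", round(dy[d == 1].mean() - dy[d == 0].mean(), 3))
```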


Ashenfelter found that training effects tend to decay over time, an aspect of his results that stimulated a number of other papers, including a lively exchange between Bloom (1984b) and Laurie Bassi (1987). Bloom argued that Ashenfelter failed to correct for a time-varying bias in the fixed effect model, and that when such a correction is made the effects of training do not decay over time. Bassi countered that Bloom's estimates were no more credible than Ashenfelter's because they, also, were based on a set of strong assumptions about the nature of the selection on unobservables. Debates about nonexperimental evaluations such as these rarely produce a winner because there is no clear-cut way of determining whose assumptions are valid.

In addition to the fixed-effect model, two other methods have been used in the training program evaluation literature for addressing selection on unobservables. One is instrumental variables (Heckman and Robb 1985; Joshua Angrist, Guido Imbens, and Rubin 1996). In effect, equation (1b) is estimated and is used to construct an instrument for $P_{i0}$ that is uncorrelated with $u_{it}$. Although the use of instrumental variables has sometimes proved successful, it has not been popular because of difficulties in finding an appropriate instrument - that is, a variable that influences $P_{i0}$ but not $Y_{it}$.19 In addition, as discussed by Heckman (1996), the instrumental variable method has important limitations when the program effect varies across people (so-called random coefficient models).
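
The instrumental variables idea can be sketched with a simulated binary instrument standing in for something like the proximity-to-site variable mentioned in footnote 19; everything here (instrument, effect size, constant program effect, Python implementation) is an illustrative assumption.

```python
import numpy as np

# Two-stage least squares sketch: an instrument shifts participation but is
# unrelated to the outcome error. All names and values are illustrative.
rng = np.random.default_rng(4)
n = 20000
z = rng.binomial(1, 0.5, size=n).astype(float)        # instrument (e.g., proximity to a site)
v = rng.normal(size=n)                                # unobserved motivation
d = (0.9 * z + v + rng.normal(size=n) > 0.5).astype(float)   # participation
y = 2.0 * d + v + rng.normal(size=n)                  # outcome; v biases OLS (true effect = 2)

# First stage: regress participation on the instrument.
Z1 = np.column_stack([np.ones(n), z])
d_hat = Z1 @ np.linalg.lstsq(Z1, d, rcond=None)[0]

# Second stage: regress the outcome on predicted participation.
X2 = np.column_stack([np.ones(n), d_hat])
ols_biased = np.linalg.lstsq(np.column_stack([np.ones(n), d]), y, rcond=None)[0][1]
print("naive OLS estimate:", round(ols_biased, 3))
print("2SLS estimate:     ", round(np.linalg.lstsq(X2, y, rcond=None)[0][1], 3))
```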

Another method, proposed by Barnow, Cain, and Goldberger (1980) and based on the procedure developed by Heckman (1978) to deal with censored samples, is based on the assumption that $u_{it}$ and $e_{i0}$ are jointly normally distributed. As in the case of instrumental variables, equation (1b) is estimated, in this instance using a probit model. An appropriate Mills ratio adjustment term is then constructed as a weighted average of predicted $P_{i0}$ and $1 - P_{i0}$ and is included as an additional variable in equation (1a), which is then estimated using conventional regression techniques. This two-step method avoids the need for an instrument by relying on the functional form of (1b) for identification, but suffers from low reliability associated with specification uncertainty and low variability in the newly constructed adjustment term.
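
A sketch of the two-step idea under joint normality of the errors, assuming statsmodels and SciPy are available; this is one standard way of constructing the Mills-ratio control term described in the text, applied to simulated data, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Two-step correction for selection on unobservables under joint normality.
# All data are simulated and the specifics are our assumptions.
rng = np.random.default_rng(5)
n = 20000
z = rng.normal(size=n)                                        # Z_i
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
u, e0 = errs[:, 0], errs[:, 1]                                # correlated errors: selection on unobservables
d = (1.0 * z + e0 > 0).astype(float)                          # participation equation (1b)
y = 2.0 * d + u                                               # outcome equation (1a), true b_t = 2

# Step 1: probit of participation on Z_i, giving the estimated index a_0 Z_i.
Z = sm.add_constant(z)
probit = sm.Probit(d, Z).fit(disp=0)
idx = Z @ probit.params

# Step 2: Mills-ratio control term for participants and nonparticipants,
# included as an extra regressor alongside the participation dummy.
lam = np.where(d == 1, norm.pdf(idx) / norm.cdf(idx),
               -norm.pdf(idx) / (1.0 - norm.cdf(idx)))
X = sm.add_constant(np.column_stack([d, lam]))

print("naive OLS:    ", round(sm.OLS(y, sm.add_constant(d)).fit().params[1], 3))
print("two-step b_t: ", round(sm.OLS(y, X).fit().params[1], 3))
```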

2. Experimental Evaluations. Because training program participation contains a large unexplained component, both instrumental variables and two-step procedures tend to produce statistically imprecise estimates of the effect of training. In addition, a number of studies (Bassi 1983, 1984; Ashenfelter and Card 1985; LaLonde 1986; Thomas Fraker and Rebecca Maynard 1987; LaLonde and Maynard 1987; and Friedlander and Robins 1995) find that these and other nonexperimental procedures designed to deal with the problem of selection bias can produce widely varying estimates of program effects, often quite different from experimentally based estimates from the same data set, depending on the assumptions made about the nature of the dependence between $u_{it}$ and $P_{i0}$.20 Recognition of this sensitivity made experimental evaluations of training

19 Perhaps one of the more successful applications of the instrumental variable technique was performed by Charles Mallar (1979), who used proximity to the training site as an instrument for participating in the Job Corps program.

20 LaLonde (1986) was the first to develop procedures for assessing nonexperimental estimators using experimental data.


programs more popular in the 1980s and 1990s.21

In contrast to nonexperimental evaluations, experimental evaluations are based on random assignment of individuals into a treatment (or "program") group and a control group. Randomization is intended to produce zero correlation between $P_{i0}$ and $X_i$, between $P_{i0}$ and $Z_i$, and between $P_{i0}$ and $u_{it}$, so that $E(P_{i0} X_i) = E(P_{i0} u_{it}) = E(u_{it} e_{i0}) = a_0 = 0$. In an ideal experiment, only members of the treatment group participate in program activities and their participation rate is 100 percent. An unbiased estimate of the effect of the treatment, $b_t$ (the effect of training versus no training), can therefore be obtained by simply taking the difference between mean earnings of the treatment and control groups. In practice, equation (1a) is usually estimated using ordinary least squares. The $X$s are included to increase the statistical precision of the estimates, but such gains are usually small. As discussed later, this ideal experiment is never fully realized in actual evaluations because not all members of the treatment group participate in training and some members of the control group seek out and engage in training similar to that provided by the program being evaluated.
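
The experimental estimator can be sketched as follows for an idealized experiment with full compliance; the data, effect size, and Python implementation are illustrative assumptions.

```python
import numpy as np

# Sketch of the experimental estimator under an idealized experiment with
# full compliance; the data and effect size are illustrative assumptions.
rng = np.random.default_rng(6)
n = 6000
x = rng.normal(size=n)                               # baseline covariate X_i
treat = rng.binomial(1, 0.5, size=n)                 # random assignment breaks any selection
y = 1.0 * x + 2.0 * treat + rng.normal(size=n)       # true effect = 2

# The simple treatment-control difference in means is unbiased for b_t.
print("difference in means:", round(y[treat == 1].mean() - y[treat == 0].mean(), 3))

# Adding baseline covariates, as in the regression version of (1a), mainly
# improves precision rather than removing bias.
X = np.column_stack([np.ones(n), x, treat])
print("OLS with covariates:", round(np.linalg.lstsq(X, y, rcond=None)[0][2], 3))
```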

Moreover, even an "ideal" experiment has inherent limitations. Although it provides an unbiased estimate of the average program effect per participant, it cannot provide unbiased estimates of the distribution of program effects across participants. Thus, an experiment cannot estimate the percentage of participants who actually benefit from a training program (Clements, Heckman, and Smith 1994; Manski 1995). As Heckman and Smith (1995) point out, experimental estimates of the "average effect per sample member" cannot distinguish between two possibilities: (a) most people gained about the average, and (b) a few people gained much but most gained nothing or perhaps even lost. In addition, experimental data can provide unbiased estimates of the effects of training programs on employment rates, but not on effects that are conditional on employment status, such as the effects of training on the hazard rates of entry into or exit from employment (Heckman and Smith 1995; John Ham and LaLonde 1990, 1996; Card and Daniel Sullivan 1988) or on hourly wage rates or weekly work hours. All of these limitations, however, apply equally to the common nonexperimental techniques. Thus, because the experimental approach solves the basic problem of selection bias in estimating mean treatment effects, it has come to be seen by many analysts as the more attractive method of program evaluation.

Given the attractiveness of experiments, it may seem surprising that nonexperimental evaluations of training programs continue to be conducted. One reason, discussed below, is that experimental evaluations may be inappropriate if a program is expected to have large community or macro effects or large entry effects. In addition, experiments incur costs for implementing and monitoring randomization. These additional research costs must be weighed against the costs of the misallocation of

21 An advisory panel that convened to make recommendations concerning how to evaluate the JTPA program concluded that an experimental evaluation was the preferred method (Stromsdorfer et al. 1985). Similar conclusions regarding the preferability of experimental evaluations of training programs are reached by Ashenfelter and Card (1985), Burtless and Orr (1986), Barnow (1987), and Burtless (1995). Eventually, the federal government decided to fund an experimental evaluation of JTPA, with an associated nonexperimental research component. The earlier generations of federal training programs (MDTA, CETA) had been evaluated exclusively using nonexperimental methods.


social resources from decisions based on less reliable nonexperimental designs. Finally, nonexperimental evaluation may, in some cases, produce results faster than an experiment. For evaluations of ongoing training programs, data on earnings and welfare receipt can often be obtained retrospectively. Data collection and statistical analysis can therefore be completed relatively quickly following the start of the evaluation, as long as researchers are interested only in a program as it operated in the immediate past. Experimental evaluations typically take much longer to complete, because the period of evaluation must include up to a year or more of sample intake during random assignment and a two- to five-year follow-up period plus the time required for data collection and analysis. On the other hand, experimental evaluations of new training programs are not necessarily more time consuming than nonexperimental evaluations if the evaluation is initiated as soon as the program begins. As Orr et al. (1996) suggest, even with an ongoing program, much of the relatively higher cost and lack of timeliness of an experimental evaluation can be overcome if a fraction of program applicants are assigned to a control group on a continuing basis.

3. The External Validity of Estimated Program Effects. A critical issue in the evaluation of training programs is "external validity." External validity refers to the extent to which estimated program effects can be generalized to different locations and populations, to different time periods, and to different variants of the program being studied. The external validity of specific estimates of program effects may be questioned for a number of reasons. Most of these have been offered recently as criticisms of experimental evaluations (Heckman and Smith 1995), but nearly

all apply as well to nonexperimental evaluations.

First, and most obviously, social attitudes, government institutions, the business cycle, the relative demand for unskilled and skilled labor, and other relevant factors may change in the years following an evaluation. Likewise, different locations may have dissimilar trainee characteristics, social attitudes, state and local government institutions, labor market conditions, and so forth.

Second, training program evaluations are often performed at a small number of sites that are rarely selected randomly, raising questions about how well they represent administrative capacity and other unobservables for the universe of sites (see Heckman and Smith 1995; Hotz 1992; and Heckman 1992). Difficulties in obtaining a representative sample of program sites are especially acute when the cooperation of local administrators is essential and they cannot be compelled to participate in the evaluation. It has been argued, for example, that sites participating in the recently completed national evaluation of the JTPA program are not representative of all JTPA sites because participation in the experiment occurred in only a few of the sites from which it was sought (Heckman and Smith 1995).22 The evaluators (Orr et al. 1996) argue that the participating sites are representative, judging by observable characteristics. An important area of future research is to determine the degree to which site selectivity translates into bias in generalizing the estimated effects to other sites.

Third, external validity may be compromised by "scale bias."

22 A standard argument is that only sites operating superior programs will agree to an evaluation. There may be, however, only minimal correlation between local operators' self-appraisals and the results of a rigorous third-party evaluation.


Training program innovations are often tested as small demonstrations or pilot programs. Even ongoing programs such as JTPA do not achieve universal participation within the program-eligible population. Manski and Garfinkel (1992a) and Garfinkel, Manski, and Charles Michalopoulos (1992) suggest that scaling up to universal participation could change community norms or combine with patterns of social interaction or information diffusion in ways that will feed back and influence the success of the policy innovation. These community or "macro" effects, they argue, will be absent in small-scale pilot programs or partially scaled programs.23 In addition, testing a program on a small scale may cause the composition of the program participants to differ from what it would be in an ongoing training program by inhibiting diffusion of information about the program to potential applicants; by limiting the number of program slots and thereby encouraging program administrators to restrict participation to "higher quality" applicants; or, in an experiment, by discouraging risk-averse individuals from applying to a program when they could be randomly assigned to a no-services control group (see Heckman 1992; Heckman and Smith 1995; and Manski 1993, 1995).24

At present, little is known about the practical importance of community effects, although in principle their presence could greatly multiply, or seriously impede, the effectiveness of government training programs. An important area for future theoretical and empirical research may lie in adapting methods from sociology, urban anthropology, ethnography, and community psychology to study the community effects of large-scale, permanent training programs.25 Similarly, although the possibility of bias caused by distortion of the participant sample in small-scale selective voluntary programs has strong theoretical appeal, its empirical importance is yet to be demonstrated.26

One nonexperimental approach for avoiding biases caused by testing policy innovations on a small scale is to implement them on a site-wide, fully scaled basis in some locations and, for comparison, use other sites (perhaps statistically matched) that have not adopted the innovation. Although this "saturation" evaluation design does, in principle, allow feedback effects to be captured, the program may have to be kept in place for many years, with firm guarantees of permanency, before these effects reach full potency. Moreover, cross-site comparison designs will produce unreliable estimates of program effects if the program and comparison sites differ in ways that are inadequately controlled for in the evaluation (see Robinson Hollister and Jennifer Hill 1995; Friedlander and Robins 1995). Indeed, even if sites are

23 In at least two recent instances - the Massachusetts Employment and Training Choices Program (Demetra Nightingale et al. 1991) and the Washington State Family Independence Program (S. Long and Wissoker 1995) - these issues were considered so important that a deliberate decision was made against using a random assignment evaluation design that would create a no-program control group and would therefore interfere with site-wide program coverage.

24 To illustrate the potential bias caused by distortions in the participant population, Manski investigates nonparametric bounded estimators that are virtually assumption-free (see, also, Angrist and Imbens 1991). As Manski shows, the bounds can often be quite large, and can be narrowed only if the evaluator is willing to impose strong and untestable assumptions about behavior of individuals outside the specific program being evaluated.

25 Hypotheses concerning community effects are currently being tested in the evaluation of the Youth Fair Chance Demonstration.

26 In a study of "creaming" in the JTPA Title IIA program, Kathryn Anderson, Richard Burkhauser, and Jennie Raymond (1993) find that the problem of nonrandom selection of participants is not as serious as some critics suggest.


randomly assigned to program and control status, there may simply be too few sites to assure that the two groups of sites do not differ in some unobserved way.

Fourth, as suggested in Section VA.2, it is often the case that some members of control or comparison groups receive services similar to those received by program group members. The possibil- ity of substituting training program activities for similar activities provided elsewhere first gained empirical atten- tion in the WIN evaluations of the 1980s, when it was found that participa- tion in education and training activities observed in a randomly assigned pro- gram group also took place among members of the control group (James Riccio et al. 1986; Gayle Hamilton and Friedlander 1989). It was confirmed in several later evaluations, notably in the JTPA evaluation (Orr et al. 1996; Heck- man and Smith 1995). Some control group members engaged in activities through adult schools, community col- leges, or other local institutions, and they did so without special program as- sistance. Moreover, in order to provide education and training, government programs often send their enrollees to the same local institutions, where they attend classes side-by-side with indi- viduals who are in the target population but who are not enrolled in the govern- ment training program.

Under such circumstances, when Pio is defined as zero for all sample members not in the government training program, bt in equation (1a) does not measure the pure effect of participating in training versus not participating in training. Rather, it measures the incremental effect of the additional participation in training stimulated by the program being evaluated. This measure is clearly policy-relevant, but it is not what is implied by Model 1. In addition,

the fact that comparison group mem- bers, as well as program group mem- bers, engage in training is a source of at least two threats to external validity. First, not only will the evaluated pro- gram differ over time or from one place to another, but the array of activities available to comparison group members will also differ, complicating the prob- lem of generalizing the evaluation re- sults. Second, the very existence of the program being evaluated may change the number of training opportunities available to the comparison group. This second threat to external validity, which Heckman and Smith (1995) call "substi- tution bias," could occur if, by absorb- ing some persons who desire.training, the evaluated program frees up more nonprogram training slots for others who want training. Or, if the evaluated program is large enough, it may induce state and local governments to refrain from funding training activities they would normally provide in the absence of the program.

B. Estimating the Effects of Policy

Participation in similar or identical training activities by persons who are in the program-eligible population but are not enrolled in the government training program proper introduces a profound conceptual problem for Model 1. As we have just indicated, estimates of bt no longer represent the effect of training on participants. Moreover, any realistic model must allow for the possibility that training program activities that suc- cessfully increase skills and help partici- pants find jobs may nevertheless fail to produce an effect on the earnings of training program graduates if partici- pants would have participated in similar activities on their own in the absence of the special program. The net effect of the program on participation in training in that case is zero, and the net effect


on labor market outcomes must also be zero, unless the training program im- parts some extra efficiency to the activi- ties in question.

To make the model more realistic and to allow for duplication of activities in- side and outside the training program, we begin by positing that an individual, i, in the population eligible for the gov- ernment training program faces a num- ber of opportunities for training-of which the government training program is only one. Under these real-world con- ditions, the government program does not "provide training" but rather makes an "offer of training" consisting of a bundle of services to facilitate partici- pation in certain activities plus supports and incentives to individuals to partici- pate in those activities. One of the ser- vices offered may be the actual training, or it may be referral and access to train- ing provided elsewhere in the commu- nity. Some training activities, such as remedial reading and math courses, will be virtually identical for training par- ticipants whether they are enrolled in the government training program or in some other program. Other activities, notably subsidized employment, will be available only to government training program enrollees. The program offer may also be called the government "pol- icy" with respect to the training for the program in question. The prospective program enrollee considers whether to accept that offer, a competing offer, or none at all.

In our model, participation, Pio, can thus no longer be defined as unique to the training program in question. In- stead, Pio must be redefined to repre- sent participation in any training activi- ties similar to those offered by the special program. A training "partici- pant" is defined as a person engaged in one of these activities, whether enrolled in a government training program (an

"enrollee") or acting outside the pro- gram on his or her own initiative. The distinction between program "enrollee" and training "participant" is important and is maintained throughout the rest of this article.

Returning to the formal model, we assume, for simplicity, that Pio is a sca- lar representation of a single training activity and that there is no difference in the efficiency of similar activities be- tween the government program and nongovernmental providers. We then rewrite Model 1 as Model 2 by adding a term to the second equation:

Y_{it} = c_t X_i + b_t P_{i0} + u_{it},   (2a)

P_{i0} = a_0 Z_i + g_0 T_i + e_{i0}.   (2b)

In this model, Ti is a binary scalar that takes on the value of unity if the pro- gram offer of training is in effect for in- dividual i and zero if the offer is not in effect. Thus, go will measure the pro- gram's incremental effect on participa- tion-that is, the change in training par- ticipation induced by the program. Under this formulation, bt is restored to its original meaning as the measure of the effect of participating in training on participants. In this case, bt applies both to those who receive training as en- rollees in the government program and those who receive it on their own.27

27 To generalize Model 2, Ti could be specified as an array of government services and incentives characterizing training "policy." In addition, Model 2, as we have written it, makes no provision for decreasing marginal returns to training, so that bt does not decline as the scale of the training program increases (i.e., for programs for which go is large). Incorporating scale effects could be done by making bt a function of P averaged across the

population. Allowing personal characteristics to affect the returns to training and the training decision could be accomplished by making bt a function of Xi and go a function of Zi. Finally, a further generalization of Model 2 would be to distinguish the efficiency of training participation that does or does not come through the government program. This could be accomplished by creating separate P variables and b coefficients for the government program and other training.


By substituting for Pio from equation (2b), equation (2a) can be rewritten as

Y_{it} = c_t X_i + (b_t a_0) Z_i + (b_t g_0) T_i + b_t e_{i0} + u_{it}.   (2a')

Thus, for person i, the total effect of the program being evaluated on the out- come variable is btgo, which is the prod- uct of the effect of participating on participants (bt) and the incremental ef- fect on participation (go). The aggregate program effect (ignoring macro effects) is btgo multiplied by the number of persons in the eligible population. As should be evident, a training program with a large effect of participating on participants (bt) could yield a rather low total program effect if the incremental effect on participation (go) is small. Con- versely, a modest effect per participant may lead to a larger total effect if the increase in participation from the new program is large. Whenever participa- tion in a government training program mostly duplicates participation that would have occurred anyway, go will be close to zero and the aggregate effect of the program will be small, even if the training is effective. The point of Model 2 is that a full assessment of the effects of a government training program re- quires not only an estimate of the ef- fects of training on training participants, but also an understanding of program ef- fects on overall participation in training in the context of existing training oppor- tunities.

The revised model may be used to il- lustrate a fundamental noncomparabil- ity between estimated program effects that come from participant/nonpartici- pant designs on the one hand and from comparison group designs (including randomized experiments) and compari- son area designs on the other. A partici- pant/nonparticipant design estimates bt in equation (2a) by comparing outcomes

for training participants and nonparticipants, in exactly the same fashion as in equation (1a). In contrast, under comparison group designs, a program group represents condition Ti = 1, and a comparison group (or comparison area sample) represents condition Ti = 0 in equation (2a'). Not all members of the program group are participants and not all members of the comparison group are nonparticipants. The coefficient of Ti is btgo, not bt. In examining estimates from the empirical literature in this article, it is therefore invalid to compare the magnitude of estimated effects from the earlier participant/nonparticipant designs with those from the later experimental designs and with other comparison group or area designs. Because go is less than unity, participant/nonparticipant designs will yield more positive estimated effects for a given program or for programs with similar activities, even when selection bias has been corrected. The statistical significance of estimates from the two kinds of designs may also not be comparable if go is small. Only the signs should be the same (as long as go is positive).
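To make the noncomparability concrete, the following small simulation is our own illustration, not part of the original article. It generates data from Model 2 with hypothetical values of bt and go and assumes, unrealistically, that participation is unrelated to unobservables, so that selection bias is absent by construction. Under those assumptions a participant/nonparticipant contrast recovers roughly bt, while a program/comparison group contrast recovers roughly btgo.

```python
# Minimal sketch of Model 2 (illustrative parameter values; not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b_t = 1500.0      # assumed effect of training participation on annual earnings
g_0 = 0.25        # assumed incremental effect of the program offer on participation
base_part = 0.30  # assumed participation rate in the absence of the program offer

T = rng.integers(0, 2, n)                          # program offer (randomized)
P = rng.random(n) < (base_part + g_0 * T)          # training participation, any provider
Y = 12_000 + b_t * P + rng.normal(0, 4_000, n)     # earnings outcome

# Participant/nonparticipant design: recovers roughly b_t (selection bias assumed away here)
print(Y[P].mean() - Y[~P].mean())                  # about 1500

# Comparison-group (or experimental) design: recovers roughly b_t * g_0
print(Y[T == 1].mean() - Y[T == 0].mean())         # about 375
```

With bt = $1,500 and go = 0.25, the two designs report roughly $1,500 and $375 for the same program, which is the sense in which their magnitudes cannot be compared directly.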

For the estimated effects we present in this article, however, valid compari- sons of magnitude can be made across all evaluations using the internal rate of return (IRR). Estimates of the IRR compare the program effect with the cost of achieving that effect. Cross- study comparability of IRRs is pre- served, whether a particular program effect is measured as bt or btgo, as long as that program's cost is measured on the same basis (that is, as a per-partici- pant cost for the former or as a net dif- ference between cost per program group member and cost per comparison group member for the latter).

Participant/nonparticipant designs cannot provide estimates of the program effect on the amount of training


TABLE 1 TRAINING PROGRAM EVALUATION STUDIES

VOLUNTARY PROGRAMS

Program | Scope | Years of Operation (a) | Target Group
MDTA    | NAT     | 1962-1974     | Disadvantaged adults and youth
NYC     | NAT     | 1964-1974     | Disadvantaged youth
JOBS68  | DEM (b) | 1968-1974     | Disadvantaged adults
JC      | NAT     | 1964-present  | Disadvantaged youth
CETA    | NAT     | 1974-1983     | Disadvantaged adults and youth
SW      | DEM     | 1975-1978     | Long-term AFDC recipients, ex-addicts, ex-offenders, high school dropouts
HHA     | DEM     | 1983-1986     | AFDC recipients
TOPS    | DEM     | 1983-1986     | AFDC recipients
NJGD    | DEM     | 1984-1987     | AFDC recipients
MFSP    | DEM     | 1982-1988     | Low-income minority single mothers
ET      | DEM (c) | 1983-1989 (d) | AFDC recipients
JS      | DEM     | 1985-1988     | High school dropouts
NC      | DEM     | 1989-1992     | AFDC high school dropouts
JTPA    | NAT     | 1983-present  | Disadvantaged adults and youth

a "Years of Operation" do not necessarily coincide with dates of authorizing legislation. Calendar years are shown, not fiscal years. b JOBS68 was a national program but is classified as a demonstration because it was short-lived and featured only a single training activity. c ET was a state-run version of a national program but is classified as a demonstration because its research interest lies mainly in the large scale of its voluntary approach to training for welfare recipients. d In 1989, ET began operating under authority of the new federal Job Opportunities and Basic Skills Training (JOBS) Program.

activity, go. Under comparison group or area designs, go can be found by estimating equation (2b); that is, go is the coefficient of Ti in a regression of Pio on Ti and Zi. Current evaluation practice under such designs is to report program and comparison group levels of participation in the various activities offered by the government training program. With this information, it is often possible to determine whether a weak total effect results from a limited program effect on participation in training (go near 0) or from a small effect of the


TABLE 1 (Cont.) TRAINING PROGRAM EVALUATION STUDIES

VOLUNTARY PROGRAMS

Program | Main Activities (e) | Evaluation Study | Method of Evaluation
MDTA    | CT, OJT            | Ashenfelter (1978); Cooley, McGuire, & Prescott (1979); Kiefer (1978, 1979); Gay and Borus (1980); Bloom (1984b) | NXL
NYC     | CT, PWE            | Kiefer (1979); Gay and Borus (1980) | NXL
JOBS68  | OJT                | Kiefer (1979); Gay and Borus (1980) | NXL
JC      | CT, PWE            | Cain (1968); Gay and Borus (1980); Kiefer (1979); Mallar et al. (1982) | NXL
CETA    | CT, OJT, PWE, PSE  | Westat (1984); Ashenfelter and Card (1985); Bassi (1983, 1984); Bassi et al. (1984); Bloom (1987); Bryant and Rupp (1987); Dickinson, Johnson, & West (1984, 1986, 1987a, 1987b); Finifter (1987) | NXL
SW      | PWE with training  | Hollister, Kemper, & Maynard (1984); Couch (1992) | XL
HHA     | PWE with training  | Bell and Orr (1994) | XL
TOPS    | OJT, UWE           | Auspos, Cave, & Long (1988) | XL
NJGD    | OJT                | Freedman, Bryant, & Cave (1988) | XL
MFSP    | CT, OJT            | Burghardt et al. (1992) | XL
ET      | CT, OJT, PWE, UWE  | Nightingale et al. (1991) | NXL
JS      | CT                 | Cave et al. (1993) | XL
NC      | CT, PWE, UWE       | Quint et al. (1994) | XL
JTPA    | CT, OJT            | Orr et al. (1996) | XL

(e) Most programs with training components also provided assistance with job search.
Key: MDTA = Manpower Development and Training Act; NYC = Neighborhood Youth Corps; JOBS68 = Job Opportunities in the Business Sector; JC = Job Corps Program; CETA = Comprehensive Employment and Training Act; SW = National Supported Work Demonstration; HHA = AFDC Homemaker-Home Health Aide Demonstrations; TOPS = Maine Training Opportunities in the Private Sector Program; NJGD = New Jersey Grant Diversion Project; MFSP = Minority Female Single Parent Demonstration; ET = Massachusetts Employment and Training Choices Program; JS = JOBSTART Demonstration; NC = New Chance Demonstration; JTPA = Job Training Partnership Act; NAT = national; DEM = special demonstration; AFDC = Aid to Families with Dependent Children; CT = classroom training (basic education and occupational skills training); OJT = on-the-job training; UWE = unpaid work experience; PWE = paid work experience; PSE = public service employment; NXL = nonexperimental; XL = experimental.

training itself (go substantially greater than 0).
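As a companion to the simulation sketch above (our illustration, not the article's), the following lines show how go itself would be recovered under a comparison group design: either as the simple program/comparison difference in participation rates or as the coefficient on Ti in an OLS regression of Pio on Ti (and Zi, when covariates are available). The variables T and P continue from the earlier simulated data.

```python
# Sketch of recovering g_0 from equation (2b), continuing the simulated data above.
import numpy as np

# With a randomized offer, g_0 is the difference in participation rates.
g0_hat = P[T == 1].mean() - P[T == 0].mean()        # about 0.25 in the simulation

# Equivalent OLS version (intercept plus T; the simulation has no Z covariates).
X = np.column_stack([np.ones(len(T)), T])
coef, *_ = np.linalg.lstsq(X, P.astype(float), rcond=None)
g0_hat_ols = coef[1]                                # matches the simple difference
```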

C. Empirical Evidence for Voluntary Programs

The jumping-off point for our examination of empirical results is the

compendium of past research in Charles Perry et al. (1975). Nearly all the studies examined by this group of researchers were found to be inadequate. Of the 252 studies covered, many had no estimates of program effects; others judged effects from simple


before/after comparisons of wage rates or earnings. A minority of the studies used comparison groups but created these groups from program "no-shows" or dropouts, which today would be generally recognized as inviting bias from selection on unobservables. "In almost every case in which a [comparison] group was used," the authors concluded, "there were valid reasons to question the comparability of the [comparison group] and the treatment group" (p. 139). Random assignment was, however, considered impractical.

Despite the limitations in methodol- ogy, Perry and his colleagues were able to form opinions on several of the re- search issues that would figure promi- nently on the agenda of evaluation re- search for the next 20 years. The authors assigned a range of $1,000 to $2,000 (1996 dollars) to short-term ef- fects on annual earnings of skills train- ing (under MDTA), assigned smaller es- timates for effects of other program activities, and labeled work experience as the least effective. Program effects were judged largest for women, some- what smaller for men, and smallest for youth. Earnings increases were attrib- uted mainly to increased employment, rather than to increased earning power, raising doubts about the worth of skills development activities within training programs.

Changes in methodology were already in progress. One of the major complaints raised by the Perry et al. (1975) study was the absence of "systematically collected follow-up data" on training program enrollees. This absence was addressed under CETA by the establishment of the Continuous Longitudinal Manpower Survey (CLMS). This survey of CETA participants was linked to earnings records maintained by the Social Security Administration (SSA) and supplemented by comparison

groups created from the Current Popu- lation Survey (CPS) to form the basis of a number of CETA evaluations. The widely differing nonexperimental ef- fects estimated using the CLMS data would, however, ultimately undermine this comparison group approach and lead to the abandonment of a similar evalu- ation strategy for JTPA.

The year 1975 also witnessed the start of the National Supported Work experiment, which was to set the stage for a dramatic shift from nonexperimen- tal to experimental (i.e., random assign- ment) evaluation of training programs. That shift was later given impetus by the conclusions of a National Academy of Sciences committee reviewing re- search on youth training programs (Charles Betsey, Hollister, and Mary Papageorgio 1985) and by the recom- mendation of an invited panel of ex- perts (Stromsdorfer et al. 1985) that the U.S. Department of Labor scrap plans for a nonexperimental evaluation of JTPA. They favored, instead, using a random assignment design. As shown in Table 1, a number of random assign- ment evaluations were conducted for voluntary programs during the 1980s- and more for mandatory programs (next section)-culminating in the recently completed multi-site random assign- ment evaluation of JTPA.

Table 1 lists in approximate chrono- logical order the voluntary training pro- grams that have been evaluated using individual-level data since 1975.28 The table classifies each program by its scope: either national or special dem- onstration. For each program, the ta- ble shows the years of program opera- tion (not necessarily the years covered

28 One exception, Cain (1968), is included in recognition of its importance in the evaluation his- tory of the Job Corps. For recent summaries of training program evaluation results, see Gueron and Edward Pauly (1991) and LaLonde (1995).


by the evaluations), the demographic groups targeted, the principal activities, the major evaluation studies, and the evaluation method used. Although job search assistance is not indicated as a separate activity, most programs offered some help, formal and informal, in find- ing unsubsidized employment.

Because of methodological differ- ences and other factors, not all the studies in Table 1 should be considered on equal footing. Our top criterion for judging the reliability of estimated pro- gram effects is a random assignment re- search design. We consider the results from experiments to be generally less subject to bias and imprecision than the results from studies using nonex- perimental methods. We concur with Barnow (1987), who, in reviewing the accumulated nonexperimental effects es- timates for CETA, concluded that "the confidence interval surrounding these estimates must be considered quite large considering the sensitivity to al- ternative specifications and the lack of any strong reasons to accept findings from one study over those of another" (p. 189). Many nonexperimental ana- lysts also acknowledge the benefits of random assignment.

In our view, results from the recent JTPA evaluation (Orr et al. 1996; Bloom et al. 1997) are the most important. JTPA is national in scope-it is, in fact, the existing national program- and the evaluation design was experi- mental. In addition, the JTPA research team undertook extensive sensitivity analysis to examine possible underre- porting bias and survey nonresponse bias in the follow-up earnings data, which represent two of the principal threats to the internal validity of experi- mental research at present. In this paper, we often assess results of earlier studies against those found in the JTPA evaluation.

Table 2 summarizes findings from the 30 studies of the 14 voluntary programs listed in Table 1.29 Studies are orga- nized by demographic group and pro- gram scope. Adult men, adult women, and youth are shown in separate panels. Within each demographic panel, na- tional programs (MDTA, CETA, and JTPA) are shown first, then special demonstrations. We report effects on earnings in the second year after train- ing because the majority of studies fol- low trainees for at least this long.30 For studies in which second-year effects are unavailable, we report first-year effects. All earnings effects have been con- verted to 1996 Quarter 3 dollars, using the GDP chain-type price index.

For each program, the following summary statistics are reported: (a) the unweighted mean effect on annual earnings across evaluations;3' (b) the minimum and maximum estimated ef- fects; (c) the number of earnings effects

29 Two studies-Nicholas Kiefer (1979) and Robert Gay and Borus (1980)-evaluate more than one program.

30 A number of evaluations have provided estimates of effects on welfare payments and welfare dependency, but, owing to space limitations, we omit these results.

31 In the "Mean Annual Effect" column of Table 2, each evaluation of a particular program contributes one estimate of an earnings effect for that program. Because several studies do not re- port an overall estimate but only report estimated program effects separately by site or subgroup (for example, minority group status or ethnicity, or subgroups receiving different combinations of ser- vices), it was often necessary to compute a single earnings effect for a study by averaging the sub- group estimates, using the size of each of the sub- groups as weights, or for sites, using the un- weighted site estimates. If a given study reports estimates for more than one econometric specifi- cation, the author's preferred specification was used if given; otherwise, an unweighted average across specifications was used. Using the single ag- gregate weighted effect from each study for a par- ticuIar rogram, an unweighted mean effect was then cafculated across studies. The component site or subgroup estimates from each study were, how- ever, used in establishing the range and the fre- quency of statistically significant estimates in the "Range of Effects" column of the table.


TABLE 2 EFFECTS OF VOLUNTARY TRAINING PROGRAMS ON PARTICIPANT EARNINGS BY DEMOGRAPHIC GROUP

Demographic Group and Program (Num. of Studies) | Mean Annual Effect | Range of Effects, if more than one (num. negative and stat. sig. / num. negative and not stat. sig. / num. positive and not stat. sig. / num. positive and stat. sig.)

Adult Men
  National
    MDTA (6)   | $151    | -$2,127 to $2,605 (2/2/2/5)
    CETA (9)   | -$587   | -$3,342 to $1,634 (3/4/3/3)
    JTPA (1)   | $970    | (0/0/0/1)
      OJT      | $1,275  | (0/0/0/1)
      CT       | $1,032  | (0/0/1/0)
  Demonstration
    JOBS68 (2) | $344    | -$1,274 to $2,013 (0/2/1/1)
    SW (1)     | $419    | $402 to $440 (0/0/2/0)

Adult Women
  National
    MDTA (5)   | $1,926  | $942 to $3,527 (0/0/1/8)
    CETA (9)   | $1,797  | $28 to $2,815 (0/0/1/13)
    JTPA (1)   | $960    | $771 to $1,103 (0/0/0/2)
      OJT      | $1,157  | $693 to $2,234 (0/0/1/1)
      CT       | $414    | $316 to $498 (0/0/2/0)
  Demonstration
    JOBS68 (2) | $1,676  | $428 to $3,150 (0/0/1/3)
    SW (2)     | $1,309  | $554 to $2,064 (0/0/1/1)
    HHA (1)    | $1,849  | $209 to $3,749 (0/0/2/5)
    NJGD (1)   | $1,017  | (0/0/0/1)
    TOPS (1)   | $1,448  | (0/0/0/1)
    MFSP (1)   | $793    | $108 to $1,722 (0/0/3/1)
    ET (1)     | $999    | (0/0/0/1)

Youth
  National
    NYC (2)    | -$531   | -$3,742 to $3,630 (2/3/1/2)
    JC (4)     | $586    | -$3,994 to $1,902 (3/3/4/1)
    CETA (5)   | $450    | -$2,475 to $3,715 (4/2/4/4)
    JTPA (1)   | -$171   | -$724 to $184 (0/1/1/0)
  Demonstration
    SW (2)     | $269    | $20 to $517 (0/0/2/0)
    JS (1)     | $553    | $424 to $578 (0/0/3/0)
    NC (1)     | -$295   | (1/0/0/0)

Notes: Program effects and costs are in 1996 dollars. Program effects pertain to the second year after training (or earlier if second year effects are not available). "Mean Annual Effect" is calculated using one estimate from each study, unweighted. When a single overall estimate is not presented in a study, one is calculated by averaging across sites or demographic subgroups. The range of effects and statistical significance is reported over whatever full- sample, site, or subgroup estimates are reported. "Net Cost" is calculated as an average, unweighted, across studies having cost estimates. "Real Rate of Return" is calculated from "Net Cost" (year 0) and "Mean Annual Effect" (years 1-3 and 1-10). Some studies estimated effects for more than one kind of training, but only total program effect is reported here, except where noted. In a few studies, statistical significance is not reported. n.a. = information not available.


TABLE 2 (Cont.) EFFECTS OF VOLUNTARY TRAINING PROGRAMS ON PARTICIPANT EARNINGS BY DEMOGRAPHIC GROUP

Demographic Group and Program (Num. of Studies) | Net Cost of Training Per Participant (Num. of Studies) | Real Rate of Return if Mean Effect Lasts 3 Years | Real Rate of Return if Mean Effect Lasts 10 Years

Adult Men
  National
    MDTA (6)   | $6,053 (1)  | <0     | <0
    CETA (9)   | $8,919 (2)  | <0     | <0
    JTPA (1)   | $1,065 (1)  | 74%    | 91%
      OJT      | $1,320 (1)  | 80%    | 97%
      CT       | $1,172 (1)  | 70%    | 88%
  Demonstration
    JOBS68 (2) | n.a.        | n.a.   | n.a.
    SW (1)     | $13,425 (1) | <0     | <0

Adult Women
  National
    MDTA (5)   | $6,053 (1)  | <0     | 29%
    CETA (9)   | $8,919 (2)  | <0     | 15%
    JTPA (1)   | $1,500 (1)  | 41%    | 64%
      OJT      | $1,059 (1)  | 94%    | 109%
      CT       | $2,100 (1)  | <0     | 15%
  Demonstration
    JOBS68 (2) | n.a.        | n.a.   | n.a.
    SW (2)     | $15,244 (1) | <0     | <0
    HHA (1)    | $9,741 (1)  | <0     | 14%
    NJGD (1)   | $870 (1)    | 103%   | 117%
    TOPS (1)   | $2,278 (1)  | 41%    | 63%
    MFSP (1)   | $5,882 (1)  | <0     | 6%
    ET (1)     | $1,931 (1)  | 26%    | 51%

Youth
  National
    NYC (2)    | n.a.        | <0     | <0
    JC (4)     | $11,010 (2) | <0     | <0
    CETA (5)   | n.a.        | <0 (a) | <0 (a)
    JTPA (1)   | $2,006 (1)  | <0     | <0
  Demonstration
    SW (2)     | $13,087 (1) | <0     | <0
    JS (1)     | $6,4M1 (1)  | <0     | <0
    NC (1)     | n.a.        | <0     | <0

(a) Assuming costs similar to those of adults, the real rate of return would be negative.
Key: See Table 1.


estimates that are negative and statisti- cally significant, negative and not statis- tically significant, positive and not sta- tistically significant, and positive and statistically significant (statistical sig- nificance defined as a probability level of 10 percent or lower); (d) the net cost of training per participant; (e) the number of studies with cost estimates available;32 and (f) our estimate of the real internal rate of return under two alternative assumptions about how long the estimated mean effect on earnings lasts (three years versus ten years).33

In the JTPA study, researchers pro- vided effects estimates in two formats: "per sample member" and "per partici- pant."34 Because the participation rate within the program group is less than 100 percent, these two estimates differ. The per-sample-member estimate is the basic estimated effect generated by the

random assignment design, calculated as the simple (regression-adjusted) dif- ference in means between the program group and the control group. The per- participant estimate is obtained by di- viding the per-sample-member estimate by the program participation rate, fol- lowing Bloom (1984a). Net cost esti- mates are transformed in the same man- ner, so the rate-of-return calculations are unaffected. To increase comparability of estimates across experiments, we utilize the per-participant estimates from JTPA and adopt the same convention whenever possible for the other experimental stud- ies, making the division using participa- tion rates reported in the evaluations. This convention also makes the experi- mental estimates more comparable with those from the nonexperimental partici- pant/nonparticipant comparison designs.
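The per-participant conversion described here, often called the Bloom (1984a) no-show adjustment, amounts to a single division. The sketch below uses hypothetical numbers and relies on the adjustment's key assumption, namely that program group members who never participate are unaffected by the offer.

```python
# Sketch of the Bloom (1984a) adjustment from per-sample-member to per-participant impacts.
def per_participant(per_assignee_effect, participation_rate):
    # Valid only if assignees who never participate are assumed to be unaffected.
    return per_assignee_effect / participation_rate

effect_pp = per_participant(600.0, 0.60)   # -> 1000.0 per participant (hypothetical numbers)
cost_pp = per_participant(900.0, 0.60)     # -> 1500.0; identical scaling leaves the rate of return unchanged
```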

A broad overview of the results re- veals wide variation among the esti- mates. The estimates of effects on an- nual earnings presented in Table 2 range from a low of -$3,994 for non- black female youth in the Job Corps (re- ported by Gay and Borus, 1980) to a high of $3,749 for female AFDC recipi- ents in the Texas site of the Home- maker-Home Health Aide Demonstra- tions (reported by Bell and Orr 1994). In total, 91 out of the 123 estimates (about three-fourths) are positive. Some 55 estimates are positive and statisti- cally significant, more than might be ex- pected from chance alone, but 15 are negative and statistically significant, also more than might be expected from chance.
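A rough sense of what "chance alone" would imply can be computed directly. The sketch below is our own benchmark, not the article's: it assumes two-sided tests at the 10 percent level (so about 5 percent of true-zero effects would be significantly negative and 5 percent significantly positive) and, counterfactually, independence across the 123 estimates.

```python
# Benchmark counts of significant estimates expected under the null of zero effects.
from math import comb

n, p = 123, 0.05               # 123 estimates; ~5% per tail under two-sided 10% tests
expected_per_tail = n * p      # about 6 significant estimates in each direction

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(expected_per_tail, binom_tail(15, n, p))   # 15 negative-significant exceeds the chance benchmark
print(binom_tail(55, n, p))                      # 55 positive-significant is far beyond chance
```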

Much of the variation in estimated program effects is associated with non- experimental designs. Part of this non- experimental variation stems from a lack of sufficient information to adjust fully for pre-program differences be- tween participants and nonparticipants. For example, the limited prior earnings

32The quality and completeness of cost infor- mation varies considerably across studies. We attempted, whenever possible, to obtain net ad- ministrative cost per participant (excluding oppor- tunity costs to trainees), subtracting out training costs expended on comparison group members. Such estimates represent the additional real re- sources consumed by the program. Net cost esti- mates were, however, generally not available in participant/nonparticipant studies. For some kinds of training (primarily paid work experience), we often could not remove payments to participants from the published cost estimates.

33 In calculating the rate of return, training costs are assumed to be incurred in year 0 and the benefits are assumed to be received in years 1 through 3 or, alternatively, years 1 through 10. If a program exhibits a negative effect on earnings, we assign a negative rate of return. Our rate of return estimates are obviously quite crude and intended to give only a rough sense of whether the govern- ment has made a good investment in the training program. Full-blown benefit-cost studies often al- low benefits to vary over time, consider other benefits besides gains in earnings (e.g., reductions in crime and improvements in health), and other costs (e.g., the opportunity costs of participating in a training program).
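The rate-of-return calculation described in this note can be reproduced with a short root-finding routine. The code below is our reading of the procedure (net cost paid in year 0, a constant mean annual effect received in years 1 through 3 or 1 through 10) and uses the JTPA adult-male figures from Table 2 as an illustrative input; it returns values close to the 74 percent and 91 percent shown there.

```python
# Sketch of the internal-rate-of-return calculation in note 33 (our reconstruction).
def irr(net_cost, annual_effect, years, lo=-0.99, hi=10.0, tol=1e-6):
    """Rate r solving -net_cost + sum_t effect/(1+r)**t = 0 for t = 1..years."""
    if annual_effect <= 0:
        return float("-inf")   # the article simply reports such cases as "<0"
    def npv(r):
        return -net_cost + sum(annual_effect / (1 + r) ** t for t in range(1, years + 1))
    while hi - lo > tol:       # npv is decreasing in r, so bisect toward the root
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(irr(1065.0, 970.0, 3), 2))    # about 0.74, cf. the 74% three-year figure for JTPA adult men
print(round(irr(1065.0, 970.0, 10), 2))   # about 0.91, cf. the 91% ten-year figure
```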

34 Orr et al. (1996, pp. 35-37) use the term "as- signees" to denote the full sample of persons ran- domly assigned (whether or not they participate) and the term "enrollees" where we use "partici- pants."


histories for youths provide only weak explanatory power in predicting their future earnings. Thus, quite large posi- tive and negative estimates of earnings effects for this group have been ob- tained in nonexperimental evaluations.

Another factor contributing to the variation associated with the nonexperi- mental methods is specification uncer- tainty. One of the chief difficulties has been identifying permanent and transi- tory components of variation in pre- program earnings in controlling for differences between participants and nonparticipants. Wide variation among estimates provided by different nonex- perimental specifications applied to the same data set for a single program is documented by Ashenfelter and Card (1985), Barnow (1987), and Dickinson, T. Johnson, and West (1987b) for CETA and by LaLonde (1986) and Fraker and Maynard (1987) applying nonexperi- mental methods to the experimental data of the National Supported Work Demonstration. Heckman and Hotz (1989) argue that appropriate utilization of specification tests can narrow this wide variation, but Friedlander and Robins (1995) find that such tests do not narrow the range of nonexperimen- tal estimates by very much in practice.

Consistently strong evidence has accumulated that government training programs have been effective for adult women. The experimental estimates of JTPA's effects on earnings are positive and statistically significant, and the rate of return on cost in JTPA is large even in the short term. Both MDTA and CETA also exhibited large positive average earnings effects, with all estimates positive and nearly all statistically significant. Among all ten programs evaluated for adult women, the mean effect is positive in every one. About three-quarters of the 49 estimates for women in the "Range of Effects"

column of Table 2 are positive and statistically significant, and none is negative. Furthermore, the average effect across programs for adult women is close to $1,400 per year, and in many cases the programs yielded a substantial positive rate of return. Nevertheless, as pointed out by Heckman, Rebecca Roselius, and Smith (1994), as well as others, such earnings effects, although substantial, are not large enough to lift most families out of poverty. Moreover, for women who head families on welfare, the earnings gains are often partially offset by reductions in transfer benefits.

Evidence has been accumulating for a number of years that training programs have been ineffective in producing last- ing earnings effects for youth. Perry et al. (1975) viewed expenditures on youth in the programs of the 1960s-espe- cially the large expenditures for work experience wages in the Neighborhood Youth Corps (NYC), whose budget grew to exceed even that of MDTA-to a great extent as income transfers, provid- ing little in the way of enduring en- hancements to earning power. As shown in Table 2, studies of the NYC and the Job Corps (JC) have yielded a scatter of positive and negative estimates, with somewhat more of the latter. One non- experimental CETA study, Dickinson, T. Johnson, and West (1984), produced quite large positive estimated earnings effects for youth. The experimental es- timates from the JTPA evaluation, how- ever, are small and bracket zero. This result is especially important because the net costs of working with youth in that program were similar to those for adult men and women, making it there- fore difficult to conclude that youth were not receiving attention from the pro- gram. Moreover, no significant positive earnings effects were found for either male or female youth in any of three program activity clusters or 39 subgroups


examined by the JTPA evaluators. In re- cent years, none of the experimental es- timates from special demonstrations have been positive and statistically sig- nificant for youth. These results support the view that training programs have not been effective in increasing the post-program earnings of youth.

The history of the all-youth Job Corps has been especially interesting. That program's residential facilities have made for a high per-participant price tag. Perry et al. (1975) judged the program's earnings effects to be "mar- ginal" (p. 68), and a large negative esti- mate (the bottom of the range shown in Table 2) was obtained by Gay and Borus (1980). Nonetheless, the Job Corps, be- gun in 1964, is the oldest program in Table 2 still in operation and retains a high degree of popularity with Con- gress. Frequently cited in defense of the Job Corps is the large positive earn- ings effect estimated by Mallar et al. (1982, the top of the range shown in Ta- ble 2), which was based on what some analysts consider to be a relatively strong comparison area design. Re- cently, however, the JOBSTART evalu- ation experimentally tested a nonresi- dential version of the Job Corps and found weak earnings effects. Because of the continued uncertainty about the true effects of the Job Corps, a multi- site, experimental evaluation of this program is currently being conducted.

In contrast to the conclusions about youth, conclusions about men were given a boost by the JTPA evaluation. The estimated average earnings effect shown in Table 2 for men in MDTA is small. In CETA, more than half the ef- fects estimated for men are less than zero, and the average is negative. Esti- mated earnings effects for JOBS68 are half negative and half positive, with a small mean positive effect. The experi- mental estimates from Supported Work

are small but cover special groups (ex- offenders and ex-addicts) that are not fully representative of the main popula- tion of economically disadvantaged adult men. The JTPA earnings effect for men, however, is as large as the average ef- fect for women, and likewise has a high rate of return, even in the short run. The JTPA finding for men, therefore, represents a significant break with the results of past evaluations.

A longstanding problem in program evaluation has been distinguishing the effects of different program activities. The presumption commonly held from the earliest days of federal involvement, and a view expressed by Perry et al. (1975, esp. pp. 6, 28), is that the more skills development an activity achieves, the larger and longer-lasting will be its earnings effects. Skills development is often implicitly associated with the in- tensity and cost of an activity, with greater skills development seen as re- quiring greater effort by participants and greater costs to programs. A key re- search question has been: How much do activities intended to enhance skills ac- tually increase skills, earning power, and long-term earnings? In our view, the evidence is mixed. A link between increased cost and intensity of training and greater earnings effect has not been firmly established.

Activity-specific earnings effects esti- mated for MDTA and CETA do not provide widely credible guidance on this issue. Barnow (1989, p. 121) re- ported a general consensus within the evaluation community that earnings effects of on-the-job training were slightly larger than those for classroom training, but also concluded that there was "only weak evidence that CETA training programs increased the skill levels of participants" (p. 126). Activ- ity-specific earnings effects estimates found in the CETA evaluations suggest


low rates of return to training in gen- eral and reveal no positive relationship between the cost of an activity and the magnitude of the earnings effect.

Work experience stands out in the pre-CETA and CETA literature as be- ing judged quite costly, owing in part to the stipends paid to participants, but having little actual training content and only a small effect on post-program earnings. During the late 1970s and early 1980s, two kinds of work experi- ence designed specifically to incorpo- rate strong training components were implemented as special demonstrations (Supported Work [SW] and Home- maker-Home Health Aide [HHA]) and were evaluated using random assign- ment designs. The results (Table 2) sug- gest that work experience that provides training can be effective for adult wel- fare women, but they affirm the high- cost view. Moreover, the activity was found to be ineffective for the particu- lar groups of adult men and youth stud- ied. Work experience has become one of the lesser utilized activities in pre- sent-day voluntary training programs.35

Early evaluations of on-the-job train- ing based on nonexperimental analyses of JOBS68 data produced mixed results,

implying positive earnings effects pri- marily for women. During the 1980s, ran- dom assignment evaluations of special demonstrations of on-the-job training for adult welfare women (New Jersey Grant Diversion [NJGD] and Maine TOPS) found positive earnings effects and high rates of return on net cost, even in the short run (Table 2). Results from the JTPA experiment (Table 2) also found positive earnings effects for on-the-job training and suggest high rates of return on cost even in the short run for both adult men and adult women. Whether these positive effects come from skills acquisition, from the wage subsidy given employers as part of the activity, or from program assistance given to en- rollees in finding on-the-job training opportunities has not been established.

Classroom training was not tested by random assignment until the JTPA eval- uation. Results for classroom training in JTPA, primarily in occupational skills rather than remedial basic education, were mixed (Table 2). Positive earnings effects were found for men and women, but these were statistically significant only when the two groups were com- bined, and rates of return were high only for men. Again, it is not clear how much of the earnings effects can be at- tributed specifically to skills acquisi- tion. For youth, the JTPA researchers found no evidence that either on-the- job or classroom training improved earnings (Orr et al. 1996, p. 177).36

35 Work experience without intensive training has retained some appeal in mandatory programs for welfare recipients, where government budget costs can be kept low by substituting welfare payments for work experience wages and where the production of public goods and services by work experience participants is accorded a high value. From the social cost perspective, work experience wages are not counted, because they are merely transfers and do not represent real resources used up. We have excluded them in SW and HHA costs and wherever else the data allow. In addition, the value of goods and services produced by work experience participants in their subsidized jobs may be substantial and would be counted as a social benefit in a complete benefit-cost analysis. For example, the value of in-program product was large enough to make the total net benefits to society exceed social net costs for two of the four SW subgroups and for as many as six of the seven HHA sites.

36 In the JTPA evaluation, random assignment to particular kinds of training occurred after indi- viduals were recommended for the training, but before they received it. Thus, the JTPA results re- ported in Table 2 are for the kind of training to which individuals were assigned, not necessarily the training they actually received. Because not all persons received the training to which they were assigned, the results by kind of training for JTPA mix more than one kind of training. The im- plications of this "post-randomization treatment choice" are discussed in Hotz and Seth Sanders (1994).


The effectiveness of classroom train- ing may depend crucially on the relative emphasis placed on upgrading general academic skills versus training for a spe- cific occupation. That conclusion is sug- gested by results from the Minority Female Single Parent (MFSP) demon- stration, a random assignment test of intensive education and training for a largely welfare target population. Among the four evaluation sites, post- program earnings effects were small at sites that attempted to correct long- standing reading and math deficits with an initial stint in remedial basic educa- tion. Statistically significant positive earnings effects were found only for the one site-the San Jose Center for Employment Training (CET)-that put participants immediately into skills training for particular jobs, regardless of their previous educational attainment, and followed up with job placement after training (John Burghardt et al. 1992). CET was also the only site with positive earnings effects in the JOB- START demonstration.

The absence of long-term follow-up in most studies is a critical problem in assessing the effectiveness of lengthy and costly skills development activities. The limited evidence available suggests that earnings effects may persist. Kenneth Couch (1992) found that the effects of the National Supported Work Demonstration lasted for at least eight years after training. Amy Zambrowski and Anne Gordon (1993) found that earnings effects for CET in MFSP persisted through at least five years. In analyzing extended follow-up data from the JTPA experiment, the U.S. General Accounting Office (1996) found that effects for adults continued over five years of follow-up, although the later-year effects were smaller than the peak effect and were not generally statistically significant. Friedlander and

Burtless (1995) found that the effects of three mandatory welfare-to-work programs emphasizing rapid employment (two low-cost and one middle-cost program) peaked and then declined substantially by the fifth year after the training. The effects of a fourth program, which placed more emphasis on skills upgrading, persisted. A link between the training content of activities and the durability of earnings effects has not been demonstrated conclusively, however.

Did the shift toward random assign- ment research designs after 1975 yield results that ought to alter opinion about program effects? The experimental evi- dence should reduce concerns that par- ticipating in voluntary training pro- grams may result in significant lost earnings opportunities: the negative and statistically significant estimates in Ta- ble 2 are all nonexperimental except for one (New Chance). The view that pro- grams are relatively effective for adult women remains unchallenged, but the supporting evidence has been strength- ened. For adult men, however, the find- ings of the JTPA experiment suggest more favorable prospects in training than did past nonexperimental esti- mates. In addition, for both men and women, the large estimated rates of re- turn on net cost calculated for the JTPA experiment provide a much more opti- mistic picture than do the nonexperi- mental MDTA and CETA findings. For youth, in contrast, concern about training program effectiveness can only be heightened by the experimental evidence. Generally speaking, experi- mental estimates have exhibited less variation than have nonexperimental estimates, most notably for youth.

One clear contribution of the experimental approach has been to advance awareness of substitution and duplication of training activities inside and outside government training programs.


Estimates of the net cost of resources used up in training programs have decreased substantially as greater attention has been paid to improving estimates of participation in similar activities by control group members. On the other hand, the experimental approach has made less progress in quantifying the training content of the various program activities and in systematically investigating the critical links between training content and the enhancement of skills, earning power, and actual earnings.

At present, the most important unre- solved issue concerning voluntary train- ing programs for adults is the efficacy of various policy tools intended to in- crease program scale by increasing the number of participants and the intensity and expense of the activities provided to them.37 By how much would changes in incentives, supports, targeting strate- gies, and program operating practices increase participation? How large would be the resulting increases in aggregate program costs and earnings effects?38 The JTPA experiment has provided the most credible evidence to date about

training program effectiveness, but the estimated effects pertain only to the practices and scale typified by the pro- grams actually evaluated. It is unclear whether the high rates of return ob- served in the JTPA experiment would still be observed if the scale of partici- pation were substantially increased.

The apparent difficulty in generaliz- ing effects estimated in a random as- signment evaluation to a range of policy options has been an important and trou- bling theme in recent critiques of the experimental evaluations of training programs (Section V.A.3). In our view, one particularly telling ramification of these critiques has been the failure of the experimental method thus far to address systematically and rigorously the scale issue. We think it would be an error to read these critiques as a call for a return to the nonexperimental methods utilized in the past. The contribution of these critiques has been, instead, to identify serious problems in generalizability that random assignment does not solve.

VI. Evaluating Mandatory Training Programs

Analytically, the most important as- pect of mandatoriness in a training program is the possibility of program effects on enrollees who do not partici- pate in formal program activities. Such effects may be produced directly on welfare income by financial penalties, called sanctions, which typically amount to about 20 percent of the monthly wel- fare check. Effects on nonparticipants may also be produced indirectly as indi- viduals find employment or leave wel- fare to avoid pressure from program staff to comply with a time-consuming participation requirement, the so-called deterrent effect. In this connection, it is important to note that mandatory wel- fare-to-work programs generally permit

37 Steven Sandell and Rupp (1988) find that, among adults, as few as 2.3 percent of the target group defined by law (i.e., persons fitting the defi- nition of "economically disadvantaged") are en- gaged in JTPA activities, although the percentage is much greater (about 13 percent) among the small minority of this population who are jobless and looking for work. Orr et al. (1996, p. 232, n. 10) cite another 2 percent estimate for voluntary participation among welfare recipients in the Homemaker-Home Health Aide (HHA) Demon- strations.

38 In considering the consequences of increasing program effort, Friedlander (1993) posits an inverted-U relationship between the degree to which a subgroup is disadvantaged and the amount of program effort required to obtain an earnings effect of a given magnitude. For the most disadvantaged program enrollees, there may be a "threshold" of program effort below which no effects on employment or earnings will be realized. If this threshold is high, then a large amount could be spent to increase participation among the most disadvantaged without producing much effect on their earnings.


enrollees to engage in part-time em- ployment while they remain on welfare as a substitute for participation in a training program activity. Indeed, em- ployment that is a substitute for partici- pation may occur as often as program participation itself in some programs, especially in states where welfare grant levels are high enough to permit sub- stantial concurrent mixing of work and welfare (see, e.g., Hamilton 1988, p. xviii). The potential for sizable program effects on nonparticipating program en- rollees makes the interpretation of esti- mated earnings effects for mandatory programs different from voluntary pro- grams.

A. Accounting for Program Effects on Nonparticipants

To incorporate the potential effects of mandatory programs on nonpartici- pants, equation system (2) is modified by adding an additional term to the first equation, creating system (3):

Y_{it} = c_t X_i + b_t P_{i0} + h_t T_i (1 - P_{i0}) + u_{it},   (3a)

P_{i0} = a_0 Z_i + g_0 T_i + e_{i0}.   (3b)

Again, the dummy variable Ti indicates whether the program "offer"-or, in the mandatory case, the program "require- ment"-is or is not in effect for individ- ual i. The new coefficient, ht, is the pro- gram effect on nonparticipants who are covered by the participation require- ment.

As with equation system (2), estimat- ing the full effect of the program in sys- tem (3) requires an evaluation sample that includes some individuals for whom Ti = 1, the program group, and others for whom Ti = 0, the comparison group. These groups may be created in areas with and without the program or within a single area among individuals who are and are not subject to the pro- gram participation requirement. The

latter kind of comparison group may be created by random assignment. As be- fore, both program and comparison groups will have participants and non- participants in training activities, al- though it is naturally assumed that par- ticipation will be greater when Ti = 1.

The total effect of the program on in- dividuals subject to program participa- tion requirements can be found by sub- stituting (3b) into (3a), evaluating Yit at Ti = 1 and Ti = 0, and taking the differ- ence (noting that Ti x Ti = Ti for the dummy variable), which is

b_t g_0 + h_t [1 - P_{i0}(T_i = 1)],   (3c)

where Pio(Ti =1) represents aoZi + go, the probability of participation when Ti = 1. An estimate of the entire expression (3c) can be obtained as the coefficient of Ti in a regression of Yit on Ti and control variables. The first term in this expres- sion, btgo, is the same as the coefficient of Ti in equation (2a'). The new term is the program's effect on nonparticipants, ht, multiplied by the probability of non- participation, 1 - Pio(Ti = 1), among those covered by the participation re- quirement.
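A hypothetical numerical example (our numbers, chosen only for illustration) shows how expression (3c) mixes the two effects:

```latex
% Illustrative values, not estimates from any evaluation
g_0 = 0.25, \qquad P_{i0}(T_i = 1) = 0.5
\;\Rightarrow\;
b_t g_0 + h_t \,[1 - P_{i0}(T_i = 1)] = 0.25\, b_t + 0.5\, h_t .
```

Under these assumed values, (bt, ht) = ($1,600, $200) and ($800, $600) both imply a total effect of $500 per covered individual, so the program/comparison contrast by itself cannot distinguish a program that works mainly through its activities from one that works mainly through sanctions and deterrence.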

Although the entire expression (3c) may be estimated from a comparison group or comparison area design, unique estimates of bt and ht cannot be recovered, even if go and 1 - Pio(Ti = 1) have been estimated from equation (3b). Thus, in contrast to voluntary pro- grams, comparison area and comparison group (including random assignment) designs for mandatory programs typi- cally cannot provide valid estimates of the earnings effects of participating in activities, bt. Consequently, estimates of participation effects do not generally appear in mandatory program evalu- ation studies. Nor is dividing published effects estimates by the program par- ticipation rate a valid option, as it is for voluntary program evaluations. Direct


comparisons of the estimated effects of voluntary and mandatory programs, therefore, cannot be made, even when experiments are used to evaluate both. Valid comparisons can, however, be made using estimated internal rates of return.39

B. Empirical Evidence for Mandatory Programs

Table 3 lists the main mandatory training programs and the 16 evaluation studies of them. As indicated, over the last 20 years, all the major evaluations of mandatory programs have been based on random assignment designs. Table 4 shows estimated program effects. All the results shown for adult welfare men come from studies that also included adult welfare women. All WIN and JOBS studies in Table 4 used similar (random assignment) designs and data sources, providing an unusually high degree of comparability among themselves. Typically, about 50 percent of the samples of randomly assigned program enrollees actually participated in a formal program activity within about a year of entry into a WIN or JOBS program. Two categories are shown for WIN programs. The earlier WIN studies examined the least costly programs, which offered mainly supervised job search, with unpaid work experience included in some of them (WIN-JS/WE in Table 4). Two somewhat higher-cost efforts that added remedial education and vocational training activities are classed as "WIN mixed services" programs (WIN-MIXED). The net costs of the evaluated JOBS programs are greater than those for WIN because JOBS programs assigned a significant share of enrollees to education and training.

The great majority of the earnings effects shown in Table 4 for mandatory programs are positive. Within each WIN/JOBS category, effects are larger for women than for men, and rates of return are higher. Most estimated effects for women are statistically significant; most estimated effects for men are not. These gender differences in favor of women in mandatory programs are consistent with the gender results for voluntary programs prior to JTPA. The JTPA rates of return for men are clearly larger than those for men in mandatory programs.

Economists generally presume that mandatory programs will have smaller earnings effects per dollar spent than will voluntary programs because they include many enrollees who do not expect much financial benefit from participation. The results for men in WIN/JOBS versus JTPA affirm this view, but otherwise we do not find strong and consistent support for it (see also Orr et al. 1996, pp. 203-05). Moreover, differences for men between WIN/JOBS and JTPA may result from the more disadvantaged population in mandatory programs or simply from the fact that WIN/JOBS enrollees are receiving welfare and nearly all JTPA men are not. The earnings effects of mandatory programs have also sometimes been constrained by the necessity of spreading a modest program budget quite thinly over a large mandated target population, limiting the ability to promote significant and sustained program activity (e.g., FSETP).

Mandatory program evaluations, like those for voluntary programs, have produced

39 For the sake of completeness, it should be noted that b_t for mandatory programs can be estimated within the context of a comparison-group/area design by a nonexperimental participant/nonparticipant comparison in which nonparticipants are not subject to the program mandate (i.e., for whom T_i = 0), providing, of course, that selection bias can be removed. If b_t can be estimated in such a fashion, then h_t can be recovered as well.


TABLE 3 TRAINING PROGRAM EVALUATION STUDIES

MANDATORY PROGRAMS

Program      Scope      Years of Operation (a)   Target Group            Main Activities
WIN-JS/WE    NAT, DEM   1967-1989                AFDC recipients         JS, UWE
WIN-MIXED    DEM (b)    1982-1987                AFDC recipients         JS, UWE, CT
FSETP        NAT        1987-present             Food Stamp recipients   JS
JOBS         NAT        1989-1996 (c)            AFDC recipients         JS, UWE, CT

(a) "Years of Operation" do not necessarily coincide with dates of authorizing legislation. Calendar years are shown, not fiscal years.
(b) We classify Baltimore Options as a special demonstration because its organization differed from most WIN programs.
(c) Since 1996, states have continued to operate mandatory welfare-to-work programs under legislation that created welfare block grants to states.

mixed evidence about the ability of more intensive and expensive skills development activities to increase skills, earning power, and long-term earnings.40 In general, larger dollar expenditures have not produced markedly larger earnings effects and have therefore met with decreasing rates of return (see Table 4). For women, who have the more positive results, short-term rates of return calculated from mean effects are positive and very large for WIN-JS/WE and WIN-MIXED, but not for the more costly JOBS programs, for which only the longer-term rates are positive. The JOBS results may be associated with extensive use of remedial education as a first activity, rather than immediate assignment to job-specific training that was used effectively in the voluntary CET program described earlier.

Particularly important in this connection are findings from a random assignment evaluation of California's JOBS program, GAIN, which stands out nationally in the large scale and expense of its investment in remedial education. Earnings effects for those two-thirds of program enrollees who were specifically targeted for the education activities were relatively modest in light of the additional net costs incurred for them (Riccio, Friedlander, and Stephen Freedman 1994, p. 260). Further, a post-program test of basic reading and math skills, an innovation in training program research, revealed improvements in these skills in only one of five research counties (Friedlander and Karin Martinson 1996). Finally, the largest effects on earnings were found in the one locality that emphasized

40 See Friedlander and Gueron (1992) for an analysis of the relationship between net cost and program effects in 13 welfare-to-work programs.


TABLE 3 (Cont.) TRAINING PROGRAM EVALUATION STUDIES

MANDATORY PROGRAMS

Program      Evaluation Study                                                             Method of Evaluation
WIN-JS/WE    Goldman (1981), Wolfhagen (1983), Friedlander et al. (1985,(d) 1986, 1987),   XL
             Riccio et al. (1986),(d) Goldman, Friedlander, & Long (1986)
WIN-MIXED    Friedlander (1987),(d) Friedlander and Hamilton (1996)                        XL
FSETP        Puma and Burstein (1994)                                                      XL
JOBS         Fein, Beecroft, & Blomquist (1994), Blomquist (1994),(e) Kemple,              XL
             Friedlander, & Fellerath (1995), Riccio, Friedlander, & Freedman (1994),
             Freedman et al. (1996), Hamilton et al. (forthcoming)

(d) Supplemental follow-up contained in Friedlander and Burtless (1995).
(e) Cost estimates were obtained from personal communication with the author.
Key: WIN-JS/WE = Work Incentive Program emphasizing supervised job search and work experience; WIN-MIXED = Work Incentive Program incorporating education and/or training; JOBS = Job Opportunities and Basic Skills Training Program; FSETP = Food Stamp Employment and Training Program; NAT = national; DEM = special demonstration; AFDC = Aid to Families with Dependent Children; JS = job search training and assistance; UWE = unpaid work experience; CT = classroom training (basic education and occupational skills training); XL = experimental.

rapid employment over initial assignment to extended education activities.

Effects on welfare receipt and welfare payments (not shown in the table) have been found for some programs but not for others. The magnitude of welfare effects appears to depend in large part on the goals of local program administrators: higher earnings and greater "economic security" versus increased employment and more rapid welfare case closure. Regardless, expenditures for welfare-to-work programs have remained small compared with total welfare payments. Thus, even programs producing relatively large welfare effects reduce average payments by only 10 to 15 percent in the short term, and these effects tend to decrease over time. The great majority of welfare recipients who would have remained on public assistance without WIN or JOBS did so even with those programs.

VII. Estimating Effects of Government Training Programs on Society

To determine whether government training programs are socially efficient, evaluations cannot be limited to effects on persons enrolled in the program. They must also take account of effects on persons not enrolled in the program. Doing this, however, is useful only if the program is found to have positive effects on those who enroll. Otherwise, effects on society as a whole are likely to be negative or, at best, negligible. If the program does have a positive earnings effect on those who receive training, then the effect on society may be positive, depending on the social costs incurred in operating the program. The usual way of taking account of societal effects is through benefit-cost analysis. For the reasons discussed below, existing benefit-cost


TABLE 4 EFFECTS OF MANDATORY TRAINING PROGRAMS ON EARNINGS OF INDIVIDUALS COVERED

BY THE PARTICIPATION REQUIREMENT BY DEMOGRAPHIC GROUP

Demographic Group and Program      Mean Annual   Range of Effects       Count of
(Num. of Studies)                  Effect        (if more than one)     Effects*

Adult Welfare Men (a)
  WIN-JS/WE (1)                    $190                                 (0/0/1/0)
  WIN-MIXED (1)                    $448                                 (0/0/1/0)
  JOBS (2)                         -$28          -$448 to $1,594        (0/3/2/1)

Adult Welfare Women (b)
  WIN-JS/WE (7)                    $438          -$56 to $813           (0/1/1/5)
  WIN-MIXED (2)                    $728          $710 to $746           (0/0/0/2)
  JOBS (4)                         $444          $88 to $1,145          (0/0/4/7)

Food Stamp Recipients
  FSETP (1) (c)                    -$86                                 (0/1/0/0)

* Counts are (num. negative and stat. sig. / num. negative and not stat. sig. / num. positive and not stat. sig. / num. positive and stat. sig.).
(a) WIN and JOBS results for "Adult Welfare Men" pertain to the two-parent AFDC-U welfare category, whose case heads are nearly all male.
(b) WIN and JOBS results for "Adult Welfare Women" pertain to the single-parent AFDC basic welfare category, whose case heads are nearly all female.
(c) The evaluation sample for FSETP was 58 percent male and 42 percent female.

findings for training programs are subject to a great deal of uncertainty.

A. The Contemporary Benefit-Cost Approach

The accounting framework used today in conducting benefit-cost analyses of training programs was developed in the late 1960s by Einar Hardin and Borus (1969) and refined in the early 1980s by Peter Kemper, David Long, and Craig Thornton (1981). A very simple version of this framework is given in Table 5.

Only benefits and costs that are typically estimated are listed in Table 5. Dollar amounts are indicated in the table as the program effect on earnings (E), tax payments (T), welfare payments (W), and net costs (C). The plus and minus signs indicate whether each amount is expected to be a benefit (+) or cost (-) from the perspectives of three groups: persons enrolled in the program, persons not enrolled in the program (including taxpayers, who pay the cost of operating the program), and the whole of society (enrollees plus nonenrollees). As indicated, benefits and costs to society are simply the algebraic sum of benefits and costs to enrollees and nonenrollees. Hence, the framework implies that if a training program causes a decline in transfer payments received by program enrollees (for example, unemployment compensation or welfare payments), then this decline should be regarded as a cost to enrollees (albeit one that may be offset by earnings increases); as a savings or benefit to taxpayers; and as neither a benefit nor a cost to society, but simply a transfer of income from one segment of society to another. One goal of training


TABLE 4 (Cont.) EFFECTS OF MANDATORY TRAINING PROGRAMS ON EARNINGS OF INDIVIDUALS COVERED

BY THE PARTICIPATION REQUIREMENT BY DEMOGRAPHIC GROUP

Demographic Group and Program      Net Cost of Training Per       Real Rate of Return If Mean Effect Lasts
(Num. of Studies)                  Enrollee (Num. of Studies)     3 Years      10 Years

Adult Welfare Men (a)
  WIN-JS/WE (1)                    $1,120 (1)                     <0           <0
  WIN-MIXED (1)                    $1,150 (1)                     8%           8% (d)
  JOBS (2)                         $2,149 (2)                     <0           <0

Adult Welfare Women (b)
  WIN-JS/WE (7)                    $412 (7)                       91%          106%
  WIN-MIXED (2)                    $1,344 (2)                     29%          53%
  JOBS (4)                         $1,936 (4)                     <0           19%

Food Stamp Recipients
  FSETP (1) (d)                    $173 (1)                       <0           <0

(d) The observed time pattern of effects indicates a rapid decline well before 10 years, making the 10-year rate of return similar to the 3-year rate of return.
Notes: See Table 2. Key: See Table 3.

program benefit-cost analysis, as the last row of Table 5 suggests, is to determine whether the program being evaluated has a positive or negative payoff from each of the three perspectives represented by the three columns. The societal perspective is usually viewed by economists as the appropriate one for assessing the efficiency of a training program. Policy makers, however, often focus on persons not enrolled in the program because the net effect on this group also determines whether the program increases or decreases government budgetary requirements.
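The accounting identity in Table 5 can be made concrete with a small numerical sketch. The dollar amounts below are purely illustrative assumptions, not estimates from any evaluation; the point is only that transfers (T and W) cancel in the societal column, leaving E - C.

# Illustrative per-enrollee amounts (assumed, not taken from any study).
E, T, W, C = 900.0, 150.0, 400.0, 600.0   # earnings gain, taxes, welfare reduction, net cost

enrollees = +E - T - W                    # earnings gain less taxes paid and welfare lost
nonenrollees = +T + W - C                 # tax and welfare savings less program operating cost
society = enrollees + nonenrollees        # transfers cancel: equals E - C

print("enrollees:   ", enrollees)         #  350.0
print("nonenrollees:", nonenrollees)      #  -50.0
print("society:     ", society, "= E - C =", E - C)   # 300.0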

The framework shown in Table 5 embodies several assumptions typically made in conducting benefit-cost analyses of training programs. A few of the more important of these assumptions are briefly discussed below.

1. Distributional Issues. In benefit-cost analyses of training programs, dollars gained or lost by program enrollees are usually valued the same as dollars gained or lost by nonenrollees. Enrollees, however, have much lower incomes, on average, than do nonenrollees. The marginal utility of income, therefore, may differ between the two groups. This issue is relevant whenever a training program makes enrollees better off and nonenrollees worse off, or vice versa. For example, benefit-cost analyses of mandatory programs for welfare recipients indicate that some (but not all) of these programs reduce the incomes of enrollees but result in net gains for nonenrollee taxpayers (see Anthony Boardman et al. 1996, Table 14.1). A considerable literature exists concerning the possibility of treating this issue by giving each dollar of the gains and losses of relatively low-income


TABLE 5 STYLIZED ACCOUNTING FRAMEWORK FOR TRAINING PROGRAM BENEFIT-COST ANALYSIS

Variable                               Persons Enrolled     Persons Not Enrolled     Society
                                       in the Program       in the Program           (row sum)
Program effect on
  Earnings                             +E                   0                        +E
  Tax payments                         -T                   +T                       0
  Welfare payments                     -W                   +W                       0
Net program operating costs            0                    -C                       -C
Net effect of program (column sum)     ?                    ?                        ?

persons greater weight in benefit-cost analyses than each dollar of the gains and losses of higher-income persons (see Boardman et al. 1996, ch. 14, and the references contained therein). Because the weights needed to do this are unknown, training program benefit-cost analyses do not explicitly treat the gains and losses of lower- and higher-income persons differently. Instead, as indicated above, benefits and costs are reported separately for program enrollees and nonenrollees so that policy makers can examine the effects on each group and apply their own subjective weights.
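A short sketch illustrates how explicit distributional weights would alter the societal calculation. The net effects and the weights below are assumptions chosen for illustration only, since, as noted, the appropriate weights are unknown.

# Hypothetical per-enrollee net effects, as might arise from a mandatory program
# that lowers enrollee income while producing savings for taxpayers (values assumed).
enrollee_net, nonenrollee_net = -100.0, 250.0

unweighted = enrollee_net + nonenrollee_net            # standard practice: equal weights
w_low, w_high = 2.0, 1.0                               # assumed marginal-utility weights
weighted = w_low * enrollee_net + w_high * nonenrollee_net

print("unweighted societal net:", unweighted)          # +150: program looks efficient
print("weighted societal net:  ", weighted)            #  +50: smaller; any weight above 2.5
                                                       #  on enrollee losses flips the sign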

2. The Extrapolation Problem. Assessing whether government training programs are socially efficient depends critically on how long earnings increases last for program enrollees.41 Program benefits will be smaller if earnings increases quickly fade (or "decay") and much larger if they last for the remainder of the enrollees' working lives. Unfortunately, data on earnings effects are usually limited to three years or less. Benefit-cost analysts must then project earnings effects into the future without any firm empirical basis. Often a "sensitivity analysis" is conducted by making several different assumptions about the long-run time path of earnings effects. Such an analysis can illustrate the magnitude of uncertainty but does not diminish it.

3. Intangible Effects. Intangible effects include the value of leisure forgone by program enrollees and the value of satisfaction gained by both enrollees and nonenrollees from the substitution of earnings for welfare payments. Because they are difficult to measure, intangible effects are rarely assigned a value in evaluations of government training programs. This practice may be conceptually unsound. For example, if a training program causes a program participant to work more, the individual's real net gain is not his or her financial gain (as implied by the usual training program benefit-cost framework) but rather his or her increase in utility. The few attempts that have been made to impute the real net gain rather than the financial gain (Bell and Orr 1994; Greenberg 1997) suggest that the difference between the financial gain and the real net gain can be appreciable.

The issue of unobserved increases in utility arises in measuring the benefits

41 Because the benefits of training programs typically occur after the costs, discounting is necessary. There is, of course, some controversy among economists concerning the appropriate social discount rate.


of government training programs for nonenrollees. No attempt has ever been made to elicit taxpayers' willingness to pay for the substitution of work for transfer payments induced by training programs. One approach that could be used for this purpose is contingent valuation, which utilizes surveys to attempt to measure willingness to pay for changes in the quantity and quality of goods not exchanged in markets (see Richard Bishop and Thomas Heberlein 1990 for an overview). Considerable controversy surrounds the validity of contingent valuation, however (Jerry Hausman 1993).

B. General Equilibrium Effects

Government training programs may have important effects on the behavior and well-being of some persons not enrolled in a program. These effects are almost never taken into account in training program benefit-cost analyses. Two such effects are entry effects and displacement effects. Empirical evidence about the magnitude of both of these effects is quite limited. Our assessment of the theoretical arguments is that the importance of entry effects is somewhat speculative, whereas displacement could substantially undercut the social benefits of government training programs, reducing them well below the benefits measured in a typical benefit-cost analysis.

1. Entry and Deterrent Effects. If training program services are perceived as beneficial, some persons who are initially ineligible to participate may leave their jobs in order to qualify (an "entry" effect). On the other hand, in the case of mandatory programs for welfare recipients, some individuals who might otherwise have entered the welfare rolls may decide not to do so to avoid the "hassle" of participating (a "deterrent" effect).42 Manski and Garfinkel (1992a) and Moffitt (1992, 1996), among others, argue that program entry effects or deterrent effects could be substantial.

Findings on entry effects are available from five aggregate-level time series studies that examine how training programs affect applications for welfare. However, the value of the empirical results is reduced by their sensitivity to model specification changes.43 The results, in our view, are inconclusive. All five studies compare welfare application rates in sites that have a training program for welfare recipients with application rates in sites that do not have training. Three of the five studies are consistent with expectations that voluntary programs for welfare recipients encourage entry onto the welfare rolls and mandatory programs discourage entry. Of the two studies of voluntary programs, T. Johnson, Daniel Klepinger, and Fred Dong (1990) find that a voluntary program in Oregon had a positive entry effect, but Wissoker and Harold Watts (1994) do not find a positive entry effect for a voluntary program in the state of Washington. Of the three studies of mandatory programs, two indicate, as anticipated, that entry effects were negative (Fisher Chang 1996; Elizabeth Phillips 1993), while the remaining study finds no evidence of a negative entry effect along with a much larger increase

42 This is conceptually similar to welfare recipients leaving the rolls when they are informed that they will be subject to newly established mandatory work or training requirements. Such "exit effects" occurred (but were not separately identified) under the experimental evaluations of mandatory programs discussed earlier.

43 Program effects on overall caseload levels may be due not only to changes in the number of applications, but also to changes in the fraction of applications accepted onto the welfare rolls and to changes in the number of exits from the rolls. Only changes in the number of applicants represent an entry effect.


in exits (Bradley Schiller and C. Nielsen Brasher 1993).44

2. Displacement. Training program graduates may end up in jobs that otherwise would have been held by individuals not in the program (G. Johnson 1979). If these displaced individuals become unemployed or accept lower-wage jobs, their earnings will fall. The social effect of training programs on employment and earnings will therefore be less than the effects for program graduates. Despite these potential adverse effects, there is virtually no research quantifying the magnitude of displacement caused by training programs for the economically disadvantaged.45

Several arguments have been put forward to suggest that displacement may not seriously undermine training program effectiveness. First, macroeconomic policy may be able to expand employment enough to absorb new training program graduates and thereby prevent displacement. Second, as Cohen (1969) and G. Johnson (1979) point out, if training program participants are less likely to seek employment while they are in training than they otherwise would have been, then more jobs will be open to nonparticipants, at least temporarily. Third, as emphasized by G. Johnson (1979) and Katz (1994), if training programs can impart skills that allow trainees to leave slack occupational labor markets for tight ones, then they can decrease the competition for job vacancies in the slack markets, making it easier for those who remain in these markets to find jobs. Such a possibility could produce a result that is the exact opposite of a displacement effect: total employment could increase by more than the number of persons who are trained.
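A back-of-the-envelope sketch (with an assumed displacement rate, since, as noted above, no empirical estimates exist for these programs) shows how displacement would drive a wedge between the earnings effect measured for program graduates and the effect on society as a whole.

# Hypothetical numbers only: per-graduate earnings gain and an assumed displacement rate.
graduates = 10_000
gain_per_graduate = 500.0        # measured annual earnings effect per graduate
displacement_rate = 0.30         # assumed share of new jobs taken from other workers
loss_per_displaced = 500.0       # assumed earnings loss of each displaced worker

measured_total = graduates * gain_per_graduate
displacement_loss = displacement_rate * graduates * loss_per_displaced
social_total = measured_total - displacement_loss

print("measured effect for graduates:     ", measured_total)   # 5,000,000
print("social effect net of displacement: ", social_total)     # 3,500,000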

VIII. Conclusions and Agenda for Future Research

Evaluations of government training programs for the economically disadvantaged have yielded important information about the effectiveness of such programs. Nonetheless, some uncertainty remains about the returns to social expenditures on existing programs, and large open questions persist about strategies for improving effectiveness in the future.

A. What We Know

Most of what we know about training programs concerns their costs and their short-term financial effects on persons enrolled in them. The most optimistic findings are for adult women. Nearly every evaluation of training programs for this group has found positive earnings gains, and most of the estimates have been statistically significant. Although these gains may appear modest in absolute terms, the public investment in these programs is also modest. The implied social rate of return on the resources expended on these programs is, in fact, sometimes quite high, and continued public funding seems warranted on this basis. These favorable results for women hold not only for voluntary programs but also for mandatory

44 In addition to the studies cited in the text, Moffitt (1996) illustrated entry effects for voluntary and mandatory programs using a microsimulation model. Moffitt's analysis suggests that a mandatory program for AFDC recipients with a heavy participation time requirement would reduce entry into welfare, but a voluntary program would increase entry. Much of the latter effect results from an assumed reduction in the stigma attached to welfare receipt.

45 During the 1970s, there were a number of empirical studies of displacement. These studies focused almost exclusively on the extent to which unemployed workers absorbed by public sector job-creation programs displaced regular government workers and provide little insight into displacement in the private sector that would result from training programs.


programs, in which adult women on welfare are the principal target population. The earnings increases to individuals required to participate are, however, often partially offset by decreased welfare benefits, reducing the net effect on income.

The full set of evaluation findings for adult men leaves more uncertainty. Nevertheless, the estimated effects for adult men in the experimental evaluation of the current national voluntary training system, JTPA, are encouraging. The few results for adult men receiving welfare in mandatory programs show less positive earnings effects.

Evaluation findings for youth are of special interest, because youth experience a relatively high degree of labor market difficulty and because permanent increases in earning power for youth could potentially provide returns to social investments over a relatively long post-program working life. With the possible exception of the Job Corps, however, no training program has been found effective for youth. The evidence for the Job Corps, which has by far the greatest per-participant cost of any currently operating government training program, is mixed. Findings from a nationally based experimental evaluation of the Job Corps, currently under way, will be crucial in resolving uncertainties about the program's effectiveness. Negative results from this evaluation would reinforce the serious questions about the efficacy of continued expenditures on traditional kinds of training programs for young persons, at least insofar as more favorable labor market outcomes are the program objective. Even positive results will not diminish the need to develop new and less expensive strategies that will work effectively for large numbers of youth, because the high cost of Job Corps limits the

size of the population that can be served.46

Even if some training programs have substantial positive effects on the earnings of participants, their aggregate effects would appear to be quite modest. It seems clear that the aggregate effects of JTPA are minimal, both on the legally defined target population and on the labor force as a whole. Program budgets are simply too small to permit JTPA to reach much of its target population (cf. Heckman, Roselius, and Smith 1994). The mandatory programs operated under JOBS may have produced some reductions in aggregate welfare receipt in localities that have moved aggressively to expand program coverage, but contributions to reduced poverty almost certainly have been slight. At the same time, evaluation research has failed to provide a solid empirical basis for expecting that increased funding would not run up against sharply decreasing returns. One program feature determined by funding is scale, the number of participants engaging in program activities. For voluntary programs, discussions about the effects of increasing total participation must be viewed as fundamentally speculative, even for programs and target groups for which effectiveness at current spending and participation levels has been demonstrated. For mandatory programs, there is some empirical evidence indicating that recently legislated increases in the share of the welfare population covered by participation requirements have increased the total program effect on earnings and welfare receipt without dramatically increasing per-enrollee costs. Evidence has been inconsistent, however, regarding additional earnings effects obtained from

46 Possible strategies for improving the economic circumstances of youth are discussed in Orr et al. (1996, pp. 216-31 passim).


working with greater numbers of more disadvantaged individuals within a defined mandatory population.

The second program feature determined by funding is the intensity and duration of skills-building activities in which participants engage. It is these activities (classroom and on-the-job training, rather than the lower-cost personal counseling, job-search assistance, and direct placement) that are looked to for lasting improvements in individual productivity and earning power. The most heartening empirical findings with respect to skills development are the long-term earnings effects found recently in analyses of extended follow-up data for a few skills-oriented programs. There remains, however, a lack of compelling evidence that skills-building activities have actually enhanced skills that are of value to employers or have accounted for a dominant share of program earnings effects relative to the lower-cost activities bundled with them. Indeed, evaluation results for both voluntary and mandatory programs heighten rather than allay concerns about the cost-effectiveness of more expensive program components and suggest that the fine details of program organization, in ways as yet poorly understood, may be critical in determining the success of skills-building efforts.

B. What We Need to Know

Several questions need to be addressed in future evaluations of government training programs for the economically disadvantaged. Overarching all is this question: How and by how much can the aggregate effects of government training programs be increased in a cost-effective manner? The question has three priority subtopics: youth, scale, and skills. Clearly, one top priority is to find cost-effective techniques for working with disadvantaged out-of-school youth. A second high priority is to answer questions pertaining to the cost-effectiveness of increasing program scale: Can total participation in voluntary programs be increased at a reasonable cost? Can targeting strategies be developed to limit inefficient substitution of program for nonprogram training? If so, can the effectiveness of existing service mixes be maintained over a broader, probably more disadvantaged population? Third, can additional intensive, skills-building activities be organized in a fashion that will produce lasting earnings effects large enough to make their use cost-effective at large scale, especially for the more disadvantaged? At a somewhat lower priority, we ask: To what extent does displacement offset the measured employment and earnings effects? This issue, although recognized for nearly three decades to be potentially important, has yet to be tackled in any meaningful way. Finally, entry effects deserve some research attention as a potential threat to program effectiveness.

C. Agenda for Future Evaluations

The prospects for original and useful work by economists in addressing these questions, we believe, are significant. Some of this work must occur outside the context of specific evaluations. At the most basic level, economists need to do additional work toward developing a fully fledged economic theory of evaluation, a theory that would provide better guidance about the most valuable kinds of information that could be generated from evaluations. Theoretical work also needs to be done to define better the optimal role of government training programs for the economically disadvantaged in relation to academic and vocational schooling and to training provided by employers on the job. In addition, we suspect that some immediate


progress may be made on the issue of displacement by reviewing existing labor market literature on substitution across grades of labor, both to provide empirical information and to formulate hypotheses that could be tested in future evaluations. Beyond that, only the evaluation of programs implemented at maximum scale within localities appears to us to be capable of yielding estimates of market-wide displacement effects. Finally, entry effects resulting from training programs targeted at welfare recipients can continue to be studied by collecting time-series data on welfare application rates for sites that adopt large-scale intensive programs. Although studies based on aggregate data are often problematic owing to the difficulty of controlling for other factors that influence application rates, they are inexpensive to conduct. Only a few such studies presently exist, and more should be undertaken. In addition, some progress is being made in designing field experiments capable of measuring entry effects using individual-level data (Card, Robins, and Winston Lin 1997).

In planning future program evaluations, the emphasis should be on addressing the above-mentioned issues of youth, scale, and skills. Toward that end, we believe it important to continue the trend away from traditional "black box" evaluations that yield only a summary estimate of program effectiveness. Instead, evaluation resources should focus on improving training technique. Closer study of training program technique means looking at those aspects of training associated with success or failure, given the level of funding. Ways of working more effectively with youth, ways of increasing program participation without decreasing average earnings gains, and ways of organizing skills-building activities to get the most out of them: these should all be the subjects of studies of program technique. Future evaluation designs should make greater use of direct comparisons of competing candidates for best program practice. One method for testing the effectiveness of alternative "service strategies" was used in the JTPA evaluation, which estimated experimental effects separately for three clusters of activities by randomizing sample members after program intake staff had recommended them for specific services. Another method, employed in some JOBS evaluation sites, was to randomly assign all program enrollees, regardless of program staff preferences, either to a rapid employment program approach or to an approach that aimed for long-term skills development. Finally, some theoretical work has been undertaken to develop feasible designs for comparing program approaches across randomly assigned local offices (Greenberg, Meyer, and Wiseman 1993). These research designs do not come free of conceptual difficulties, however, and they also present serious practical challenges, not the least of which is simply maintaining the distinctiveness of the competing service strategies and assuring high participation rates in the activities of interest.

Under any of these designs, training techniques demonstrated to be highly effective in the study sites can be replicated and evaluated at additional sites to determine whether the original favorable results are generalizable. Some efforts along these lines are currently taking place, such as the replication and evaluation in Los Angeles of a successful welfare-to-work program tested earlier in Riverside, California (Riccio, Friedlander, and Freedman 1994) and a multi-site replication and evaluation for youth of the San Jose CET program that produced earnings gains in MFSP and JOBSTART. There is, however, a need for further work in integrating


experimental and nonexperimental methods. Nonexperimental methods could be an important adjunct to random assignment in determining the path through which training programs influence earnings, such as through changes in educational attainment. They could also be important for increasing our understanding of the determinants and consequences of population participation rates and of substitution of activities across government and nongovernment providers, across periods of an individual's life cycle, and across episodes of nonemployment and employment.

Studying training technique will require additional and more detailed data than have been collected in most previous evaluations. Clearly, to assess intensive skills-development activities, the focus must shift from short- to long-term earnings effects. As a partial substitute for long-term results, however, greater attention should be paid to measuring program effects on the skills demands of jobs entered and on hourly wage rates and other terms of employment, including prospects for future on-the-job training and wage growth, which, in theory, should be improved by skills upgrading. More detail will be required in describing the nature of the training activities and the behavior of participants who engage in them, both to document that the prescribed training was received and to serve as a basis for replicating approaches that prove successful. Special consideration will have to be paid to measuring substitution of services provided by the program being evaluated for training services available elsewhere in the same locality. Efforts will have to be directed toward determining whether skills have actually been acquired, to what extent, and whether the acquired skills are valued by employers. To achieve these measurement objectives, increasing research outlays on in-classroom observation and on relatively expensive surveys and pre- and post-program skills tests for study participants would appear unavoidable. Without detailed knowledge about the nature of the training, how it was administered, who received it, what activities it replaced, and whether it actually increased the skill level and productivity of the trainee, it will be difficult to draw firm conclusions about the relative effects of different program activities.

An important issue in studying training technique involves the degree of control that evaluators should have over the program being evaluated. In a typical training program evaluation, state or local administrators usually choose the array of services to be offered and, perhaps with input from participants, decide who receives each service. The evaluator then attempts to measure the effect of the program as implemented. Such an approach can hinder the ability to study new program techniques. Innovations in technique are, by their nature, often difficult to find in practice and must sometimes be set up specifically for the purpose of study. In addition, when evaluators control the services for which each program participant is eligible, they are better able to maintain the distinctiveness of alternative service streams and to determine the differential effect of several service combinations. Greater control over the programs to be tested, however, can often be gained only when research budgets include substantial resources to compensate local agencies for changes in their program operations.

Were additional research funding available, it could be devoted to investigating various hypotheses advanced about group dynamics in training. These hypotheses stem from the


nascent literature on potential "community effects" and concern influences on motivation coming from classroom peers and from the participant's social and community context. Economists have been responsible for most major evaluations of training programs, but they do not have special expertise in these motivational issues. In this area, then, economists might find it fruitful to work more closely with measurement experts in sociology, psychology, education, and urban anthropology.

REFERENCES

ANDERSON, KATHRYN H.; BURKHAUSER, RICHARD V. AND RAYMOND, JENNIE E. "The Effect of Creaming on Placement Rates under the Job Training Partnership Act," Ind. Lab. Relat. Rev., July 1993, 46(4), pp. 613-24.

ANGRIST, JOSHUA D. AND IMBENS, GUIDO W. "Sources of Identifying Information in Evalu- ation Models." NBER Technical Working Paper 117, Dec. 1991.

ANGRIST, JOSHUA D.; IMBENS, GUIDO W. AND RUBIN, DONALD B. "Identification of Causal Effects Using Instrumental Variables," J. Amer. Statist. Assoc., June 1996, 91(434), pp. 444-55.

ASHENFELTER, ORLEY C. "Estimating the Effect of Training Programs on Earnings," Rev. Econ. Statist., Feb. 1978, 60(1), pp. 47-57.

ASHENFELTER, ORLEY AND CARD, DAVID. "Using the Longitudinal Structure of Earnings to Esti- mate the Effect of Training Programs," Rev. Econ. Statist., Nov. 1985, 67(4), pp. 648-60.

AUSPOS, PATRICIA; CAVE, GEORGE AND LONG, DAVID. Maine: Final report on the training op- portunities in the private sector program. New York. Manpower Demonstration Research Cor- poration, 1988.

BARNOW, BURT S. "The Impact of CETA Pro- grams on Earnings: A Review of the Litera- ture," J. Human Res., Spring 1987, 22(2), pp. 157-93.

. "Government Training as a Means of Re- ducing Unemployment," in Rethinking employ- ment policy. Eds.: D. LEE BAWDEN AND FELICITY SKIDMORE. Washington, DC: Urban Institute Press, 1989, pp. 109-35.

BARNOW, BURT S.; CAIN, GLEN G. AND GOLD- BERGER, ARTHUR S. "Issues in the Analysis of Selectivity Bias," Evaluation studies review annual. Vol. 5. Edited by ERNST W. STROMSDORFER AND GEORGE FARKAS. Bev- erly Hills, CA and London: Sage Pub., 1980, pp. 43-59.

BASSI, LAURIE J. "The Effect of CETA on the Postprogram Earnings of Participants," J. Hu- man Res., Fall 1983, 18(4), pp. 539-56.

. "Estimating the Effect of Training Pro- grams with Non-Random Selection," Rev. Econ. Statist. Feb. 1984, 66(1), p. 36-43.

- . "Estimating the Effect of Job Training Programs, Using Longitudinal Data: Ashenfel- ter's Findings Reconsidered: A Comment," J. Human Res., Spring 1987, 22(2), pp. 300-03.

BASSI, LAURIE J. ET AL. Measuring the effect of CETA on youth and the economically disadvan- taged. Final Report prepared for the U.S. Department of Labor under Contract No. 20- 11-82-19. Washington, DC: Urban Institute, 1984.

BELL, STEPHEN H. AND ORR, LARRY L. "Is Subsidized Employment Cost Effective for Welfare Recipients? Experimental Evidence from Seven State Demonstrations," J. Human Res., Winter 1994, 29(1), pp. 42-61.

BELL, STEPHEN H. ET AL. Program applicants as a comparison group in evaluating training programs. Kalamazoo, MI: W.E. Upjohn Institute for Employment Research, 1995.

BETSEY, CHARLES L.; HOLLISTER, ROBINSON G., JR., AND PAPAGEORGIOU, MARY R., eds. Youth employment and training programs: the YEDPA years. Washington, DC: National Academy Press, 1985.

BISHOP, RICHARD C. AND HEBERLEIN, THOMAS A. "The Contingent Valuation Method," in Economic valuation of natural resources: Issues, theory, and applications. Eds.: REBECCA L. JOHNSON AND GARY V. JOHNSON. Boulder, CO: Westview Press, 1990, pp. 81-104.

BJORKLUND, ANDERS AND MOFFITT, ROBERT. "The Estimation of Wage Gains and Welfare Gains in Self-selection Models," Rev. Econ. Sta- tist., Feb. 1987, 69(1), pp. 42-49.

BLOMQUIST, JOHN D. The Ohio Transitions to Independence Demonstration: Report on program costs and benefits. Bethesda, MD: Abt Assoc., Inc., 1994.

BLOOM, HOWARD S. "Accounting for No-Shows in Experimental Evaluation Designs," Evaluation Review, Apr. 1984a, 8(2), pp. 225-46.

. "Estimating the Effect of Job-Training Programs, Using Longitudinal Data: Ashenfel- ter's Findings Reconsidered," J. Human Res., Fall, 1984b, 19(4), pp 544-56.

. "What Works For Whom?: CETA Impacts for Adult Participants," Evaluation Review, Aug. 1987, 11(4), pp. 510-27.

BLOOM, HOWARD S. ET AL. "The Benefits and Costs of JTPA Title II-A Programs: Key Find- ings From the National JTPA Study," J. Human Res., Summer 1997, 32(3), pp. 549-76.

BOARDMAN, ANTHONY E. ET AL. Cost-benefit analysis: Concepts and practice. Upper Saddle River, NJ: Prentice Hall, 1996.

BORUS, MICHAEL E. "A Benefit-Cost Analysis of the Economic Effectiveness of Retraining the Unemployed," Yale Econ. Essays, Fall 1964, 4(2), pp. 371-429.

BRYANT, EDWARD C. AND RUPP, KALMAN. "Evaluating the Impact of CETA on Participant


Earnings," Evaluation Review, Aug. 1987, 11(4), pp. 473-92.

BURGHARDT, JOHN ET AL. Evaluation of the mi- nority female single parent demonstration. Vol. I, Summary Report. New York: The Rockefeller Foundation, Oct. 1992.

BURTLESS, GARY. "The Case for Randomized Field Trials in Economic and Policy Research," J. Econ. Perspectives, Spring 1995, 9(2), pp. 63- 84.

BURTLESS, GARY AND ORR, LARRY L. "Are Classi- cal Experiments Needed for Manpower Pol- icy?" J. Human Res., Fall 1986, 21(4), pp. 606- 39.

CAIN, GLEN G. "Benefit-Cost Estimates for Job Corps." Discussion Paper No. 9-68, Institute for Research on Poverty, U. of Wisconsin, Madison, Sept. 1968.

-. "Regression and Selection Models to Im- prove Nonexperimental Comparisons," in Eval- uation and experiment. Eds.: CARL A. BEN- NETT AND ARTHUR A. LUMSDAINE. New York: Academic Press, 1975, pp. 297-317.

CARD, DAVID E.; ROBINS, PHILIP K. AND LIN, WINSTON. How important are 'entry effects' in financial incentive programs for welfare recipients? Ottawa, Ont.: Social Research and Demonstration Corporation, Aug. 1997.

CARD, DAVID AND SULLIVAN, DANIEL G. "Meas- uring the Effect of Subsidized Training Pro- grams on Movements in and out of Employ- ment," Econometrica, May 1988, 56(3), pp. 497-530.

CAVE, GEORGE ET AL. JOBSTART: Final report on a program for school dropouts. New York: Manpower Demonstration Research Corporation, Oct. 1993.

CHANG, FISHER. "Evaluating the Impact of Man- datory Work Programs on Two-Parent Welfare Caseloads." Unpublished doctoral dissertation. Baltimore: U. of Maryland Baltimore County, 1996.

CLEMENTS, NANCY; HECKMAN, JAMES AND SMITH, JEFFREY. "Making the Most Out of So- cial Experiments: Reducing the Intrinsic Un- certainty in Evidence from Randomized Trials with an Application to the National JTPA Ex- periment." National Bureau of Economic Re- search Technical Paper 149, Jan. 1994.

COHEN, MALCOLM S. "The Direct Effects of Fed- eral Manpower Programs in Reducing Unem- ployment," J. Human Res., Fall 1969, 4(1), pp. 491-507.

CONLISK, JOHN. "Choice of Sample Size in Evaluating Manpower Programs: Comment on Pitcher and Stafford," in Research in Labor Economics. Ed.: FARRELL E. BLOCH. Supplement 1, 1979, pp. 79-96.

COOLEY, THOMAS M.; MCGUIRE, TIMOTHY W. AND PRESCOTT, EDWARD C. "Earnings and Employment Dynamics of Manpower Trainees: An Econometric Analysis," in Research in Labor Economics. Ed.: FARRELL E. BLOCH. Supplement 1, 1979, pp. 119-47.

COUCH, KENNETH A. "New Evidence on the Long-Term Effects of Employment Training Programs," J. Lab. Econ., Oct. 1992, 10(4), pp. 380-88.

DEHEJIA, RAJEEV H. AND WAHBA, SADEK. "Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Pro- grams." Unpublished paper. Nov. 1995.

DICKINSON, KATHERINE P.; JOHNSON, TERRY R. AND WEST, RICHARD W. An analysis of the impact of CETA programs on components of earnings and other outcomes. Final Report prepared for the U.S. Department of Labor, Employment and Training Administration. Menlo Park, CA: SRI International, Nov. 1984.

. "An Analysis of the Impact of CETA Pro- grams on Participants' Earnings," J. Human Res., Winter 1986, 21(1), pp. 64-91.

". The Impact of CETA Programs on Com- ponents of Participants' Earnings," Ind. Lab. Relat. Rev., Apr. 1987a, 40(3), pp. 430-41.

. "An Analysis of the Sensitivity of Quasi- Experimental Net Impact Estimates of CETA Programs," Evaluation Review, Aug. 1987b, 11(4), pp. 452-72.

FEIN, DAVID J.; BEECROFT, ERIK AND BLOMQUIST, JOHN D. The Ohio Transitions to Independence Demonstration: Final impacts for JOBS and Work Choice. Bethesda, MD: Abt Associates, Inc., 1994.

FINIFTER, DAVID H. "An Approach to Estimating Net Earnings Impact of Federally Subsidized Employment and Training Programs," Evaluation Review, Aug. 1987, 11(4), pp. 528-47.

FRAKER, THOMAS AND MAYNARD, REBECCA. "The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Pro- grams," J. Human Res., Spring 1987, 22(2), pp. 194-227.

FREEDMAN, STEPHEN; BRYANT, JAN AND CAVE, GEORGE. New Jersey: Final report on the grant diversion project. New York: Manpower Dem- onstration Research Corporation, 1988.

FREEDMAN, STEPHEN ET AL. The GAIN evalu- ation: Five-year impacts on employment, earn- ings, and AFDC receipt. New York: Manpower Demonstration Research Corporation, July 1996.

FRIEDLANDER, DANIEL. Maryland: Supplemental report on the Baltimore Options Program. New York: Manpower Demonstration Research Cor- poration, 1987.

. "Subgroup Impacts of Large-Scale Wel- fare Employment Programs," Rev. Econ. Sta- tist., Feb. 1993, 75(1), pp. 138-43.

FRIEDLANDER, DANIEL AND BURTLESS, GARY. Five years after: The long-term effects of wel- fare-to-work programs. New York: Russell Sage Foundation, 1995.

FRIEDLANDER, DANIEL AND GUERON, JUDITH M. "Are High-Cost Services More Effective Than Low-Cost Services?" in CHARLES F. MAN- SKI AND IRWIN GARFINKEL, eds. 1992, pp. 143-98.


FRIEDLANDER, DANIEL AND HAMILTON, GAYLE. The Saturation Work Initiative Model in San Diego: A five-year follow-up study. New York: Manpower Demonstration Research Corporation, 1993.

"The Impact of a Continuous Participation Obligation in a Welfare Employment Program," J. Human Res., Fall 1996, 31(4), pp. 734-56.

FRIEDLANDER, DANIEL AND MARTINSON, KARIN. "Effects of Mandatory Basic Education for Adult AFDC Recipients," Educational Evaluation and Policy Analysis, Winter 1996, 18(4), pp. 327-37.

FRIEDLANDER, DANIEL AND ROBINS, PHILIP K. "Evaluating Program Evaluations: New Evi- dence on Commonly Used Nonexperimental Methods," Amer. Econ. Rev., Sept. 1995, 85(4), pp. 923-37.

. "The Distributional Impacts of Social Pro- grams," Evaluation Review, Oct. 1997, 21(5), pp. 531-53.

FRIEDLANDER, DANIEL ET AL. Arkansas: Final report on the WORK program in two counties. New York: Manpower Demonstration Research Corporation, Sept. 1985.

. West Virginia: Final report on the Community Work Experience Demonstrations. New York: Manpower Demonstration Research Corporation, 1986.

. Illinois: Final report on job search and work experience in Cook County. New York: Manpower Demonstration Research Corpora- tion, 1987.

GARFINKEL, IRWIN; MANSKI, CHARLES F. AND MICHALOPOULOS, CHARLES. "Micro Experi- ments and Macro Effects," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 253-73.

GAY, ROBERT S. AND BORUS, MICHAEL E. "Validating Performance Indicators for Employment and Training Programs," J. Human Res., Winter 1980, 15(1), pp. 29-48.

GOLDBERGER, ARTHUR S. "Selection Bias in Evaluating Treatment Effects." Discussion Pa- per 123-72, Institute for Research on Poverty, U. of Wisconsin, Madison, 1972.

GOLDMAN, BARBARA S. Impacts of the immediate job search assistance experiment. New York: Manpower Demonstration Research Corpora- tion, 1981.

GOLDMAN, BARBARA; FRIEDLANDER, DANIEL AND LONG, DAVID. California: Final report on the San Diego job search and work experience demonstration. New York: Manpower Demon- stration Research Corporation, 1986.

GREENBERG, DAVID. "The Leisure Bias in Cost- Benefit Analyses of Employment and Training Programs," J. Human Res., Spring 1997, 32(2), pp. 413-39.

GREENBERG, DAVID; MEYER, ROBERT H. AND WISEMAN, MICHAEL. "Prying the Lid from the Black Box: Plotting Evaluation Strategy for Welfare Employment and Training Programs." Discussion Paper 989-93. Madison, WI: U. of Wisconsin Institute for Research on Poverty, 1993.

. "Multisite Employment and Training Evaluations: A Tale of Three Studies," Ind. Lab. Relat. Rev., July 1994, 47(4), pp. 679-91.

GUERON, JUDITH M. AND PAULY, EDWARD. From welfare to work. New York: Russell Sage Foundation, 1991.

HAM, JOHN C. AND LALONDE, ROBERT J. "Using Social Experiments to Estimate the Effect of Training on Transition Rates," in Panel data and labor market studies. Eds.: Joop HARTOG, GEERT RIDDER, AND JULES THEEUWES. North Holland: Elsevier Science Pub., 1990, pp. 157- 72

. "The Effect of Sample Selection and Ini- tial Conditions in Duration Models: Evidence from Experimental Data on Training," Econo- metrica, Jan. 1996, 64(1), pp. 175-205.

HAMERMESH, DANIEL S. "The Secondary Effects of Manpower Programs," Econ. Bus. Bull., Spring-Summer 1972, 24(3), pp. 18-26.

HAMILTON, GAYLE. Interim report on the Satura- tion Work Initiative Model in San Diego. New York: Manpower Demonstration Research Cor- poration, Aug. 1988.

HAMILTON, GAYLE AND FRIEDLANDER, DANIEL. Final report on the Saturation Work Initiative Model in San Diego. New York: Manpower Demonstration Research Corporation, Nov. 1989.

HAMILTON, GAYLE ET AL. Evaluating two wel- fare-to-work approaches: Two-year findings on the labor force attachment and human capital development programs in three sites. New York: Manpower Demonstration Research Corpora- tion, forthcoming.

HARDIN, EINAR AND BORUS, MICHAEL E. Eco- nomic benefits and costs of retraining courses in Michigan. East Lansing, MI: School of Labor and Industrial Relations, Michigan State U., Dec. 1969.

HAUSMAN, JERRY A., ed. Contingent valuation: A critical assessment. New York: North-Holland, 1993.

HECKMAN, JAMES J. "Dummy Endogenous Vari- ables in a Simultaneous Equation System," Econometrica, July 1978, 46(3), pp. 931-59.

- . "Randomization and Social Policy Evalu- ation," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 201-30.

. "Instrumental Variables: A Study of Im- plicit Behavioral Assumptions in One Widely- Used Estimator." Unpublished manuscript, Jan. 18, 1996.

HECKMAN, JAMES J. AND HOTZ, V. JOSEPH. "Choosing among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs," J. Amer. Statist. Assoc., Dec. 1989, 84(408), pp. 862-74.

HECKMAN, JAMES J. AND ROBB, RICHARD, JR. "Alternative Methods for Evaluating the Impact of Interventions: An Overview," J. Econo- metrics, Oct./Nov. 1985, 30(1-2), pp. 239-67.


HECKMAN, JAMES J.; ROSELIUS, REBECCA L. AND SMITH, JEFFREY A. "U.S. Education and Training Policy: A Re-Evaluation of the Underlying Assumptions Behind the 'New Consensus'," in Labor markets, employment policy, and job creation. Eds.: LEWIS C. SOLMON AND ALEC R. LEVENSON. Boulder, CO: Westview Press, 1994, pp. 83-121.

HECKMAN, JAMES J. AND SMITH, JEFFREY A. "As- sessing the Case for Social Experiments," J. Econ. Perspectives, Spring 1995, 9(2), pp. 85- 110.

HECKMAN, JAMES; SMITH, JEFFREY AND TABER, CHRISTOPHER. "Accounting for Dropouts in Evaluations of Social Experiments." Economic Research Center, NORC Discussion Paper 94/3, May 1994.

HECKMAN, JAMES J. ET AL. "Characterizing Selection Bias Using Experimental Data," Econometrica, forthcoming.

HOLLISTER, ROBINSON G., JR. AND HILL, JENNIFER. "Problems in the Evaluation of Community-Wide Initiatives," in New approaches to evaluating community initiatives: Concepts, methods, and contexts. Eds.: JAMES P. CONNELL ET AL. Washington, DC: Aspen Institute, 1995, pp. 127-72.

HOLLISTER, ROBINSON G., JR.; KEMPER, PETER AND MAYNARD, REBECCA, eds. The National Supported Work Demonstration. Madison: U. of Wisconsin Press, 1984.

HOTZ, V. JOSEPH. "Designing an Evaluation of the Job Training Partnership Act," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, PP. 76-114.

HOTZ, V. JOSEPH AND SANDERS, SETH G. "Bounding Treatment Effects in Controlled and Natural Experiments Subject to Post-Randomization Treatment Choice." Population Research Center, NORC Discussion Paper 94/2, Mar. 1994.

JACOBSON, LOUIS S. ET AL. "The Returns to Classroom Training for Displaced Workers." Chicago: Federal Reserve Bank of Chicago Working Paper, Macroeconomic Issues WP 94-27, Oct. 1994.

JOHNSON, GEORGE. "The Labor Market Displacement Effect in the Analysis of the Net Impact of Manpower Training Programs," in Research in Labor Economics, Supplement 1, 1979, pp. 227-54.

JOHNSON, TERRY R.; KLEPINGER, DANIEL H. AND DONG, FRED B. "Preliminary Evidence from the Oregon Welfare Reform Demonstration." Unpublished paper, June 1990.

KATZ, LAWRENCE F. "Active Labor Market Policies to Expand Employment Opportunity," in Reducing unemployment: Current issues and policy options. Proceedings from a Symposium sponsored by the Federal Reserve Bank of Kansas City, Jackson Hole, Wyoming, Aug. 1994, pp. 239-90.

KEMPER, PETER; LONG, DAVID A. AND THORNTON, CRAIG. The supported work evaluation: Final benefit-cost analysis. New York: Manpower Demonstration Research Corporation, 1981.

KEMPLE, JAMES J.; FRIEDLANDER, DANIEL AND FELLERATH, VERONICA. Florida's Project Independence: Benefits, costs, and two-year impacts of Florida's JOBS program. New York: Manpower Demonstration Research Corporation, Apr. 1995.

KIEFER, NICHOLAS M. "Federally Subsidized Occupational Training and the Employment and Earnings of Male Trainees," J. Econometrics, Aug. 1978, 8(1), pp. 111-25.

.."The Economic Benefits from Four Gov- ernment Training Programs," in Research in Labor Economics, Supplement 1. Ed.: FARRELL E. BLOCH. 1979, pp. 159-86.

LALONDE, ROBERT J. "Evaluating the Econo- metric Evaluations of Training Programs with Experimental Data," Amer. Econ. Rev., Sept. 1986, 76(4), pp. 604-20.

- . "The Promise of Public Sector-Sponsored Training Programs. J. Econ. Perspectives, Spring 1995, 9(2), pp. 149-68.

LALONDE, ROBERT J. AND MAYNARD, REBECCA. "How Precise Are Evaluations of Employment and Training Programs? Evidence from a Field Experiment," Evaluation Review, Aug. 1987, 11(4), pp. 428-51.

LONG, SHARON K. AND WISSOKER, DOUGLAS. "Welfare Reform at Three Years: The Case of Washington State's Family Independence Program," J. Human Res., Fall 1995, 30(4), pp. 766-90.

MALLAR, CHARLES D. "Alternative Econometric Procedures for Program Evaluations: Illustrations from an Evaluation of Job Corps," American Statistical Association, Proceedings of the Business and Economics Statistics Section, 1979, pp. 317-21.

MALLAR, CHARLES ET AL. Evaluation of the economic impact of the Job Corps program: Third follow-up report. Princeton: Mathematica Policy Research, Sept. 1982.

MANSKI, CHARLES F. "What Do Controlled Ex- periments Reveal About Outcomes When Treatments Vary?" Institute for Research on Poverty Discussion Paper # 1005-93. U. of Wis- consin-Madison, June 1993.

- . "Learning About Social Programs From Experiments With Random Assignment of Treatments," Institute for Research on Poverty Discussion Paper # 1061-95. U. of Wisconsin, Madison, Mar. 1995.

MANSKI, CHARLES F. AND GARFINKEL, IRWIN, eds. Evaluating welfare and training programs. Cambridge and London: Harvard U. Press, 1992a.

———. "Introduction," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds., 1992b, pp. 1-22.

MOFFITT, ROBERT A. "Program Evaluation with Nonexperimental Data," Evaluation Review, June 1991, 15(3), pp. 291-314.

- . "Evaluation Methods for Program Entry

This content downloaded from 129.171.178.62 on Tue, 11 Nov 2014 11:06:35 AMAll use subject to JSTOR Terms and Conditions

Friedlander, Greenberg, and Robins: Evaluating Training Programs 1855

Effects," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 231-52.

. "The Effect of Employment and Training Programs on Entry and Exit From the Welfare Caseload," J. Pol. Anal. Manage., Winter 1996, 15(1), pp. 32-50.

NIGHTINGALE, DEMETRA SMITH ET AL. Evaluation of the Massachusetts Employment and Training (ET) Program. Urban Institute Report 91-1. Washington, DC: Urban Institute Press, 1991.

ORR, LARRY L. ET AL. Does training for the disadvantaged work? Evidence from the National JTPA Study. Washington, DC: Urban Institute Press, 1996.

PERRY, CHARLES R. ET AL. The impact of government manpower programs in general, and on minorities and women. Philadelphia, PA: Industrial Research Unit, Wharton School, U. of Pennsylvania, 1975.

PHILLIPS, ELIZABETH H. "The Effect of Mandatory Work and Training Programs on Welfare Entry: The Case of GAIN in California." Unpublished Ph.D. dissertation. Madison: U. of Wisconsin, 1993.

PUMA, MICHAEL J. AND BURSTEIN, NANCY R. "The National Evaluation of the Food Stamp Employment and Training Program," J. Pol. Anal. Manage., Spring 1994, 13(2), pp. 311-30.

QUINT, JANICE C. ET AL. New Chance: Interim findings on a comprehensive program for disadvantaged young mothers and their children. New York: Manpower Demonstration Research Corporation, Sept. 1994.

RICCIO, JAMES; FRIEDLANDER, DANIEL AND FREEDMAN, STEPHEN. GAIN: Benefits, costs, and three-year impacts of a welfare-to-work program. New York: Manpower Demonstration Research Corporation, 1994.

RICCIO, JAMES ET AL. Final report on the Virginia Employment Services Program. New York: Manpower Demonstration Research Corporation, Aug. 1986.

ROSENBAUM, PAUL AND RUBIN, DONALD B. "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, Apr. 1983, 70(1), pp. 41-55.

RUBIN, DONALD B. "Matching to Remove Bias in Observational Studies," Biometrics, Mar. 1973, 29(1), pp. 159-83.

. "Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observation Studies," J. Amer. Statist. Assoc., June 1979, 74(366), pp. 318-28.

SANDELL, STEVEN H. AND RUPP, KALMAN. Who is served in JTPA programs: Patterns of participation and intergroup equity. Washington, DC: National Commission for Employment Policy, Feb. 1988.

SCHILLER, BRADLEY R. AND BRASHER, C. NIELSEN. "Effects of Workfare Saturation on AFDC Caseloads," Contemporary Policy Issues, Apr. 1993, 11(1), pp. 39-49.

STAFFORD, FRANK P. "A Decision Theoretic Approach to the Evaluation of Training Programs," in Research in Labor Economics, Supplement 1. Ed.: FARRELL E. BLOCH. Greenwich, CT, 1979, pp. 9-35.

STROMSDORFER, ERNST W. "Determinants of Economic Success in Retraining the Unemployed: The West Virginia Experience," J. Human Res., Spring 1968, 3(2), pp. 139-58.

STROMSDORFER, ERNST ET AL. Recommendations of the Job Training Longitudinal Survey Research Advisory Panel. Report prepared for the Office of Strategic Planning and Policy Development, Employment and Training Administration. Washington, DC: U.S. Department of Labor, 1985.

THISTLETHWAITE, DONALD L. AND CAMPBELL, DONALD T. "Regression-Discontinuity Analysis: An Alternative to the ex post facto Experiment," J. Educational Psychology, 1960, 51, pp. 309-17.

U.S. GENERAL ACCOUNTING OFFICE. Multiple employment training programs: Information crosswalk on 163 employment training programs. Washington, DC: U.S. GPO, Feb. 14, 1995.

———. Job Training Partnership Act: Long-term earnings and employment outcomes. Washington, DC: U.S. GPO, Mar. 1996.

WESTAT, INC. Continuous longitudinal manpower survey: Summary of net impact results. Report MEL 84-02 prepared for the U.S. Department of Labor under contract No. 23-24-75-07. Rockville, MD: Westat, Inc., Apr. 1984.

WISSOKER, DOUGLAS A. AND WATTS, HAROLD W. The impact of FIP on AFDC caseloads. Washington, DC: Urban Institute, June 1994.

WOLFHAGEN, CARL F. Job search strategies: Lessons from the Louisville WIN laboratory. New York: Manpower Demonstration Research Corporation, 1983.

ZAMBROWSKI, AMY AND GORDON, ANNE. Evaluation of the Minority Female Single Parent Demonstration: Fifth-year impacts at CET. Princeton: Mathematica Policy Research, Inc., Dec. 1993.
