
Journal of Economic Literature Vol. XXXV (December 1997), pp. 1809-1855

Evaluating Government Training Programs for the Economically Disadvantaged

DANIEL FRIEDLANDER, MDRC

DAVID H. GREENBERG, University of Maryland, Baltimore County

and

PHILIP K. ROBINS, University of Miami

This paper has benefitted greatly from helpful comments on an earlier draft by Stephen Bell, Howard Bloom, David Card, Judith Gueron, Robert LaLonde, Robert Moffitt, Ernst Stromsdorfer, and three anonymous referees. For support in the preparation and production of this paper, the authors gratefully acknowledge the funders of MDRC's Public Policy Outreach project: the Ford Foundation, the Ambrose Monell Foundation, the Alcoa Foundation, and the James Irvine Foundation. The findings and conclusions presented in this article do not necessarily represent the official positions or policies of the funders.

I. Introduction

IN AN EFFORT to increase the earnings of low-income individuals - that is, poor or near poor persons - who have ended their formal education, federal and state governments fund a number of training programs. Since the 1960s, these programs have been seen as instruments for combating poverty. Interest in them has heightened recently, fed by concerns about rising income inequality and falling real earnings among workers with limited skills. Such programs are also viewed as integral to recent efforts to reform a welfare system that is widely perceived as discouraging work.1

Evaluations of the effectiveness of government training programs for the economically disadvantaged have been accumulating for more than three decades. Additionally, in the past decade, a rapidly expanding literature has focused on the methodology of training program evaluation. The time is ripe for collecting

1 It is also sometimes argued that by better matching workers to job vacancies, government training programs can improve macroeconomic tradeoffs between inflation and unemployment (Malcom Cohen 1969; Daniel Hamermesh 1972; and Lawrence Katz 1994).


and interpreting the empirical findings from this literature and for assessing the status of evaluation methodology.

The broadest generalization about the current knowledge of government training programs for the disadvantaged is that they have produced modest positive effects on employment and earnings for adult men and women that are roughly commensurate with the modest amounts of resources expended on them. The positive effects for adults are not large enough to produce major aggregate effects on employment and earnings among low-income target groups, and the programs have not made substantial inroads in reducing poverty, income inequality, or welfare use. Moreover, they have failed to produce positive effects for youth.

In this article, we investigate the methodological foundations and empirical support for this view, suggest possible modifications of it, and identify potentially fruitful areas for future research by economists. We argue that, despite a large number of evaluations of government training programs and the development of a variety of sophisticated evaluation methods, considerable uncertainty remains about the kinds of training that work best, the effectiveness of training for certain demographic groups, and the appropriate policies for increasing aggregate program effects.

The remainder of this article is organized as follows. Section II describes the activities sponsored by government training programs and the rationale behind government funding for them. A brief sketch of government training programs for the economically disadvantaged in the United States is presented in Section III, followed by a discussion of the theory of training program evaluation in Section IV. The next three sections constitute the core of our essay: Sections V and VI cover methodology and findings for voluntary and mandatory training programs, respectively, and Section VII examines the broader implications of the findings for society. We conclude in Section VIII with our assessment of what we do and do not know from previous training program evaluations, and offer an agenda for future research.

II. Training Program Activities and Economic Rationale

Although the mix of activities in government training programs changes over time and differs from one program to the next, all training programs include one or more of the following: remedial education in reading and math, vocational training in specific occupational skills, subsidies paid to private sector employers to hire program participants for a specified period of time in order to provide them with on-the-job training, short-term subsidized "work experience" positions (paid or unpaid) at government or nonprofit agencies to give participants an opportunity to build an employment record and acquire general work skills, and job search assistance (including training in resume preparation and interviewing, help in job finding, and direct job placement). In addition, financial support, child care, personal and career counseling, and expense reimbursements during training are sometimes provided. (Burt Barnow, 1989, provides a more detailed discussion of the different kinds of training and support services.)

Space constraints prohibit us from covering all the programs that provide the services listed above. As already indicated, we focus more narrowly on those training programs that are primarily targeted at economically disadvantaged persons who are no longer in school. We do not examine evaluation


findings for programs targeted mainly at the elderly, persons with disabilities, persons still in school, and dislocated workers. Notwithstanding the exclusion of these groups, the methodological discussion in this article applies to evaluations of most government training programs, regardless of their target population. We do not cover job creation programs in this article because their main objective is not to increase unsubsidized employment.2 We also exclude policies that affect earnings or hours of work through nontraining mechanisms (e.g., wage subsidies and tax credits).

Why is it desirable for the government to provide training for the economically disadvantaged? The answer is not obvious because many training opportunities are available elsewhere. For example, private sector employers are a major source of on-the-job training, even in the absence of government subsidies. In addition, classroom training of several kinds - Adult Basic Education, General Educational Development (GED) preparation, and vocational education - is available through community colleges and adult education schools, and low-income persons can obtain some financing for certain activities through federal Pell grants without assistance from training programs. In fact, training programs often provide classroom instruction by sending program enrollees to the same schools and the same courses used by persons not enrolled in training programs. Similarly, job search assistance is available outside training programs for those who seek it.

Given the wide availability of alternative sources of services similar to those provided by training programs, the major economic rationale for funding training programs revolves around assertions of market or institutional failure. Low-income people may not have the capital resources to invest in certain kinds of training, such as classroom vocational training, that are not usually subsidized outside government training programs. Their access to private financing may be limited by a lack of collateral and a high risk of default. Public training programs may also be justified as compensating for inadequacies in the public education system or as providing a second chance to those who prematurely terminate formal schooling because of imperfect foresight or a high subjective rate of time preference. The economic rationale for these programs may also hinge partly on the existence of imperfect information about available training opportunities and their likely returns. That is, government programs may serve as a training "broker," guiding program enrollees into activities that yield the highest payoff for them. Given the presence of market or institutional failure, it is possible for the social rate of return to government investment in training for low-income individuals to be quite high. Even if the social rate of return is below the market rate of return, however, using public funds to support training programs for the economically disadvantaged may still be more efficient than using such funds to provide direct transfers to the poor or to operate alternative programs intended to decrease poverty. Nonetheless, as discussed below, program

2 Job creation programs are substitutes for regular employment. Examples include the Works Projects Administration during the 1930s and the Public Service Employment component of the Comprehensive Employment and Training Act during the 1970s. The work experience activities used in training programs differ from job creation because they are of limited duration, often do not pay wages, are for the stated purpose of allowing the participant to build an employment record and general work skills that will be of value in finding regular, unsubsidized employment, and are not viewed by training program administrators as a substitute for regular employment.


operators cannot easily monitor the effectiveness of their efforts, and it is therefore possible for the returns to public funds expended on government training programs to be negligible.

Two additional goals of government training programs for low-income individuals are to reduce government welfare expenditures and to increase the time that welfare recipients spend working. To achieve these objectives, increasingly stringent requirements to participate in training programs have been imposed on welfare recipients in recent years. Still, many "welfare-to-work" program administrators see participation requirements as a way of securing participation in programs aimed primarily at increasing income and reducing poverty rather than reducing welfare payments and increasing work. Regardless of the ultimate objective, the rationale for participation requirements is obvious - namely, to secure cooperation by welfare recipients who may not see training or employment as being in their immediate best economic interests.

III. Training Programs for the Economically Disadvantaged

Government training programs may be broadly classified into two basic categories: voluntary and mandatory. Voluntary programs provide training for individuals who apply for them and meet certain criteria of need, such as having income below a certain level or lacking a high school diploma. The first major post-World War II national voluntary training program for the disadvantaged in the United States was funded in 1962 under the Manpower Development and Training Act (MDTA). Although initially enacted to retrain technologically dislocated workers, MDTA soon shifted resources toward serving economically disadvantaged persons, reflecting new priorities established by the 1964 Economic Opportunity Act. Also in 1964, the Job Corps was created. The Job Corps, which still operates today, provides training for disadvantaged youth at 110 urban and rural residential centers throughout the United States. Since its inception, the Job Corps has served more than 1.7 million youth.

In 1973, MDTA was replaced by new legislation, the Comprehensive Employment and Training Act (CETA). CETA differed from MDTA in two important respects. First, states and local governments were given authority to operate training programs using grants from the federal government. Second, CETA had a job creation component, "public service employment" (PSE), that grew quite large during the Carter administration.

As a result of charges that PSE was corrupt and mismanaged, along with a desire to have the private sector play a bigger role in the operation of training programs, CETA was replaced during the early years of the Reagan administration by the Job Training Partnership Act (JTPA), passed in 1982. JTPA eliminated the PSE component of CETA, but enhanced its decentralized administrative structure. JTPA, which currently serves close to one million economically disadvantaged persons annually, remains the principal voluntary national training program today for the disadvantaged. Like MDTA and CETA, JTPA also provides separate funding for training persons who are not classified as economically disadvantaged.

Mandatory training programs are directed at public assistance recipients. In this article, we examine mandatory programs directed at welfare recipients, including recipients of the former Aid to Families with Dependent Children


(AFDC) and Food Stamps.3 A program's mandatory nature stems from its statutory authority to penalize or "sanction" recipients who do not cooperate by reducing (or in some cases terminating) their welfare payments. Mandatory training programs for welfare recipients were first established in 1967 under the Work Incentive (WIN) Program. Under WIN, participation could be required of heads (mostly female) of single-parent AFDC families without preschool-age children and by the much smaller number of heads (mostly male) of two-parent AFDC-U families. In practice, WIN was never given enough funding to establish an effective mandate for more than a small minority of the targeted population. The 1981 Omnibus Budget Reconciliation Act (OBRA) allowed states additional options and flexibility in designing mandatory training and work programs for welfare recipients. These programs became known as "welfare-to-work programs." OBRA and subsequent legislation, together with a growing political momentum toward welfare reform, stimulated a number of states to experiment with the design of welfare-to-work programs and to strengthen the requirements to participate in them. In 1988, the Family Support Act (FSA) was passed, replacing the WIN program with the Job Opportunities and Basic Skills Training (JOBS) Program. JOBS expanded the mandatory population to include single-parent AFDC recipients with preschool-age children (down to age three or, at state option, to age one), established minimum-participation-rate targets for states, increased the grant reduction penalties for nonparticipation, and, for the first time, committed

federal funds to education in welfare-to-work programs. A modestly funded Food Stamp Employment and Training Program was authorized in 1985 and became fully operational in 1987.

The distinction between voluntary and mandatory training programs is not considered meaningful by all policy analysts. Most of the program activities in the two kinds of programs are similar and the institutions providing the training can be the same. Enforcement among mandatory programs is often downplayed by local program administrators, making participation seem voluntary. Additionally, the target populations partially overlap: for example, a significant proportion of JTPA participants are welfare recipients, and some of them are there in fulfillment of a JOBS program participation obligation. Nonetheless, despite the similarities, we believe the differences are sufficient to warrant separate treatment in this article. For one, the evidence suggests that pressure to participate or to work has been increasing in recent years for enrollees in mandatory programs in many states. In addition, only mandatory programs purport to have a direct effect on some nonparticipants, whether through financial sanctions or simply by prompting some people to find a job in order to avoid what may be perceived as an onerous participation requirement. As discussed later, the possibility of direct program effects on nonparticipants under mandatory programs imposes restrictions on the way results from studies of these programs can be interpreted.

According to the United States General Accounting Office (1995), in fiscal year 1995 just over $3.8 billion was appropriated for services to the economically disadvantaged under JTPA Title IIA (disadvantaged adults, $947 million), JTPA Title IIC (disadvantaged

3 AFDC was replaced in August 1996 by legislation that created welfare block grants to states (Temporary Assistance to Needy Families, or TANF).


youth, $260 million), JOBS ($1.3 billion), Job Corps ($1.1 billion), and the Food Stamp Employment and Training Program ($165 million). An additional $10.4 billion was appropriated for services under a variety of other "training" programs, although only a portion of this amount was for the disadvantaged and for services traditionally defined as out-of-school training. In total, the $14 billion of training expenditures (broadly defined) constituted less than 0.2 percent of Gross Domestic Product. The states also spend modest amounts on training programs and, like the federal government, state programs serve other groups in addition to the disadvantaged. Thus, as a fraction of national income, funds expended on training the disadvantaged are very small.
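
As a rough arithmetic check on these figures (the 1995 nominal GDP value of roughly $7.6 trillion is our assumption; it is not given in the text):

$$\$3.8\ \text{billion} + \$10.4\ \text{billion} \approx \$14\ \text{billion}, \qquad \frac{\$14\ \text{billion}}{\$7{,}600\ \text{billion}} \approx 0.0018 \approx 0.18\% < 0.2\%\ \text{of GDP}.$$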

IV. Theory of Training Program Evaluation

A complete theory of program evaluation would specify the information required and the appropriate decision rule to apply in using that information to advise policy makers on the desirability of increasing or decreasing the scale of a particular training program. Economists have done surprisingly little work toward developing a complete theory of training program evaluation.4 Moreover, the information provided by current training program evaluations is quite limited. Nearly all training program evaluations are "black box," indicating only whether a particular program "works," on average, for a particular sample under a particular set of circumstances (including labor market conditions and service delivery systems). Such information, although useful, may not be readily generalizable to other programs, circumstances, or populations. Indeed, a major recent criticism of the evaluations of the past 30 years is that they have failed to contribute to the accumulation of knowledge because, it is alleged, they have not systematically gathered empirical information under the guidance of a broad theoretical framework (James Heckman 1992; Charles Manski 1995; Heckman and Jeffrey Smith 1995). Ideally, if the parameters of an underlying structural model could be estimated, policy makers would be able to identify the most effective policy option under differing circumstances and would be able to predict its outcome (see Manski and Irwin Garfinkel 1992a; Heckman 1992; and Greenberg, Robert Meyer, and Michael Wiseman 1993, 1994). Such estimation, however, has never been accomplished.5

In this article, we distinguish the effects of training programs on three successively inclusive groups: program participants (those who actually receive the training services), the broader target population eligible for the program (participants and nonparticipants), and

4 The principal contribution has been to apply a Bayesian decision-theoretic framework, with a loss function derived by positing a prior distribution of belief about the effectiveness of a training program (Frank Stafford 1979; Gary Burtless and Larry Orr 1986). One formal model implies that evaluation information is most valuable when (a) the aggregate program benefits are expected to be large, (b) aggregate program benefits minus aggregate costs are close to zero, and (c) prior opinion about the magnitude of the benefit minus cost difference is uncertain. Seemingly minor changes in assumptions can, however, yield diametrically opposite conclusions about what kind of evaluation research is most valuable. Moreover, potentially important extensions of this approach, such as incorporating sequential evaluation and decision making and accounting for the value of long-run increases in basic knowledge (John Conlisk 1979), have not been well explored in the training program literature.

5 One reason for this is that a very large number of evaluation sites would be required to provide sufficient variation in program designs, client characteristics, and local environmental conditions to estimate the parameters of a structural model with precision (Greenberg, Meyer, and Wiseman 1993).


society (participants, nonparticipants, and ineligible individuals). Our discussion of voluntary programs focuses on effects on participants. In the voluntary program evaluation literature, analysis centers on participants because nonparticipants are, presumably, unaffected by the program. We bring in effects on program-eligible nonparticipants in our review of mandatory programs. In mandatory programs, nonparticipants may experience program effects resulting from penalties for nonparticipation or from changes in behavior to avoid participating in the training program (a "deterrent" effect). Finally, at the societal level, we discuss training program effects on persons ineligible for the program. The behavior of some ineligibles is affected if they attempt to become eligible for the training program (for example, by lowering their income). This is termed an "entry" effect (Moffitt 1992, 1996). Others could lose jobs in competition with training program graduates. This is termed a "displacement" effect (Hamermesh 1972; George Johnson 1979). Because the literature on social effects is scant and mostly theoretical, we only briefly summarize it.

V. Evaluating Voluntary Training Programs

A. Estimating the Effects of Participating

The behavior of participants in a voluntary training program can be depicted by the following model:

$$Y_{it} = c_t X_i + b_t P_{i0} + u_{it}, \qquad t > 0, \qquad (1a)$$

$$P_{i0} = a_0 Z_i + e_{i0}. \qquad (1b)$$

In this model, $Y_{it}$ is the outcome of interest (say, earnings) for the $i$th person in period $t$, where $t = 0$ is the period in which the training occurs; $X_i$ and $Z_i$ are sets of (perhaps overlapping) exogenous factors and personal characteristics for individual $i$ (usually measured before the program begins);6 $P_{i0}$ is a binary variable, with zero indicating no participation in training program activities and unity indicating participation;7 and $u_{it}$ and $e_{i0}$ are random error terms. In this formulation, the mean effect of training program participation in period $t$ is $b_t$, which may vary over time.8 Equation (1b) is sometimes called an index function (Heckman and Richard Robb 1985) or a propensity score (Heckman and V. Joseph Hotz 1989),9 to denote that the decision to participate in training program activities may be made by a program administrator, a prospective trainee, or both.

The fundamental strategy for estimating $b_t$ is to compare a sample of persons who receive services from a training program that is being evaluated with a sample of persons who do not. This comparison between participants and nonparticipants is made in two basic ways: a nonexperimental approach and an experimental approach. Under certain conditions, the nonexperimental approach is adequate to yield an unbiased estimate of $b_t$. When such conditions do not hold, the experimental approach is an alternative.

6 In principle, $X_i$ and $Z_i$ can be measured during or after the program, but if this is done, they may be affected by participation and, hence, be endogenous. It is assumed that $X_i$ and $Z_i$ are uncorrelated with $u_{it}$ and $e_{i0}$, respectively.

7 Ideally, it is desirable to differentiate among several kinds of training activities, by defining $P_{i0}$ as a vector, and to account for the level and intensity of participation by defining $P_{i0}$ as a continuous variable. For one recent attempt to do the latter, see Louis Jacobson et al. (1994).

8 In practice, $b_t$ is also often allowed to vary over certain kinds of individuals (subgroup analysis). In addition, evaluators have recently become interested in examining effects on the entire distribution (as opposed to the mean) of the outcomes. See, for example, Anders Bjorklund and Moffitt (1987), Nancy Clements, Heckman, and Smith (1994), Manski (1995), and Friedlander and Robins (1997).

9 The term "propensity score" comes from the statistics literature (Donald Rubin 1973; Paul Rosenbaum and Rubin 1983).


1. Nonexperimental Evaluations. Nonexperimental evaluations usually involve the selection of a comparison group by the evaluator. The comparison group is intended to provide a counterfactual for the program participant group. Comparison groups for estimating training program effects for participants have been variously drawn from among applicants who dropped out or were turned away without receiving program services, target group members who did not apply, individuals outside the geographic area covered by the program, and nonparticipants drawn from national microdata sets. Training program participants have also served as their own comparison group in periods prior to participating; that is, their pre- and post-program behavior is compared.10

In a nonexperimental evaluation, if $E(P_{i0} u_{it}) = 0$, an unbiased estimate of $b_t$ can be obtained by regressing $Y_{it}$ on $X_i$ and $P_{i0}$. There is no guarantee, however, that this condition will hold in practice. In fact, if $Y_{it}$ is earnings, it is quite possible that $P_{i0}$ and $u_{it}$ are positively correlated. This would occur, for example, if more motivated individuals chose to participate in the training program and motivation was not captured in the $X$ variables.

Correlation between $P_{i0}$ and $u_{it}$ can arise in two ways, through $Z_i$ or through $e_{i0}$. If $E(Z_i u_{it}) \neq 0$, but $E(u_{it} e_{i0}) = 0$, there is selection on observables (Heckman and Robb 1985; Heckman and Hotz 1989). What this generally means in practice is that program administrators are selecting applicants for a program on the basis of a set of known characteristics. For example, persons might be admitted into a program if they have dropped out of high school, or if they are unemployed, or if they satisfy a ranking based on a set of observable characteristics. In the case of a simple linear specification of equation set (1), Barnow, Glen Cain, and Arthur Goldberger (1980) show that when there is selection only on observables, consistent effects of the program can be obtained by including the selection variables, $Z_i$, as regressors. Linearity may not be a very good assumption, however, and, as described below, a number of fairly sophisticated methods have been proposed to deal with the general problem of selection on observables.
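
A minimal sketch of the Barnow, Cain, and Goldberger point, under the assumption of a linear model with selection on a single observed variable; the data are simulated and all names and values are hypothetical.

```python
import numpy as np

# Sketch of selection on observables: participation depends on an observed
# variable z that also affects the outcome. Values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 20000
z = rng.normal(size=n)                                    # observed selection variable Z_i
p0 = (z + rng.normal(size=n) > 0).astype(float)           # administrators admit high-z applicants
y = 0.5 * z + 2.0 * p0 + rng.normal(size=n)               # true program effect b_t = 2

def ols(design, outcome):
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

# Omitting z: P_i0 proxies for z, so the estimated effect is overstated.
print("b_t without z:", round(ols(np.column_stack([np.ones(n), p0]), y)[1], 3))
# Including the selection variable as a regressor restores consistency.
print("b_t with z:   ", round(ols(np.column_stack([np.ones(n), z, p0]), y)[2], 3))
```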

If $E(Z_i u_{it}) = 0$, but $E(u_{it} e_{i0}) \neq 0$, there is selection on unobservables. This is a much more serious problem than selection on observables because solutions require strong (largely untestable) behavioral assumptions, complex nonlinear estimation models, or difficult-to-obtain data. Selection on unobservables can occur when individuals are prompted to participate in program activities by some underlying factor, such as motivation, that is difficult to measure. Selection on unobservables can also occur if program administrators use subjective or objective criteria to select program participants, and their ratings of individuals are not recorded.11

a. Addressing the Problem of Selection on Observables. Early nonexperimental evaluations of voluntary training programs implicitly assumed that selection into the program was based on observables (see Goldberger 1972; and Cain 1975, for discussions). One

10 An overview of nonexperimental evaluation methods is given in Moffitt (1991) and Bell et al. (1995).

11 To a degree, making the distinction between selection on observables and selection on unobservables is artificial (and perhaps even misleading), because unobservables may represent mainly factors that are difficult, but not necessarily impossible, to measure. Heckman and Smith (1995) argue that the collection of better data can minimize (and, perhaps, even eliminate) problems caused by selection on unobservables, thereby reducing the evaluation problem to one of finding an appropriate method for controlling for selection on observables.


approach taken in several early nonexperimental studies was to use an "internal" comparison group (for example, nonparticipating training program applicants) to draw inferences about the effects of a program (Michael Borus 1964; Cain 1968; Ernst Stromsdorfer 1968; Thomas Cooley, Timothy McGuire, and Edward Prescott 1979). It was thought that internal comparison groups were appropriate because these individuals possessed many of the same characteristics as participants.

The use of internal comparison groups never achieved great popularity because it was quickly recognized that nonparticipants are likely to be quite different from participants by virtue of the fact that they have chosen not to participate or have been excluded by program staff. Recently, Bell et al. (1995) have proposed using a variant of this approach, based on the "regression discontinuity" model, to evaluate a training program for welfare recipients.12

They argue that "screened out" applicants - those excluded because of decisions made by an intake staff - by definition differ from participants only on factors (both objective and subjective) observable to staff. Their regression discontinuity approach attempts to control fully for these differences using intake workers' ratings of applicant potential.

Whether the Bell et al. study will stimulate more nonexperimental evaluations utilizing internal comparison groups remains to be seen. One alternative that has proven quite popular for a number of years utilizes "external" comparison groups, consisting of a sample of individuals whose observed characteristics resemble those of program

participants, but are drawn from a different source (often a national data base, such as the Current Population Survey or the Panel Study of Income Dynamics, or special samples from geographic areas that have not implemented the program). The use of external comparison groups became prevalent in evaluating the effects of the MDTA and CETA programs in the 1970s and 1980s (Orley Ashenfelter 1978; Barnow 1987), and such groups are still being used in the 1990s (Sharon Long and Douglas Wissoker 1995; Heckman, Smith, and Christopher Taber 1994; and Rajeev Dehejia and Sadek Wahba 1995).13

The use of external comparison groups sometimes involves searching for a group of individuals who are matched statistically to members of the program group. One procedure, known as "cell matching," was used by Edward Bryant and Kalman Rupp (1987), among others, to evaluate the CETA program. Under this procedure, subgroups of individuals are created based on certain observed characteristics (such as age, education, and race) and are then matched to other individuals with the same characteristics. Another procedure, known as "distance function matching," matches individuals based on a weighted function of observed characteristics. The first application of distance function matching in the training program evaluation literature was by Katherine Dickinson, Terry Johnson, and Richard West (1986, 1987a) in their evaluation of the CETA program. Their application was based on Rubin's (1979) "nearest neighbor" technique. Under

12 The "regression discontinuity" model was first proposed as an evaluation model in the field of education by Donald Thistlethwaite and Donald Campbell (1960), but has received scant attention in the training program evaluation literature.

13 Some evaluators claim to have solved the problem of selection on unobservables by using external comparison groups. External comparison groups are usually chosen on the basis of a set of observed characteristics, however, and, hence, technically their use addresses only the problem of selection on observables.


this technique, the Mahalanobis distance is calculated between a training program participant and each potential comparison group member,14 and then a match is accepted for the pair with the smallest distance between them.15

Recently, a variation on statistical matching has been proposed by Dehejia and Wahba (1995) and Heckman et al. (forthcoming). Based on the methodology developed by Rosenbaum and Rubin (1983), these authors use the propensity score as the matching variable. The propensity score summarizes the information in a set of observable variables into a single index function. Treatment group and comparison group observations with similar propensity scores (similar predictions of being in the treatment group) are considered good matches for each other.16 Dehejia and Wahba (1995) argue that the propensity score method can serve as a good approximation to a wide variety of linear and nonlinear econometric response functions when there is selection on observables.
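
A stylized sketch of propensity-score matching on simulated data, assuming scikit-learn is available; this is our illustration of the general idea, not the procedure used in the studies cited (Mahalanobis nearest-neighbor matching would differ mainly in the distance used).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stylized propensity-score matching with selection on observables only.
# Covariates, coefficients, and sample sizes are illustrative assumptions.
rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=(n, 3))                                   # observed covariates
p_true = 1 / (1 + np.exp(-(x @ np.array([0.8, -0.5, 0.3]))))  # true participation probability
d = rng.binomial(1, p_true)                                   # participation indicator
y = x @ np.array([1.0, 0.5, -0.2]) + 2.0 * d + rng.normal(size=n)   # true effect = 2

# Step 1: estimate the propensity score from the observables.
score = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# Step 2: nearest-neighbor match each participant to a comparison member on the
# score, with replacement (as in Dehejia and Wahba's reuse of good matches).
t_idx = np.flatnonzero(d == 1)
c_idx = np.flatnonzero(d == 0)
nearest = c_idx[np.abs(score[c_idx][None, :] - score[t_idx][:, None]).argmin(axis=1)]

# Step 3: the matched mean difference estimates the effect of training on participants.
print("matched estimate of b_t:", round(float((y[t_idx] - y[nearest]).mean()), 3))
```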

Statistical matching is to be distinguished from econometric matching. Econometric matching denotes the standard behavioral modeling techniques and specification tests that use observed characteristics as regressors, without necessarily restricting the composition of the estimation sample through the use of matching. As noted by Friedlander and Robins (1995), statistical matching and econometric matching with the same data set can produce very similar estimates of program effects. Essentially, both methods adjust estimates of program effects for the influence of a given set of observable covariates. Thus, they differ only in the way they specify the functional relationship between the observed characteristics and the relevant outcome variable.

b. Addressing the Problem of Selection on Unobservables. A number of early training program evaluation studies proposed methods for dealing with selection on unobservables. All of these studies make certain assumptions about the nature of the dependence between $u_{it}$ and $P_{i0}$ (or $u_{it}$ and $e_{i0}$). One of the first to propose an econometric solution was Ashenfelter (1978), who hypothesized an "autoregressive" model of the earnings generation process based on a simple model of human capital investment. In Ashenfelter's model, preprogram earnings histories play a crucial role in estimation.17 The key assumption in Ashenfelter's model is that earnings contains an unobserved "fixed effect" that can be accounted for in estimation by making appropriate transformations of the earnings outcome data.18

14 The Mahalanobis distance is given by $(X_1 - X_2)' S^{-1} (X_1 - X_2)$, where $X_1$ and $X_2$ are column vectors of the matching variables for two observations and $S$ is the covariance matrix of the matching variables.

15 In the typical application of the nearest neighbor technique, comparison group members are sampled without replacement; that is, once an observation is selected it is not used again. Thus, the results produced by the technique are not invariant to the order in which the data are sorted for matching.

16 Dehejia and Wahba (1995) propose using "good" matches more than once (in effect, matching them to more than one program group member).

17 Ashenfelter noted that trainees tend to suffer a sharp decline in earnings just prior to program entry. This "preprogram dip" undoubtedly reflects adverse economic circumstances that are, at least, partly responsible for the individual's decision to enter training. Knowing whether the preprogram level of earnings is transitory or permanent is critical for developing an appropriate statistical model to account for selection on unobservables.

18 For example, in the case of a simple fixed effect, where $u_{it}$ and $e_{i0}$ share a common individual-specific component $\mu_i$ (i.e., the correlation between $u_{it}$ and $P_{i0}$ arises from that common component), an unbiased estimate of the program effect can be obtained by estimating a first difference model for earnings (see Barnow 1987, for details). A transitory "earnings dip," however, will make the results sensitive to the base year used to construct the first difference model.
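
The footnote's first-difference logic can be illustrated with a small simulation in which an unobserved fixed component both lowers earnings and raises the chance of entering training; the numbers are assumptions chosen only to show the direction of the bias.

```python
import numpy as np

# Illustration of the first-difference logic: an unobserved fixed effect mu_i
# drives both earnings and the decision to train. All values are assumptions.
rng = np.random.default_rng(3)
n = 10000
mu = rng.normal(size=n)                               # individual fixed effect (unobserved)
d = (mu + rng.normal(size=n) < 0).astype(float)       # low-mu people select into training

y_pre = mu + rng.normal(size=n)                       # preprogram earnings
y_post = mu + 2.0 * d + rng.normal(size=n)            # post-program earnings, true effect = 2

# A post-program comparison of levels is biased downward: trainees have low mu.
print("levels estimate:     ", round(y_post[d == 1].mean() - y_post[d == 0].mean(), 3))
# First-differencing removes the fixed effect and recovers the program effect.
dy = y_post - y_pre
print("difference estimate: ", round(dy[d == 1].mean() - dy[d == 0].mean(), 3))
```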


Ashenfelter found that training effects tend to decay over time, an aspect of his results that stimulated a number of other papers, including a lively exchange between Bloom (1984b) and Laurie Bassi (1987). Bloom argued that Ashenfelter failed to correct for a time-varying bias in the fixed effect model, and that when such a correction is made the effects of training do not decay over time. Bassi countered that Bloom's estimates were no more credible than Ashenfelter's because they, also, were based on a set of strong assumptions about the nature of the selection on unobservables. Debates about nonexperimental evaluations such as these rarely produce a winner because there is no clear-cut way of determining whose assumptions are valid.

In addition to the fixed-effect model, two other methods have been used in the training program evaluation literature for addressing selection on unobservables. One is instrumental variables (Heckman and Robb 1985; Joshua Angrist, Guido Imbens, and Rubin 1996). In effect, equation (1b) is estimated and is used to construct an instrument for $P_{i0}$ that is uncorrelated with $u_{it}$. Although the use of instrumental variables has sometimes proved successful, it has not been popular because of difficulties in finding an appropriate instrument - that is, a variable that influences $P_{i0}$ but not $Y_{it}$.19 In addition, as discussed by Heckman (1996), the instrumental variable method has important limitations when the program effect varies across people (so-called random coefficient models).
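
The instrumental variables idea can be sketched with a simulated binary instrument standing in for something like the proximity-to-site variable mentioned in footnote 19; everything here (instrument, effect size, constant program effect, Python implementation) is an illustrative assumption.

```python
import numpy as np

# Two-stage least squares sketch: an instrument shifts participation but is
# unrelated to the outcome error. All names and values are illustrative.
rng = np.random.default_rng(4)
n = 20000
z = rng.binomial(1, 0.5, size=n).astype(float)        # instrument (e.g., proximity to a site)
v = rng.normal(size=n)                                # unobserved motivation
d = (0.9 * z + v + rng.normal(size=n) > 0.5).astype(float)   # participation
y = 2.0 * d + v + rng.normal(size=n)                  # outcome; v biases OLS (true effect = 2)

# First stage: regress participation on the instrument.
Z1 = np.column_stack([np.ones(n), z])
d_hat = Z1 @ np.linalg.lstsq(Z1, d, rcond=None)[0]

# Second stage: regress the outcome on predicted participation.
X2 = np.column_stack([np.ones(n), d_hat])
ols_biased = np.linalg.lstsq(np.column_stack([np.ones(n), d]), y, rcond=None)[0][1]
print("naive OLS estimate:", round(ols_biased, 3))
print("2SLS estimate:     ", round(np.linalg.lstsq(X2, y, rcond=None)[0][1], 3))
```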

Another method, proposed by Barnow, Cain, and Goldberger (1980) and based on the procedure developed by Heckman (1978) to deal with censored samples, is based on the assumption that $u_{it}$ and $e_{i0}$ are jointly normally distributed. As in the case of instrumental variables, equation (1b) is estimated, in this instance using a probit model. An appropriate Mills ratio adjustment term is then constructed as a weighted average of predicted $P_{i0}$ and $1 - P_{i0}$ and is included as an additional variable in equation (1a), which is then estimated using conventional regression techniques. This two-step method avoids the need for an instrument by relying on the functional form of (1b) for identification, but suffers from low reliability associated with specification uncertainty and low variability in the newly constructed adjustment term.
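
A sketch of the two-step idea under joint normality of the errors, assuming statsmodels and SciPy are available; this is one standard way of constructing the Mills-ratio control term described in the text, applied to simulated data, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Two-step correction for selection on unobservables under joint normality.
# All data are simulated and the specifics are our assumptions.
rng = np.random.default_rng(5)
n = 20000
z = rng.normal(size=n)                                        # Z_i
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
u, e0 = errs[:, 0], errs[:, 1]                                # correlated errors: selection on unobservables
d = (1.0 * z + e0 > 0).astype(float)                          # participation equation (1b)
y = 2.0 * d + u                                               # outcome equation (1a), true b_t = 2

# Step 1: probit of participation on Z_i, giving the estimated index a_0 Z_i.
Z = sm.add_constant(z)
probit = sm.Probit(d, Z).fit(disp=0)
idx = Z @ probit.params

# Step 2: Mills-ratio control term for participants and nonparticipants,
# included as an extra regressor alongside the participation dummy.
lam = np.where(d == 1, norm.pdf(idx) / norm.cdf(idx),
               -norm.pdf(idx) / (1.0 - norm.cdf(idx)))
X = sm.add_constant(np.column_stack([d, lam]))

print("naive OLS:    ", round(sm.OLS(y, sm.add_constant(d)).fit().params[1], 3))
print("two-step b_t: ", round(sm.OLS(y, X).fit().params[1], 3))
```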

2. Experimental Evaluations. Because training program participation contains a large unexplained component, both instrumental variables and two-step procedures tend to produce statistically imprecise estimates of the effect of training. In addition, a number of studies (Bassi 1983, 1984; Ashenfelter and Card 1985; LaLonde 1986; Thomas Fraker and Rebecca Maynard 1987; LaLonde and Maynard 1987; and Friedlander and Robins 1995) find that these and other nonexperimental procedures designed to deal with the problem of selection bias can produce widely varying estimates of program effects, often quite different from experimentally based estimates from the same data set, depending on the assumptions made about the nature of the dependence between $u_{it}$ and $P_{i0}$.20 Recognition of this sensitivity made experimental evaluations of training

19 Perhaps one of the more successful applications of the instrumental variable technique was performed by Charles Mallar (1979), who used proximity to the training site as an instrument for participating in the Job Corps program.

20 LaLonde (1986) was the first to develop procedures for assessing nonexperimental estimators using experimental data.


programs more popular in the 1980s and 1990s.21

In contrast to nonexperimental evaluations, experimental evaluations are based on random assignment of individuals into a treatment (or "program") group and a control group. Randomization is intended to produce zero correlation between $P_{i0}$ and $X_i$, between $P_{i0}$ and $Z_i$, and between $P_{i0}$ and $u_{it}$, so that $E(P_{i0} X_i) = E(P_{i0} u_{it}) = E(u_{it} e_{i0}) = a_0 = 0$. In an ideal experiment, only members of the treatment group participate in program activities and their participation rate is 100 percent. An unbiased estimate of the effect of the treatment, $b_t$ (the effect of training versus no training), can therefore be obtained by simply taking the difference between mean earnings of the treatment and control groups. In practice, equation (1a) is usually estimated using ordinary least squares. The $X$s are included to increase the statistical precision of the estimates, but such gains are usually small. As discussed later, this ideal experiment is never fully realized in actual evaluations because not all members of the treatment group participate in training and some members of the control group seek out and engage in training similar to that provided by the program being evaluated.
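
The experimental estimator can be sketched as follows for an idealized experiment with full compliance; the data, effect size, and Python implementation are illustrative assumptions.

```python
import numpy as np

# Sketch of the experimental estimator under an idealized experiment with
# full compliance; the data and effect size are illustrative assumptions.
rng = np.random.default_rng(6)
n = 6000
x = rng.normal(size=n)                               # baseline covariate X_i
treat = rng.binomial(1, 0.5, size=n)                 # random assignment breaks any selection
y = 1.0 * x + 2.0 * treat + rng.normal(size=n)       # true effect = 2

# The simple treatment-control difference in means is unbiased for b_t.
print("difference in means:", round(y[treat == 1].mean() - y[treat == 0].mean(), 3))

# Adding baseline covariates, as in the regression version of (1a), mainly
# improves precision rather than removing bias.
X = np.column_stack([np.ones(n), x, treat])
print("OLS with covariates:", round(np.linalg.lstsq(X, y, rcond=None)[0][2], 3))
```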

Moreover, even an "ideal" experiment has inherent limitations. Although it provides an unbiased estimate of the average program effect per participant, it cannot provide unbiased estimates of the distribution of program effects across participants. Thus, an experiment cannot estimate the percentage of participants who actually benefit from a training program (Clements, Heckman, and Smith 1994; Manski 1995). As Heckman and Smith (1995) point out, experimental estimates of the "average effect per sample member" cannot distinguish between two possibilities: (a) most people gained about the average, and (b) a few people gained much but most gained nothing or perhaps even lost. In addition, experimental data can provide unbiased estimates of the effects of training programs on employment rates, but not on effects that are conditional on employment status, such as the effects of training on the hazard rates of entry into or exit from employment (Heckman and Smith 1995; John Ham and LaLonde 1990, 1996; Card and Daniel Sullivan 1988) or on hourly wage rates or weekly work hours. All of these limitations, however, apply equally to the common nonexperimental techniques. Thus, because the experimental approach solves the basic problem of selection bias in estimating mean treatment effects, it has come to be seen by many analysts as the more attractive method of program evaluation.

Given the attractiveness of experiments, it may seem surprising that nonexperimental evaluations of training programs continue to be conducted. One reason, discussed below, is that experimental evaluations may be inappropriate if a program is expected to have large community or macro effects or large entry effects. In addition, experiments incur costs for implementing and monitoring randomization. These additional research costs must be weighed against the costs of the misallocation of

21 An advisory panel that convened to make recommendations concerning how to evaluate the JTPA program concluded that an experimental evaluation was the preferred method (Stromsdorfer et al. 1985). Similar conclusions regarding the preferability of experimental evaluations of training programs are reached by Ashenfelter and Card (1985), Burtless and Orr (1986), Barnow (1987), and Burtless (1995). Eventually, the federal government decided to fund an experimental evaluation of JTPA, with an associated nonexperimental research component. The earlier generations of federal training programs (MDTA, CETA) had been evaluated exclusively using nonexperimental methods.


social resources from decisions based on less reliable nonexperimental designs. Finally, nonexperimental evaluation may, in some cases, produce results faster than an experiment. For evaluations of ongoing training programs, data on earnings and welfare receipt can often be obtained retrospectively. Data collection and statistical analysis can therefore be completed relatively quickly following the start of the evaluation, as long as researchers are interested only in a program as it operated in the immediate past. Experimental evaluations typically take much longer to complete, because the period of evaluation must include up to a year or more of sample intake during random assignment and a two- to five-year follow-up period plus the time required for data collection and analysis. On the other hand, experimental evaluations of new training programs are not necessarily more time consuming than nonexperimental evaluations if the evaluation is initiated as soon as the program begins. As Orr et al. (1996) suggest, even with an ongoing program, much of the relatively higher cost and lack of timeliness of an experimental evaluation can be overcome if a fraction of program applicants are assigned to a control group on a continuing basis.

3. The External Validity of Estimated Program Effects. A critical issue in the evaluation of training programs is "external validity." External validity refers to the extent to which estimated program effects can be generalized to different locations and populations, to different time periods, and to different variants of the program being studied. The external validity of specific estimates of program effects may be questioned for a number of reasons. Most of these have been offered recently as criticisms of experimental evaluations (Heckman and Smith 1995), but nearly

all apply as well to nonexperimental evaluations.

First, and most obviously, social attitudes, government institutions, the business cycle, the relative demand for unskilled and skilled labor, and other relevant factors may change in the years following an evaluation. Likewise, different locations may have dissimilar trainee characteristics, social attitudes, state and local government institutions, labor market conditions, and so forth.

Second, training program evaluations are often performed at a small number of sites that are rarely selected randomly, raising questions about how well they represent administrative capacity and other unobservables for the universe of sites (see Heckman and Smith 1995; Hotz 1992; and Heckman 1992). Difficulties in obtaining a representative sample of program sites are especially acute when the cooperation of local administrators is essential and they cannot be compelled to participate in the evaluation. It has been argued, for example, that sites participating in the recently completed national evaluation of the JTPA program are not representative of all JTPA sites because participation in the experiment occurred in only a few of the sites from which it was sought (Heckman and Smith 1995).22 The evaluators (Orr et al. 1996) argue that the participating sites are representative, judging by observable characteristics. An important area of future research is to determine the degree to which site selectivity translates into bias in generalizing the estimated effects to other sites.

Third, external validity may be compromised by "scale bias."

22 A standard argument is that only sites operating superior programs will agree to an evaluation. There may be, however, only minimal correlation between local operators' self-appraisals and the results of a rigorous third-party evaluation.


Training program innovations are often tested as small demonstrations or pilot programs. Even ongoing programs such as JTPA do not achieve universal participation within the program-eligible population. Manski and Garfinkel (1992a) and Garfinkel, Manski, and Charles Michalopoulos (1992) suggest that scaling up to universal participation could change community norms or combine with patterns of social interaction or information diffusion in ways that will feed back and influence the success of the policy innovation. These community or "macro" effects, they argue, will be absent in small-scale pilot programs or partially scaled programs.23 In addition, testing a program on a small scale may cause the composition of the program participants to differ from what it would be in an ongoing training program by inhibiting diffusion of information about the program to potential applicants; by limiting the number of program slots and thereby encouraging program administrators to restrict participation to "higher quality" applicants; or, in an experiment, by discouraging risk-averse individuals from applying to a program when they could be randomly assigned to a no-services control group (see Heckman 1992; Heckman and Smith 1995; and Manski 1993, 1995).24

At present, little is known about the practical importance of community effects, although in principle their presence could greatly multiply, or seriously impede, the effectiveness of government training programs. An important area for future theoretical and empirical research may lie in adapting methods from sociology, urban anthropology, ethnography, and community psychology to study the community effects of large-scale, permanent training programs.25 Similarly, although the possibility of bias caused by distortion of the participant sample in small-scale selective voluntary programs has strong theoretical appeal, its empirical importance is yet to be demonstrated.26

One nonexperimental approach for avoiding biases caused by testing policy innovations on a small scale is to implement them on a site-wide, fully scaled basis in some locations and, for comparison, use other sites (perhaps statistically matched) that have not adopted the innovation. Although this "saturation" evaluation design does, in principle, allow feedback effects to be captured, the program may have to be kept in place for many years, with firm guarantees of permanency, before these effects reach full potency. Moreover, cross-site comparison designs will produce unreliable estimates of program effects if the program and comparison sites differ in ways that are inadequately controlled for in the evaluation (see Robinson Hollister and Jennifer Hill 1995; Friedlander and Robins 1995). Indeed, even if sites are

23 In at least two recent instances - the Massachusetts Employment and Training Choices Program (Demetra Nightingale et al. 1991) and the Washington State Family Independence Program (S. Long and Wissoker 1995) - these issues were considered so important that a deliberate decision was made against using a random assignment evaluation design that would create a no-program control group and would therefore interfere with site-wide program coverage.

24 To illustrate the potential bias caused by distortions in the participant population, Manski investigates nonparametric bounded estimators that are virtually assumption-free (see, also, Angrist and Imbens 1991). As Manski shows, the bounds can often be quite large, and can be narrowed only if the evaluator is willing to impose strong and untestable assumptions about behavior of individuals outside the specific program being evaluated.

25 Hypotheses concerning community effects are currently being tested in the evaluation of the Youth Fair Chance Demonstration.

26 In a study of "creaming" in the JTPA Title IIA program, Kathryn Anderson, Richard Burkhauser, and Jennie Raymond (1993) find that the problem of nonrandom selection of participants is not as serious as some critics suggest.


randomly assigned to program and control status, there may simply be too few sites to assure that the two groups of sites do not differ in some unobserved way.

Fourth, as suggested in Section VA.2, it is often the case that some members of control or comparison groups receive services similar to those received by program group members. The possibil- ity of substituting training program activities for similar activities provided elsewhere first gained empirical atten- tion in the WIN evaluations of the 1980s, when it was found that participa- tion in education and training activities observed in a randomly assigned pro- gram group also took place among members of the control group (James Riccio et al. 1986; Gayle Hamilton and Friedlander 1989). It was confirmed in several later evaluations, notably in the JTPA evaluation (Orr et al. 1996; Heck- man and Smith 1995). Some control group members engaged in activities through adult schools, community col- leges, or other local institutions, and they did so without special program as- sistance. Moreover, in order to provide education and training, government programs often send their enrollees to the same local institutions, where they attend classes side-by-side with indi- viduals who are in the target population but who are not enrolled in the govern- ment training program.

Under such circumstances, when Pio is defined as zero for all sample members not in the government training program, bt in equation (1a) does not measure the pure effect of participating in training versus not participating in training. Rather, it measures the incremental effect of the additional participation in training stimulated by the program being evaluated. This measure is clearly policy-relevant, but it is not what is implied by Model 1. In addition,

the fact that comparison group mem- bers, as well as program group mem- bers, engage in training is a source of at least two threats to external validity. First, not only will the evaluated pro- gram differ over time or from one place to another, but the array of activities available to comparison group members will also differ, complicating the prob- lem of generalizing the evaluation re- sults. Second, the very existence of the program being evaluated may change the number of training opportunities available to the comparison group. This second threat to external validity, which Heckman and Smith (1995) call "substi- tution bias," could occur if, by absorb- ing some persons who desire.training, the evaluated program frees up more nonprogram training slots for others who want training. Or, if the evaluated program is large enough, it may induce state and local governments to refrain from funding training activities they would normally provide in the absence of the program.

B. Estimating the Effects of Policy

Participation in similar or identical training activities by persons who are in the program-eligible population but are not enrolled in the government training program proper introduces a profound conceptual problem for Model 1. As we have just indicated, estimates of bt no longer represent the effect of training on participants. Moreover, any realistic model must allow for the possibility that training program activities that suc- cessfully increase skills and help partici- pants find jobs may nevertheless fail to produce an effect on the earnings of training program graduates if partici- pants would have participated in similar activities on their own in the absence of the special program. The net effect of the program on participation in training in that case is zero, and the net effect


on labor market outcomes must also be zero, unless the training program im- parts some extra efficiency to the activi- ties in question.

To make the model more realistic and to allow for duplication of activities in- side and outside the training program, we begin by positing that an individual, i, in the population eligible for the gov- ernment training program faces a num- ber of opportunities for training-of which the government training program is only one. Under these real-world con- ditions, the government program does not "provide training" but rather makes an "offer of training" consisting of a bundle of services to facilitate partici- pation in certain activities plus supports and incentives to individuals to partici- pate in those activities. One of the ser- vices offered may be the actual training, or it may be referral and access to train- ing provided elsewhere in the commu- nity. Some training activities, such as remedial reading and math courses, will be virtually identical for training par- ticipants whether they are enrolled in the government training program or in some other program. Other activities, notably subsidized employment, will be available only to government training program enrollees. The program offer may also be called the government "pol- icy" with respect to the training for the program in question. The prospective program enrollee considers whether to accept that offer, a competing offer, or none at all.

In our model, participation, Pio, can thus no longer be defined as unique to the training program in question. In- stead, Pio must be redefined to repre- sent participation in any training activi- ties similar to those offered by the special program. A training "partici- pant" is defined as a person engaged in one of these activities, whether enrolled in a government training program (an

"enrollee") or acting outside the pro- gram on his or her own initiative. The distinction between program "enrollee" and training "participant" is important and is maintained throughout the rest of this article.

Returning to the formal model, we assume, for simplicity, that Pio is a sca- lar representation of a single training activity and that there is no difference in the efficiency of similar activities be- tween the government program and nongovernmental providers. We then rewrite Model 1 as Model 2 by adding a term to the second equation:

Y_{it} = c_t X_i + b_t P_{i0} + u_{it},   (2a)

P_{i0} = a_0 Z_i + g_0 T_i + e_{i0}.   (2b)

In this model, Ti is a binary scalar that takes on the value of unity if the pro- gram offer of training is in effect for in- dividual i and zero if the offer is not in effect. Thus, go will measure the pro- gram's incremental effect on participa- tion-that is, the change in training par- ticipation induced by the program. Under this formulation, bt is restored to its original meaning as the measure of the effect of participating in training on participants. In this case, bt applies both to those who receive training as en- rollees in the government program and those who receive it on their own.27

27 To generalize Model 2, Ti could be specified as an array of government services and incentives characterizing training "policy." In addition, Model 2, as we have written it, makes no provision for decreasing marginal returns to training, so that bt does not decline as the scale of the training program increases (i.e., for programs for which go is large). Incorporating scale effects could be done by making bt a function of P averaged across the

population. Allowing personal characteristics to affect the returns to training and the training decision could be accomplished by making bt a function of Xi and go a function of Zi. Finally, a further generalization of Model 2 would be to distinguish the efficiency of training participation that does or does not come through the government program. This could be accomplished by creating separate P variables and b coefficients for the government program and other training.


By substituting for Pio from equation (2b), equation (2a) can be rewritten as

Y_{it} = c_t X_i + (b_t a_0) Z_i + (b_t g_0) T_i + b_t e_{i0} + u_{it}.   (2a')

Thus, for person i, the total effect of the program being evaluated on the out- come variable is btgo, which is the prod- uct of the effect of participating on participants (bt) and the incremental ef- fect on participation (go). The aggregate program effect (ignoring macro effects) is btgo multiplied by the number of persons in the eligible population. As should be evident, a training program with a large effect of participating on participants (bt) could yield a rather low total program effect if the incremental effect on participation (go) is small. Con- versely, a modest effect per participant may lead to a larger total effect if the increase in participation from the new program is large. Whenever participa- tion in a government training program mostly duplicates participation that would have occurred anyway, go will be close to zero and the aggregate effect of the program will be small, even if the training is effective. The point of Model 2 is that a full assessment of the effects of a government training program re- quires not only an estimate of the ef- fects of training on training participants, but also an understanding of program ef- fects on overall participation in training in the context of existing training oppor- tunities.

The revised model may be used to il- lustrate a fundamental noncomparabil- ity between estimated program effects that come from participant/nonpartici- pant designs on the one hand and from comparison group designs (including randomized experiments) and compari- son area designs on the other. A partici- pant/nonparticipant design estimates bt in equation (2a) by comparing outcomes

for training participants and nonparticipants, in exactly the same fashion as in equation (1a). In contrast, under comparison group designs, a program group represents condition Ti = 1, and a comparison group (or comparison area sample) represents condition Ti = 0 in equation (2a'). Not all members of the program group are participants and not all members of the comparison group are nonparticipants. The coefficient of Ti is btgo, not bt. In examining estimates from the empirical literature in this article, it is therefore invalid to compare the magnitude of estimated effects from the earlier participant/nonparticipant designs with those from the later experimental designs and with other comparison group or area designs. Because go is less than unity, participant/nonparticipant designs will yield more positive estimated effects for a given program or for programs with similar activities, even when selection bias has been corrected. The statistical significance of estimates from the two kinds of designs may also not be comparable if go is small. Only the signs should be the same (as long as go is positive).
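To make the noncomparability concrete, the following small simulation is our own illustration, not part of the original article. It generates data from Model 2 with hypothetical values of bt and go and assumes, unrealistically, that participation is unrelated to unobservables, so that selection bias is absent by construction. Under those assumptions a participant/nonparticipant contrast recovers roughly bt, while a program/comparison group contrast recovers roughly btgo.

```python
# Minimal sketch of Model 2 (illustrative parameter values; not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b_t = 1500.0      # assumed effect of training participation on annual earnings
g_0 = 0.25        # assumed incremental effect of the program offer on participation
base_part = 0.30  # assumed participation rate in the absence of the program offer

T = rng.integers(0, 2, n)                          # program offer (randomized)
P = rng.random(n) < (base_part + g_0 * T)          # training participation, any provider
Y = 12_000 + b_t * P + rng.normal(0, 4_000, n)     # earnings outcome

# Participant/nonparticipant design: recovers roughly b_t (selection bias assumed away here)
print(Y[P].mean() - Y[~P].mean())                  # about 1500

# Comparison-group (or experimental) design: recovers roughly b_t * g_0
print(Y[T == 1].mean() - Y[T == 0].mean())         # about 375
```

With bt = $1,500 and go = 0.25, the two designs report roughly $1,500 and $375 for the same program, which is the sense in which their magnitudes cannot be compared directly.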

For the estimated effects we present in this article, however, valid compari- sons of magnitude can be made across all evaluations using the internal rate of return (IRR). Estimates of the IRR compare the program effect with the cost of achieving that effect. Cross- study comparability of IRRs is pre- served, whether a particular program effect is measured as bt or btgo, as long as that program's cost is measured on the same basis (that is, as a per-partici- pant cost for the former or as a net dif- ference between cost per program group member and cost per comparison group member for the latter).

Participant/nonparticipant designs cannot provide estimates of the program effect on the amount of training


TABLE 1 TRAINING PROGRAM EVALUATION STUDIES

VOLUNTARY PROGRAMS

Program | Scope | Years of Operation (a) | Target Group
MDTA    | NAT     | 1962-1974     | Disadvantaged adults and youth
NYC     | NAT     | 1964-1974     | Disadvantaged youth
JOBS68  | DEM (b) | 1968-1974     | Disadvantaged adults
JC      | NAT     | 1964-present  | Disadvantaged youth
CETA    | NAT     | 1974-1983     | Disadvantaged adults and youth
SW      | DEM     | 1975-1978     | Long-term AFDC recipients, ex-addicts, ex-offenders, high school dropouts
HHA     | DEM     | 1983-1986     | AFDC recipients
TOPS    | DEM     | 1983-1986     | AFDC recipients
NJGD    | DEM     | 1984-1987     | AFDC recipients
MFSP    | DEM     | 1982-1988     | Low-income minority single mothers
ET      | DEM (c) | 1983-1989 (d) | AFDC recipients
JS      | DEM     | 1985-1988     | High school dropouts
NC      | DEM     | 1989-1992     | AFDC high school dropouts
JTPA    | NAT     | 1983-present  | Disadvantaged adults and youth

a "Years of Operation" do not necessarily coincide with dates of authorizing legislation. Calendar years are shown, not fiscal years. b JOBS68 was a national program but is classified as a demonstration because it was short-lived and featured only a single training activity. c ET was a state-run version of a national program but is classified as a demonstration because its research interest lies mainly in the large scale of its voluntary approach to training for welfare recipients. d In 1989, ET began operating under authority of the new federal Job Opportunities and Basic Skills Training (JOBS) Program.

activity, go. Under comparison group or area designs, go can be found by estimating equation (2b); that is, go is the coefficient of Ti in a regression of Pio on Ti and Zi. Current evaluation practice under such designs is to report program and comparison group levels of participation in the various activities offered by the government training program. With this information, it is often possible to determine whether a weak total effect results from a limited program effect on participation in training (go near 0) or from a small effect of the


TABLE 1 (Cont.) TRAINING PROGRAM EVALUATION STUDIES

VOLUNTARY PROGRAMS

Program | Main Activities (e) | Evaluation Study | Method of Evaluation
MDTA    | CT, OJT            | Ashenfelter (1978); Cooley, McGuire, & Prescott (1979); Kiefer (1978, 1979); Gay and Borus (1980); Bloom (1984b) | NXL
NYC     | CT, PWE            | Kiefer (1979); Gay and Borus (1980) | NXL
JOBS68  | OJT                | Kiefer (1979); Gay and Borus (1980) | NXL
JC      | CT, PWE            | Cain (1968); Gay and Borus (1980); Kiefer (1979); Mallar et al. (1982) | NXL
CETA    | CT, OJT, PWE, PSE  | Westat (1984); Ashenfelter and Card (1985); Bassi (1983, 1984); Bassi et al. (1984); Bloom (1987); Bryant and Rupp (1987); Dickinson, Johnson, & West (1984, 1986, 1987a, 1987b); Finifter (1987) | NXL
SW      | PWE with training  | Hollister, Kemper, & Maynard (1984); Couch (1992) | XL
HHA     | PWE with training  | Bell and Orr (1994) | XL
TOPS    | OJT, UWE           | Auspos, Cave, & Long (1988) | XL
NJGD    | OJT                | Freedman, Bryant, & Cave (1988) | XL
MFSP    | CT, OJT            | Burghardt et al. (1992) | XL
ET      | CT, OJT, PWE, UWE  | Nightingale et al. (1991) | NXL
JS      | CT                 | Cave et al. (1993) | XL
NC      | CT, PWE, UWE       | Quint et al. (1994) | XL
JTPA    | CT, OJT            | Orr et al. (1996) | XL

(e) Most programs with training components also provided assistance with job search.
Key: MDTA = Manpower Development and Training Act; NYC = Neighborhood Youth Corps; JOBS68 = Job Opportunities in the Business Sector; JC = Job Corps Program; CETA = Comprehensive Employment and Training Act; SW = National Supported Work Demonstration; HHA = AFDC Homemaker-Home Health Aide Demonstrations; TOPS = Maine Training Opportunities in the Private Sector Program; NJGD = New Jersey Grant Diversion Project; MFSP = Minority Female Single Parent Demonstration; ET = Massachusetts Employment and Training Choices Program; JS = JOBSTART Demonstration; NC = New Chance Demonstration; JTPA = Job Training Partnership Act; NAT = national; DEM = special demonstration; AFDC = Aid to Families with Dependent Children; CT = classroom training (basic education and occupational skills training); OJT = on-the-job training; UWE = unpaid work experience; PWE = paid work experience; PSE = public service employment; NXL = nonexperimental; XL = experimental.

training itself (go substantially greater than 0).
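As a companion to the simulation sketch above (our illustration, not the article's), the following lines show how go itself would be recovered under a comparison group design: either as the simple program/comparison difference in participation rates or as the coefficient on Ti in an OLS regression of Pio on Ti (and Zi, when covariates are available). The variables T and P continue from the earlier simulated data.

```python
# Sketch of recovering g_0 from equation (2b), continuing the simulated data above.
import numpy as np

# With a randomized offer, g_0 is the difference in participation rates.
g0_hat = P[T == 1].mean() - P[T == 0].mean()        # about 0.25 in the simulation

# Equivalent OLS version (intercept plus T; the simulation has no Z covariates).
X = np.column_stack([np.ones(len(T)), T])
coef, *_ = np.linalg.lstsq(X, P.astype(float), rcond=None)
g0_hat_ols = coef[1]                                # matches the simple difference
```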

C. Empirical Evidence for Voluntary Programs

The jumping-off point for our examination of empirical results is the

compendium of past research in Charles Perry et al. (1975). Nearly all the studies examined by this group of researchers were found to be inadequate. Of the 252 studies covered, many had no estimates of program effects; others judged effects from simple


before/after comparisons of wage rates or earnings. A minority of the studies used comparison groups but created these groups from program "no-shows" or dropouts, which today would be generally recognized as inviting bias from selection on unobservables. "In almost every case in which a [comparison] group was used," the authors concluded, "there were valid reasons to question the comparability of the [comparison group] and the treatment group" (p. 139). Random assignment was, however, considered impractical.

Despite the limitations in methodol- ogy, Perry and his colleagues were able to form opinions on several of the re- search issues that would figure promi- nently on the agenda of evaluation re- search for the next 20 years. The authors assigned a range of $1,000 to $2,000 (1996 dollars) to short-term ef- fects on annual earnings of skills train- ing (under MDTA), assigned smaller es- timates for effects of other program activities, and labeled work experience as the least effective. Program effects were judged largest for women, some- what smaller for men, and smallest for youth. Earnings increases were attrib- uted mainly to increased employment, rather than to increased earning power, raising doubts about the worth of skills development activities within training programs.

Changes in methodology were already in progress. One of the major complaints raised by the Perry et al. (1975) study was the absence of "systematically collected follow-up data" on training program enrollees. This absence was addressed under CETA by the establishment of the Continuous Longitudinal Manpower Survey (CLMS). This survey of CETA participants was linked to earnings records maintained by the Social Security Administration (SSA) and supplemented by comparison

groups created from the Current Popu- lation Survey (CPS) to form the basis of a number of CETA evaluations. The widely differing nonexperimental ef- fects estimated using the CLMS data would, however, ultimately undermine this comparison group approach and lead to the abandonment of a similar evalu- ation strategy for JTPA.

The year 1975 also witnessed the start of the National Supported Work experiment, which was to set the stage for a dramatic shift from nonexperimen- tal to experimental (i.e., random assign- ment) evaluation of training programs. That shift was later given impetus by the conclusions of a National Academy of Sciences committee reviewing re- search on youth training programs (Charles Betsey, Hollister, and Mary Papageorgio 1985) and by the recom- mendation of an invited panel of ex- perts (Stromsdorfer et al. 1985) that the U.S. Department of Labor scrap plans for a nonexperimental evaluation of JTPA. They favored, instead, using a random assignment design. As shown in Table 1, a number of random assign- ment evaluations were conducted for voluntary programs during the 1980s- and more for mandatory programs (next section)-culminating in the recently completed multi-site random assign- ment evaluation of JTPA.

Table 1 lists in approximate chrono- logical order the voluntary training pro- grams that have been evaluated using individual-level data since 1975.28 The table classifies each program by its scope: either national or special dem- onstration. For each program, the ta- ble shows the years of program opera- tion (not necessarily the years covered

28 One exception, Cain (1968), is included in recognition of its importance in the evaluation his- tory of the Job Corps. For recent summaries of training program evaluation results, see Gueron and Edward Pauly (1991) and LaLonde (1995).


by the evaluations), the demographic groups targeted, the principal activities, the major evaluation studies, and the evaluation method used. Although job search assistance is not indicated as a separate activity, most programs offered some help, formal and informal, in find- ing unsubsidized employment.

Because of methodological differ- ences and other factors, not all the studies in Table 1 should be considered on equal footing. Our top criterion for judging the reliability of estimated pro- gram effects is a random assignment re- search design. We consider the results from experiments to be generally less subject to bias and imprecision than the results from studies using nonex- perimental methods. We concur with Barnow (1987), who, in reviewing the accumulated nonexperimental effects es- timates for CETA, concluded that "the confidence interval surrounding these estimates must be considered quite large considering the sensitivity to al- ternative specifications and the lack of any strong reasons to accept findings from one study over those of another" (p. 189). Many nonexperimental ana- lysts also acknowledge the benefits of random assignment.

In our view, results from the recent JTPA evaluation (Orr et al. 1996; Bloom et al. 1997) are the most important. JTPA is national in scope-it is, in fact, the existing national program- and the evaluation design was experi- mental. In addition, the JTPA research team undertook extensive sensitivity analysis to examine possible underre- porting bias and survey nonresponse bias in the follow-up earnings data, which represent two of the principal threats to the internal validity of experi- mental research at present. In this paper, we often assess results of earlier studies against those found in the JTPA evaluation.

Table 2 summarizes findings from the 30 studies of the 14 voluntary programs listed in Table 1.29 Studies are orga- nized by demographic group and pro- gram scope. Adult men, adult women, and youth are shown in separate panels. Within each demographic panel, na- tional programs (MDTA, CETA, and JTPA) are shown first, then special demonstrations. We report effects on earnings in the second year after train- ing because the majority of studies fol- low trainees for at least this long.30 For studies in which second-year effects are unavailable, we report first-year effects. All earnings effects have been con- verted to 1996 Quarter 3 dollars, using the GDP chain-type price index.

For each program, the following summary statistics are reported: (a) the unweighted mean effect on annual earnings across evaluations;3' (b) the minimum and maximum estimated ef- fects; (c) the number of earnings effects

29 Two studies-Nicholas Kiefer (1979) and Robert Gay and Borus (1980)-evaluate more than one program.

30 A number of evaluations have provided estimates of effects on welfare payments and welfare dependency, but, owing to space limitations, we omit these results.

31 In the "Mean Annual Effect" column of Table 2, each evaluation of a particular program contributes one estimate of an earnings effect for that program. Because several studies do not re- port an overall estimate but only report estimated program effects separately by site or subgroup (for example, minority group status or ethnicity, or subgroups receiving different combinations of ser- vices), it was often necessary to compute a single earnings effect for a study by averaging the sub- group estimates, using the size of each of the sub- groups as weights, or for sites, using the un- weighted site estimates. If a given study reports estimates for more than one econometric specifi- cation, the author's preferred specification was used if given; otherwise, an unweighted average across specifications was used. Using the single ag- gregate weighted effect from each study for a par- ticuIar rogram, an unweighted mean effect was then cafculated across studies. The component site or subgroup estimates from each study were, how- ever, used in establishing the range and the fre- quency of statistically significant estimates in the "Range of Effects" column of the table.


TABLE 2 EFFECTS OF VOLUNTARY TRAINING PROGRAMS ON PARTICIPANT EARNINGS BY DEMOGRAPHIC GROUP

Demographic Group and Program (Num. of Studies) | Mean Annual Effect | Range of Effects, if more than one (num. negative and stat. sig. / num. negative and not stat. sig. / num. positive and not stat. sig. / num. positive and stat. sig.)

Adult Men
  National
    MDTA (6)   | $151    | -$2,127 to $2,605 (2/2/2/5)
    CETA (9)   | -$587   | -$3,342 to $1,634 (3/4/3/3)
    JTPA (1)   | $970    | (0/0/0/1)
      OJT      | $1,275  | (0/0/0/1)
      CT       | $1,032  | (0/0/1/0)
  Demonstration
    JOBS68 (2) | $344    | -$1,274 to $2,013 (0/2/1/1)
    SW (1)     | $419    | $402 to $440 (0/0/2/0)

Adult Women
  National
    MDTA (5)   | $1,926  | $942 to $3,527 (0/0/1/8)
    CETA (9)   | $1,797  | $28 to $2,815 (0/0/1/13)
    JTPA (1)   | $960    | $771 to $1,103 (0/0/0/2)
      OJT      | $1,157  | $693 to $2,234 (0/0/1/1)
      CT       | $414    | $316 to $498 (0/0/2/0)
  Demonstration
    JOBS68 (2) | $1,676  | $428 to $3,150 (0/0/1/3)
    SW (2)     | $1,309  | $554 to $2,064 (0/0/1/1)
    HHA (1)    | $1,849  | $209 to $3,749 (0/0/2/5)
    NJGD (1)   | $1,017  | (0/0/0/1)
    TOPS (1)   | $1,448  | (0/0/0/1)
    MFSP (1)   | $793    | $108 to $1,722 (0/0/3/1)
    ET (1)     | $999    | (0/0/0/1)

Youth
  National
    NYC (2)    | -$531   | -$3,742 to $3,630 (2/3/1/2)
    JC (4)     | $586    | -$3,994 to $1,902 (3/3/4/1)
    CETA (5)   | $450    | -$2,475 to $3,715 (4/2/4/4)
    JTPA (1)   | -$171   | -$724 to $184 (0/1/1/0)
  Demonstration
    SW (2)     | $269    | $20 to $517 (0/0/2/0)
    JS (1)     | $553    | $424 to $578 (0/0/3/0)
    NC (1)     | -$295   | (1/0/0/0)

Notes: Program effects and costs are in 1996 dollars. Program effects pertain to the second year after training (or earlier if second year effects are not available). "Mean Annual Effect" is calculated using one estimate from each study, unweighted. When a single overall estimate is not presented in a study, one is calculated by averaging across sites or demographic subgroups. The range of effects and statistical significance is reported over whatever full- sample, site, or subgroup estimates are reported. "Net Cost" is calculated as an average, unweighted, across studies having cost estimates. "Real Rate of Return" is calculated from "Net Cost" (year 0) and "Mean Annual Effect" (years 1-3 and 1-10). Some studies estimated effects for more than one kind of training, but only total program effect is reported here, except where noted. In a few studies, statistical significance is not reported. n.a. = information not available.


TABLE 2 (Cont.) EFFECTS OF VOLUNTARY TRAINING PROGRAMS ON PARTICIPANT EARNINGS BY DEMOGRAPHIC GROUP

Demographic Group and Program (Num. of Studies) | Net Cost of Training Per Participant (Num. of Studies) | Real Rate of Return if Mean Effect Lasts 3 Years | Real Rate of Return if Mean Effect Lasts 10 Years

Adult Men
  National
    MDTA (6)   | $6,053 (1)  | <0     | <0
    CETA (9)   | $8,919 (2)  | <0     | <0
    JTPA (1)   | $1,065 (1)  | 74%    | 91%
      OJT      | $1,320 (1)  | 80%    | 97%
      CT       | $1,172 (1)  | 70%    | 88%
  Demonstration
    JOBS68 (2) | n.a.        | n.a.   | n.a.
    SW (1)     | $13,425 (1) | <0     | <0

Adult Women
  National
    MDTA (5)   | $6,053 (1)  | <0     | 29%
    CETA (9)   | $8,919 (2)  | <0     | 15%
    JTPA (1)   | $1,500 (1)  | 41%    | 64%
      OJT      | $1,059 (1)  | 94%    | 109%
      CT       | $2,100 (1)  | <0     | 15%
  Demonstration
    JOBS68 (2) | n.a.        | n.a.   | n.a.
    SW (2)     | $15,244 (1) | <0     | <0
    HHA (1)    | $9,741 (1)  | <0     | 14%
    NJGD (1)   | $870 (1)    | 103%   | 117%
    TOPS (1)   | $2,278 (1)  | 41%    | 63%
    MFSP (1)   | $5,882 (1)  | <0     | 6%
    ET (1)     | $1,931 (1)  | 26%    | 51%

Youth
  National
    NYC (2)    | n.a.        | <0     | <0
    JC (4)     | $11,010 (2) | <0     | <0
    CETA (5)   | n.a.        | <0 (a) | <0 (a)
    JTPA (1)   | $2,006 (1)  | <0     | <0
  Demonstration
    SW (2)     | $13,087 (1) | <0     | <0
    JS (1)     | $6,4M1 (1)  | <0     | <0
    NC (1)     | n.a.        | <0     | <0

(a) Assuming costs similar to those of adults, the real rate of return would be negative.
Key: See Table 1.


estimates that are negative and statisti- cally significant, negative and not statis- tically significant, positive and not sta- tistically significant, and positive and statistically significant (statistical sig- nificance defined as a probability level of 10 percent or lower); (d) the net cost of training per participant; (e) the number of studies with cost estimates available;32 and (f) our estimate of the real internal rate of return under two alternative assumptions about how long the estimated mean effect on earnings lasts (three years versus ten years).33

In the JTPA study, researchers pro- vided effects estimates in two formats: "per sample member" and "per partici- pant."34 Because the participation rate within the program group is less than 100 percent, these two estimates differ. The per-sample-member estimate is the basic estimated effect generated by the

random assignment design, calculated as the simple (regression-adjusted) dif- ference in means between the program group and the control group. The per- participant estimate is obtained by di- viding the per-sample-member estimate by the program participation rate, fol- lowing Bloom (1984a). Net cost esti- mates are transformed in the same man- ner, so the rate-of-return calculations are unaffected. To increase comparability of estimates across experiments, we utilize the per-participant estimates from JTPA and adopt the same convention whenever possible for the other experimental stud- ies, making the division using participa- tion rates reported in the evaluations. This convention also makes the experi- mental estimates more comparable with those from the nonexperimental partici- pant/nonparticipant comparison designs.
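The per-participant conversion described here, often called the Bloom (1984a) no-show adjustment, amounts to a single division. The sketch below uses hypothetical numbers and relies on the adjustment's key assumption, namely that program group members who never participate are unaffected by the offer.

```python
# Sketch of the Bloom (1984a) adjustment from per-sample-member to per-participant impacts.
def per_participant(per_assignee_effect, participation_rate):
    # Valid only if assignees who never participate are assumed to be unaffected.
    return per_assignee_effect / participation_rate

effect_pp = per_participant(600.0, 0.60)   # -> 1000.0 per participant (hypothetical numbers)
cost_pp = per_participant(900.0, 0.60)     # -> 1500.0; identical scaling leaves the rate of return unchanged
```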

A broad overview of the results re- veals wide variation among the esti- mates. The estimates of effects on an- nual earnings presented in Table 2 range from a low of -$3,994 for non- black female youth in the Job Corps (re- ported by Gay and Borus, 1980) to a high of $3,749 for female AFDC recipi- ents in the Texas site of the Home- maker-Home Health Aide Demonstra- tions (reported by Bell and Orr 1994). In total, 91 out of the 123 estimates (about three-fourths) are positive. Some 55 estimates are positive and statisti- cally significant, more than might be ex- pected from chance alone, but 15 are negative and statistically significant, also more than might be expected from chance.
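A rough sense of what "chance alone" would imply can be computed directly. The sketch below is our own benchmark, not the article's: it assumes two-sided tests at the 10 percent level (so about 5 percent of true-zero effects would be significantly negative and 5 percent significantly positive) and, counterfactually, independence across the 123 estimates.

```python
# Benchmark counts of significant estimates expected under the null of zero effects.
from math import comb

n, p = 123, 0.05               # 123 estimates; ~5% per tail under two-sided 10% tests
expected_per_tail = n * p      # about 6 significant estimates in each direction

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(expected_per_tail, binom_tail(15, n, p))   # 15 negative-significant exceeds the chance benchmark
print(binom_tail(55, n, p))                      # 55 positive-significant is far beyond chance
```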

Much of the variation in estimated program effects is associated with non- experimental designs. Part of this non- experimental variation stems from a lack of sufficient information to adjust fully for pre-program differences be- tween participants and nonparticipants. For example, the limited prior earnings

32The quality and completeness of cost infor- mation varies considerably across studies. We attempted, whenever possible, to obtain net ad- ministrative cost per participant (excluding oppor- tunity costs to trainees), subtracting out training costs expended on comparison group members. Such estimates represent the additional real re- sources consumed by the program. Net cost esti- mates were, however, generally not available in participant/nonparticipant studies. For some kinds of training (primarily paid work experience), we often could not remove payments to participants from the published cost estimates.

33 In calculating the rate of return, training costs are assumed to be incurred in year 0 and the benefits are assumed to be received in years 1 through 3 or, alternatively, years 1 through 10. If a program exhibits a negative effect on earnings, we assign a negative rate of return. Our rate of return estimates are obviously quite crude and intended to give only a rough sense of whether the govern- ment has made a good investment in the training program. Full-blown benefit-cost studies often al- low benefits to vary over time, consider other benefits besides gains in earnings (e.g., reductions in crime and improvements in health), and other costs (e.g., the opportunity costs of participating in a training program).
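The rate-of-return calculation described in this note can be reproduced with a short root-finding routine. The code below is our reading of the procedure (net cost paid in year 0, a constant mean annual effect received in years 1 through 3 or 1 through 10) and uses the JTPA adult-male figures from Table 2 as an illustrative input; it returns values close to the 74 percent and 91 percent shown there.

```python
# Sketch of the internal-rate-of-return calculation in note 33 (our reconstruction).
def irr(net_cost, annual_effect, years, lo=-0.99, hi=10.0, tol=1e-6):
    """Rate r solving -net_cost + sum_t effect/(1+r)**t = 0 for t = 1..years."""
    if annual_effect <= 0:
        return float("-inf")   # the article simply reports such cases as "<0"
    def npv(r):
        return -net_cost + sum(annual_effect / (1 + r) ** t for t in range(1, years + 1))
    while hi - lo > tol:       # npv is decreasing in r, so bisect toward the root
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(irr(1065.0, 970.0, 3), 2))    # about 0.74, cf. the 74% three-year figure for JTPA adult men
print(round(irr(1065.0, 970.0, 10), 2))   # about 0.91, cf. the 91% ten-year figure
```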

34 Orr et al. (1996, pp. 35-37) use the term "as- signees" to denote the full sample of persons ran- domly assigned (whether or not they participate) and the term "enrollees" where we use "partici- pants."


histories for youths provide only weak explanatory power in predicting their future earnings. Thus, quite large posi- tive and negative estimates of earnings effects for this group have been ob- tained in nonexperimental evaluations.

Another factor contributing to the variation associated with the nonexperi- mental methods is specification uncer- tainty. One of the chief difficulties has been identifying permanent and transi- tory components of variation in pre- program earnings in controlling for differences between participants and nonparticipants. Wide variation among estimates provided by different nonex- perimental specifications applied to the same data set for a single program is documented by Ashenfelter and Card (1985), Barnow (1987), and Dickinson, T. Johnson, and West (1987b) for CETA and by LaLonde (1986) and Fraker and Maynard (1987) applying nonexperi- mental methods to the experimental data of the National Supported Work Demonstration. Heckman and Hotz (1989) argue that appropriate utilization of specification tests can narrow this wide variation, but Friedlander and Robins (1995) find that such tests do not narrow the range of nonexperimen- tal estimates by very much in practice.

Consistently strong evidence has accumulated that government training programs have been effective for adult women. The experimental estimates of JTPA's effects on earnings are positive and statistically significant, and the rate of return on cost in JTPA is large even in the short term. Both MDTA and CETA also exhibited large positive average earnings effects, with all estimates positive and nearly all statistically significant. Among all ten programs evaluated for adult women, the mean effect is positive in every one. About three-quarters of the 49 estimates for women in the "Range of Effects"

column of Table 2 are positive and statistically significant, and none is negative. Furthermore, the average effect across programs for adult women is close to $1,400 per year, and in many cases the programs yielded a substantial positive rate of return. Nevertheless, as pointed out by Heckman, Rebecca Roselius, and Smith (1994), as well as others, such earnings effects, although substantial, are not large enough to lift most families out of poverty. Moreover, for women who head families on welfare, the earnings gains are often partially offset by reductions in transfer benefits.

Evidence has been accumulating for a number of years that training programs have been ineffective in producing last- ing earnings effects for youth. Perry et al. (1975) viewed expenditures on youth in the programs of the 1960s-espe- cially the large expenditures for work experience wages in the Neighborhood Youth Corps (NYC), whose budget grew to exceed even that of MDTA-to a great extent as income transfers, provid- ing little in the way of enduring en- hancements to earning power. As shown in Table 2, studies of the NYC and the Job Corps (JC) have yielded a scatter of positive and negative estimates, with somewhat more of the latter. One non- experimental CETA study, Dickinson, T. Johnson, and West (1984), produced quite large positive estimated earnings effects for youth. The experimental es- timates from the JTPA evaluation, how- ever, are small and bracket zero. This result is especially important because the net costs of working with youth in that program were similar to those for adult men and women, making it there- fore difficult to conclude that youth were not receiving attention from the pro- gram. Moreover, no significant positive earnings effects were found for either male or female youth in any of three program activity clusters or 39 subgroups


examined by the JTPA evaluators. In re- cent years, none of the experimental es- timates from special demonstrations have been positive and statistically sig- nificant for youth. These results support the view that training programs have not been effective in increasing the post-program earnings of youth.

The history of the all-youth Job Corps has been especially interesting. That program's residential facilities have made for a high per-participant price tag. Perry et al. (1975) judged the program's earnings effects to be "mar- ginal" (p. 68), and a large negative esti- mate (the bottom of the range shown in Table 2) was obtained by Gay and Borus (1980). Nonetheless, the Job Corps, be- gun in 1964, is the oldest program in Table 2 still in operation and retains a high degree of popularity with Con- gress. Frequently cited in defense of the Job Corps is the large positive earn- ings effect estimated by Mallar et al. (1982, the top of the range shown in Ta- ble 2), which was based on what some analysts consider to be a relatively strong comparison area design. Re- cently, however, the JOBSTART evalu- ation experimentally tested a nonresi- dential version of the Job Corps and found weak earnings effects. Because of the continued uncertainty about the true effects of the Job Corps, a multi- site, experimental evaluation of this program is currently being conducted.

In contrast to the conclusions about youth, conclusions about men were given a boost by the JTPA evaluation. The estimated average earnings effect shown in Table 2 for men in MDTA is small. In CETA, more than half the ef- fects estimated for men are less than zero, and the average is negative. Esti- mated earnings effects for JOBS68 are half negative and half positive, with a small mean positive effect. The experi- mental estimates from Supported Work

are small but cover special groups (ex- offenders and ex-addicts) that are not fully representative of the main popula- tion of economically disadvantaged adult men. The JTPA earnings effect for men, however, is as large as the average ef- fect for women, and likewise has a high rate of return, even in the short run. The JTPA finding for men, therefore, represents a significant break with the results of past evaluations.

A longstanding problem in program evaluation has been distinguishing the effects of different program activities. The presumption commonly held from the earliest days of federal involvement, and a view expressed by Perry et al. (1975, esp. pp. 6, 28), is that the more skills development an activity achieves, the larger and longer-lasting will be its earnings effects. Skills development is often implicitly associated with the in- tensity and cost of an activity, with greater skills development seen as re- quiring greater effort by participants and greater costs to programs. A key re- search question has been: How much do activities intended to enhance skills ac- tually increase skills, earning power, and long-term earnings? In our view, the evidence is mixed. A link between increased cost and intensity of training and greater earnings effect has not been firmly established.

Activity-specific earnings effects esti- mated for MDTA and CETA do not provide widely credible guidance on this issue. Barnow (1989, p. 121) re- ported a general consensus within the evaluation community that earnings effects of on-the-job training were slightly larger than those for classroom training, but also concluded that there was "only weak evidence that CETA training programs increased the skill levels of participants" (p. 126). Activ- ity-specific earnings effects estimates found in the CETA evaluations suggest


low rates of return to training in gen- eral and reveal no positive relationship between the cost of an activity and the magnitude of the earnings effect.

Work experience stands out in the pre-CETA and CETA literature as be- ing judged quite costly, owing in part to the stipends paid to participants, but having little actual training content and only a small effect on post-program earnings. During the late 1970s and early 1980s, two kinds of work experi- ence designed specifically to incorpo- rate strong training components were implemented as special demonstrations (Supported Work [SW] and Home- maker-Home Health Aide [HHA]) and were evaluated using random assign- ment designs. The results (Table 2) sug- gest that work experience that provides training can be effective for adult wel- fare women, but they affirm the high- cost view. Moreover, the activity was found to be ineffective for the particu- lar groups of adult men and youth stud- ied. Work experience has become one of the lesser utilized activities in pre- sent-day voluntary training programs.35

Early evaluations of on-the-job train- ing based on nonexperimental analyses of JOBS68 data produced mixed results,

implying positive earnings effects pri- marily for women. During the 1980s, ran- dom assignment evaluations of special demonstrations of on-the-job training for adult welfare women (New Jersey Grant Diversion [NJGD] and Maine TOPS) found positive earnings effects and high rates of return on net cost, even in the short run (Table 2). Results from the JTPA experiment (Table 2) also found positive earnings effects for on-the-job training and suggest high rates of return on cost even in the short run for both adult men and adult women. Whether these positive effects come from skills acquisition, from the wage subsidy given employers as part of the activity, or from program assistance given to en- rollees in finding on-the-job training opportunities has not been established.

Classroom training was not tested by random assignment until the JTPA eval- uation. Results for classroom training in JTPA, primarily in occupational skills rather than remedial basic education, were mixed (Table 2). Positive earnings effects were found for men and women, but these were statistically significant only when the two groups were com- bined, and rates of return were high only for men. Again, it is not clear how much of the earnings effects can be at- tributed specifically to skills acquisi- tion. For youth, the JTPA researchers found no evidence that either on-the- job or classroom training improved earnings (Orr et al. 1996, p. 177).36

35 Work experience without intensive training has retained some appeal in mandatory programs for welfare recipients, where government budget costs can be kept low by substituting welfare payments for work experience wages and where the production of public goods and services by work experience participants is accorded a high value. From the social cost perspective, work experience wages are not counted, because they are merely transfers and do not represent real resources used up. We have excluded them in SW and HHA costs and wherever else the data allow. In addition, the value of goods and services produced by work experience participants in their subsidized jobs may be substantial and would be counted as a social benefit in a complete benefit-cost analysis. For example, the value of in-program product was large enough to make the total net benefits to society exceed social net costs for two of the four SW subgroups and for as many as six of the seven HHA sites.

36 In the JTPA evaluation, random assignment to particular kinds of training occurred after indi- viduals were recommended for the training, but before they received it. Thus, the JTPA results re- ported in Table 2 are for the kind of training to which individuals were assigned, not necessarily the training they actually received. Because not all persons received the training to which they were assigned, the results by kind of training for JTPA mix more than one kind of training. The im- plications of this "post-randomization treatment choice" are discussed in Hotz and Seth Sanders (1994).


The effectiveness of classroom train- ing may depend crucially on the relative emphasis placed on upgrading general academic skills versus training for a spe- cific occupation. That conclusion is sug- gested by results from the Minority Female Single Parent (MFSP) demon- stration, a random assignment test of intensive education and training for a largely welfare target population. Among the four evaluation sites, post- program earnings effects were small at sites that attempted to correct long- standing reading and math deficits with an initial stint in remedial basic educa- tion. Statistically significant positive earnings effects were found only for the one site-the San Jose Center for Employment Training (CET)-that put participants immediately into skills training for particular jobs, regardless of their previous educational attainment, and followed up with job placement after training (John Burghardt et al. 1992). CET was also the only site with positive earnings effects in the JOB- START demonstration.

The absence of long-term follow-up in most studies is a critical problem in assessing the effectiveness of lengthy and costly skills development activities. The limited evidence available suggests that earnings effects may persist. Kenneth Couch (1992) found that the effects of the National Supported Work Demonstration lasted for at least eight years after training. Amy Zambrowski and Anne Gordon (1993) found that earnings effects for CET in MFSP persisted through at least five years. In analyzing extended follow-up data from the JTPA experiment, the U.S. General Accounting Office (1996) found that effects for adults continued over five years of follow-up, although the later-year effects were smaller than the peak effect and were not generally statistically significant. Friedlander and

Burtless (1995) found that the effects of three mandatory welfare-to-work programs emphasizing rapid employment (two low-cost and one middle-cost program) peaked and then declined substantially by the fifth year after the training. The effects of a fourth program, which placed more emphasis on skills upgrading, persisted. A link between the training content of activities and the durability of earnings effects has not been demonstrated conclusively, however.

Did the shift toward random assign- ment research designs after 1975 yield results that ought to alter opinion about program effects? The experimental evi- dence should reduce concerns that par- ticipating in voluntary training pro- grams may result in significant lost earnings opportunities: the negative and statistically significant estimates in Ta- ble 2 are all nonexperimental except for one (New Chance). The view that pro- grams are relatively effective for adult women remains unchallenged, but the supporting evidence has been strength- ened. For adult men, however, the find- ings of the JTPA experiment suggest more favorable prospects in training than did past nonexperimental esti- mates. In addition, for both men and women, the large estimated rates of re- turn on net cost calculated for the JTPA experiment provide a much more opti- mistic picture than do the nonexperi- mental MDTA and CETA findings. For youth, in contrast, concern about training program effectiveness can only be heightened by the experimental evidence. Generally speaking, experi- mental estimates have exhibited less variation than have nonexperimental estimates, most notably for youth.

One clear contribution of the experimental approach has been to advance awareness of substitution and duplication of training activities inside and outside government training programs.


Estimates of the net cost of resources used up in training programs have decreased substantially as greater attention has been paid to improving estimates of participation in similar activities by control group members. On the other hand, the experimental approach has made less progress in quantifying the training content of the various program activities and in systematically investigating the critical links between training content and the enhancement of skills, earning power, and actual earnings.

At present, the most important unre- solved issue concerning voluntary train- ing programs for adults is the efficacy of various policy tools intended to in- crease program scale by increasing the number of participants and the intensity and expense of the activities provided to them.37 By how much would changes in incentives, supports, targeting strate- gies, and program operating practices increase participation? How large would be the resulting increases in aggregate program costs and earnings effects?38 The JTPA experiment has provided the most credible evidence to date about

training program effectiveness, but the estimated effects pertain only to the practices and scale typified by the pro- grams actually evaluated. It is unclear whether the high rates of return ob- served in the JTPA experiment would still be observed if the scale of partici- pation were substantially increased.

The apparent difficulty in generaliz- ing effects estimated in a random as- signment evaluation to a range of policy options has been an important and trou- bling theme in recent critiques of the experimental evaluations of training programs (Section V.A.3). In our view, one particularly telling ramification of these critiques has been the failure of the experimental method thus far to address systematically and rigorously the scale issue. We think it would be an error to read these critiques as a call for a return to the nonexperimental methods utilized in the past. The contribution of these critiques has been, instead, to identify serious problems in generalizability that random assignment does not solve.

VI. Evaluating Mandatory Training Programs

Analytically, the most important as- pect of mandatoriness in a training program is the possibility of program effects on enrollees who do not partici- pate in formal program activities. Such effects may be produced directly on welfare income by financial penalties, called sanctions, which typically amount to about 20 percent of the monthly wel- fare check. Effects on nonparticipants may also be produced indirectly as indi- viduals find employment or leave wel- fare to avoid pressure from program staff to comply with a time-consuming participation requirement, the so-called deterrent effect. In this connection, it is important to note that mandatory wel- fare-to-work programs generally permit

37 Steven Sandell and Rupp (1988) find that, among adults, as few as 2.3 percent of the target group defined by law (i.e., persons fitting the defi- nition of "economically disadvantaged") are en- gaged in JTPA activities, although the percentage is much greater (about 13 percent) among the small minority of this population who are jobless and looking for work. Orr et al. (1996, p. 232, n. 10) cite another 2 percent estimate for voluntary participation among welfare recipients in the Homemaker-Home Health Aide (HHA) Demon- strations.

38 In considering the consequences of increasing program effort, Friedlander (1993) posits an inverted-U relationship between the degree to which a subgroup is disadvantaged and the amount of program effort required to obtain an earnings effect of a given magnitude. For the most disadvantaged program enrollees, there may be a "threshold" of program effort below which no effects on employment or earnings will be realized. If this threshold is high, then a large amount could be spent to increase participation among the most disadvantaged without producing much effect on their earnings.


enrollees to engage in part-time em- ployment while they remain on welfare as a substitute for participation in a training program activity. Indeed, em- ployment that is a substitute for partici- pation may occur as often as program participation itself in some programs, especially in states where welfare grant levels are high enough to permit sub- stantial concurrent mixing of work and welfare (see, e.g., Hamilton 1988, p. xviii). The potential for sizable program effects on nonparticipating program en- rollees makes the interpretation of esti- mated earnings effects for mandatory programs different from voluntary pro- grams.

A. Accounting for Program Effects on Nonparticipants

To incorporate the potential effects of mandatory programs on nonpartici- pants, equation system (2) is modified by adding an additional term to the first equation, creating system (3):

Y_{it} = c_t X_i + b_t P_{i0} + h_t T_i (1 - P_{i0}) + u_{it},   (3a)

P_{i0} = a_0 Z_i + g_0 T_i + e_{i0}.   (3b)

Again, the dummy variable Ti indicates whether the program "offer"-or, in the mandatory case, the program "require- ment"-is or is not in effect for individ- ual i. The new coefficient, ht, is the pro- gram effect on nonparticipants who are covered by the participation require- ment.

As with equation system (2), estimat- ing the full effect of the program in sys- tem (3) requires an evaluation sample that includes some individuals for whom Ti = 1, the program group, and others for whom Ti = 0, the comparison group. These groups may be created in areas with and without the program or within a single area among individuals who are and are not subject to the pro- gram participation requirement. The

latter kind of comparison group may be created by random assignment. As be- fore, both program and comparison groups will have participants and non- participants in training activities, al- though it is naturally assumed that par- ticipation will be greater when Ti = 1.

The total effect of the program on in- dividuals subject to program participa- tion requirements can be found by sub- stituting (3b) into (3a), evaluating Yit at Ti = 1 and Ti = 0, and taking the differ- ence (noting that Ti x Ti = Ti for the dummy variable), which is

b_t g_0 + h_t [1 - P_{i0}(T_i = 1)],   (3c)

where Pio(Ti =1) represents aoZi + go, the probability of participation when Ti = 1. An estimate of the entire expression (3c) can be obtained as the coefficient of Ti in a regression of Yit on Ti and control variables. The first term in this expres- sion, btgo, is the same as the coefficient of Ti in equation (2a'). The new term is the program's effect on nonparticipants, ht, multiplied by the probability of non- participation, 1 - Pio(Ti = 1), among those covered by the participation re- quirement.
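A hypothetical numerical example (our numbers, chosen only for illustration) shows how expression (3c) mixes the two effects:

```latex
% Illustrative values, not estimates from any evaluation
g_0 = 0.25, \qquad P_{i0}(T_i = 1) = 0.5
\;\Rightarrow\;
b_t g_0 + h_t \,[1 - P_{i0}(T_i = 1)] = 0.25\, b_t + 0.5\, h_t .
```

Under these assumed values, (bt, ht) = ($1,600, $200) and ($800, $600) both imply a total effect of $500 per covered individual, so the program/comparison contrast by itself cannot distinguish a program that works mainly through its activities from one that works mainly through sanctions and deterrence.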

Although the entire expression (3c) may be estimated from a comparison group or comparison area design, unique estimates of bt and ht cannot be recovered, even if go and 1 - Pio(Ti = 1) have been estimated from equation (3b). Thus, in contrast to voluntary pro- grams, comparison area and comparison group (including random assignment) designs for mandatory programs typi- cally cannot provide valid estimates of the earnings effects of participating in activities, bt. Consequently, estimates of participation effects do not generally appear in mandatory program evalu- ation studies. Nor is dividing published effects estimates by the program par- ticipation rate a valid option, as it is for voluntary program evaluations. Direct


comparisons of the estimated effects of voluntary and mandatory programs, therefore, cannot be made, even when experiments are used to evaluate both. Valid comparisons can, however, be made using estimated internal rates of return.39

B. Empirical Evidence for Mandatory Programs

Table 3 lists the main mandatory training programs and the 16 evaluation studies of them. As indicated, over the last 20 years, all the major evaluations of mandatory programs have been based on random assignment designs. Table 4 shows estimated program effects. All the results shown for adult welfare men come from studies that also included adult welfare women. All WIN and JOBS studies in Table 4 used similar (random assignment) designs and data sources, providing an unusually high degree of comparability among themselves. Typically, about 50 percent of the samples of randomly assigned program enrollees actually participated in a formal program activity within about a year of entry into a WIN or JOBS program. Two categories are shown for WIN programs. The earlier WIN studies examined the least costly programs, which offered mainly supervised job search, with unpaid work experience included in some of them (WIN-JS/WE in Table 4). Two somewhat higher-cost efforts that added remedial education and vocational training activities are classed as "WIN mixed services" programs (WIN-MIXED). The net costs of the evaluated JOBS programs are greater than those for WIN because JOBS programs assigned a significant share of enrollees to education and training.

The great majority of the earnings effects shown in Table 4 for mandatory programs are positive. Within each WIN/JOBS category, effects are larger for women than for men, and rates of return are higher. Most estimated effects for women are statistically significant; most estimated effects for men are not. These gender differences in favor of women in mandatory programs are consistent with the gender results for voluntary programs prior to JTPA. The JTPA rates of return for men are clearly larger than those for men in mandatory programs.

Economists generally presume that mandatory programs will have smaller earnings effects per dollar spent than will voluntary programs because they include many enrollees who do not expect much financial benefit from participation. The results for men in WIN/JOBS versus JTPA affirm this view, but otherwise we do not find strong and consistent support for it (see also Orr et al. 1996, pp. 203-05). Moreover, differences for men between WIN/JOBS and JTPA may result from the more disadvantaged population in mandatory programs or simply from the fact that WIN/JOBS enrollees are receiving welfare and nearly all JTPA men are not. The earnings effects of mandatory programs have also sometimes been constrained by the necessity of spreading a modest program budget quite thinly over a large mandated target population, limiting the ability to promote significant and sustained program activity (e.g., FSETP).

Mandatory program evaluations, like those for voluntary programs, have produced

39 For the sake of completeness, it should be noted that b_t for mandatory programs can be estimated within the context of a comparison-group/area design by a nonexperimental participant/nonparticipant comparison in which nonparticipants are not subject to the program mandate (i.e., for whom T_i = 0), providing, of course, that selection bias can be removed. If b_t can be estimated in such a fashion, then h_t can be recovered as well.


TABLE 3 TRAINING PROGRAM EVALUATION STUDIES

MANDATORY PROGRAMS

Program      Scope      Years of Operation (a)   Target Group            Main Activities
WIN-JS/WE    NAT, DEM   1967-1989                AFDC recipients         JS, UWE
WIN-MIXED    DEM (b)    1982-1987                AFDC recipients         JS, UWE, CT
FSETP        NAT        1987-present             Food Stamp recipients   JS
JOBS         NAT        1989-1996 (c)            AFDC recipients         JS, UWE, CT

(a) "Years of Operation" do not necessarily coincide with dates of authorizing legislation. Calendar years are shown, not fiscal years.
(b) We classify Baltimore Options as a special demonstration because its organization differed from most WIN programs.
(c) Since 1996, states have continued to operate mandatory welfare-to-work programs under legislation that created welfare block grants to states.

mixed evidence about the ability of more intensive and expensive skills development activities to increase skills, earning power, and long-term earnings.40 In general, larger dollar expenditures have not produced markedly larger earnings effects and have therefore met with decreasing rates of return (see Table 4). For women, who have the more positive results, short-term rates of return calculated from mean effects are positive and very large for WIN-JS/WE and WIN-MIXED, but not for the more costly JOBS programs, for which only the longer-term rates are positive. The JOBS results may be associated with extensive use of remedial education as a first activity, rather than immediate assignment to job-specific training that was used effectively in the voluntary CET program described earlier.

Particularly important in this connection are findings from a random assignment evaluation of California's JOBS program, GAIN, which stands out nationally in the large scale and expense of its investment in remedial education. Earnings effects for those two-thirds of program enrollees who were specifically targeted for the education activities were relatively modest in light of the additional net costs incurred for them (Riccio, Friedlander, and Stephen Freedman 1994, p. 260). Further, a post-program test of basic reading and math skills, an innovation in training program research, revealed improvements in these skills in only one of five research counties (Friedlander and Karin Martinson 1996). Finally, the largest effects on earnings were found in the one locality that emphasized

40 See Friedlander and Gueron (1992) for an analysis of the relationship between net cost and program effects in 13 welfare-to-work programs.


TABLE 3 (Cont.) TRAINING PROGRAM EVALUATION STUDIES

MANDATORY PROGRAMS

Program      Evaluation Study                                                             Method of Evaluation
WIN-JS/WE    Goldman (1981), Wolfhagen (1983), Friedlander et al. (1985,(d) 1986, 1987),   XL
             Riccio et al. (1986),(d) Goldman, Friedlander, & Long (1986)
WIN-MIXED    Friedlander (1987),(d) Friedlander and Hamilton (1996)                        XL
FSETP        Puma and Burstein (1994)                                                      XL
JOBS         Fein, Beecroft, & Blomquist (1994), Blomquist (1994),(e) Kemple,              XL
             Friedlander, & Fellerath (1995), Riccio, Friedlander, & Freedman (1994),
             Freedman et al. (1996), Hamilton et al. (forthcoming)

(d) Supplemental follow-up contained in Friedlander and Burtless (1995).
(e) Cost estimates were obtained from personal communication with the author.
Key: WIN-JS/WE = Work Incentive Program emphasizing supervised job search and work experience; WIN-MIXED = Work Incentive Program incorporating education and/or training; JOBS = Job Opportunities and Basic Skills Training Program; FSETP = Food Stamp Employment and Training Program; NAT = national; DEM = special demonstration; AFDC = Aid to Families with Dependent Children; JS = job search training and assistance; UWE = unpaid work experience; CT = classroom training (basic education and occupational skills training); XL = experimental.

rapid employment over initial assignment to extended education activities.

Effects on welfare receipt and welfare payments (not shown in the table) have been found for some programs but not for others. The magnitude of welfare effects appears to depend in large part on the goals of local program administrators: higher earnings and greater "economic security" versus increased employment and more rapid welfare case closure. Regardless, expenditures for welfare-to-work programs have remained small compared with total welfare payments. Thus, even programs producing relatively large welfare effects reduce average payments by only 10 to 15 percent in the short term, and these effects tend to decrease over time. The great majority of welfare recipients who would have remained on public assistance without WIN or JOBS did so even with those programs.

VII. Estimating Effects of Government Training Programs on Society

To determine whether government training programs are socially efficient, evaluations cannot be limited to effects on persons enrolled in the program. They must also take account of effects on persons not enrolled in the program. Doing this, however, is useful only if the program is found to have positive effects on those who enroll. Otherwise, effects on society as a whole are likely to be negative or, at best, negligible. If the program does have a positive earnings effect on those who receive training, then the effect on society may be positive, depending on the social costs incurred in operating the program. The usual way of taking account of societal effects is through benefit-cost analysis. For the reasons discussed below, existing benefit-cost


TABLE 4 EFFECTS OF MANDATORY TRAINING PROGRAMS ON EARNINGS OF INDIVIDUALS COVERED

BY THE PARTICIPATION REQUIREMENT BY DEMOGRAPHIC GROUP

Demographic Group and Program      Mean Annual   Range of Effects       Count of
(Num. of Studies)                  Effect        (if more than one)     Effects*

Adult Welfare Men (a)
  WIN-JS/WE (1)                    $190                                 (0/0/1/0)
  WIN-MIXED (1)                    $448                                 (0/0/1/0)
  JOBS (2)                         -$28          -$448 to $1,594        (0/3/2/1)

Adult Welfare Women (b)
  WIN-JS/WE (7)                    $438          -$56 to $813           (0/1/1/5)
  WIN-MIXED (2)                    $728          $710 to $746           (0/0/0/2)
  JOBS (4)                         $444          $88 to $1,145          (0/0/4/7)

Food Stamp Recipients
  FSETP (1) (c)                    -$86                                 (0/1/0/0)

* Counts are (num. negative and stat. sig. / num. negative and not stat. sig. / num. positive and not stat. sig. / num. positive and stat. sig.).
(a) WIN and JOBS results for "Adult Welfare Men" pertain to the two-parent AFDC-U welfare category, whose case heads are nearly all male.
(b) WIN and JOBS results for "Adult Welfare Women" pertain to the single-parent AFDC basic welfare category, whose case heads are nearly all female.
(c) The evaluation sample for FSETP was 58 percent male and 42 percent female.

findings for training programs are subject to a great deal of uncertainty.

A. The Contemporary Benefit-Cost Approach

The accounting framework used today in conducting benefit-cost analyses of training programs was developed in the late 1960s by Einar Hardin and Borus (1969) and refined in the early 1980s by Peter Kemper, David Long, and Craig Thornton (1981). A very simple version of this framework is given in Table 5.

Only benefits and costs that are typically estimated are listed in Table 5. Dollar amounts are indicated in the table as the program effect on earnings (E), tax payments (T), welfare payments (W), and net costs (C). The plus and minus signs indicate whether each amount is expected to be a benefit (+) or cost (-) from the perspectives of three groups: persons enrolled in the program, persons not enrolled in the program (including taxpayers, who pay the cost of operating the program), and the whole of society (enrollees plus nonenrollees). As indicated, benefits and costs to society are simply the algebraic sum of benefits and costs to enrollees and nonenrollees. Hence, the framework implies that if a training program causes a decline in transfer payments received by program enrollees (for example, unemployment compensation or welfare payments), then this decline should be regarded as a cost to enrollees (albeit one that may be offset by earnings increases); as a savings or benefit to taxpayers; and as neither a benefit nor a cost to society, but simply a transfer of income from one segment of society to another. One goal of training


TABLE 4 (Cont.) EFFECTS OF MANDATORY TRAINING PROGRAMS ON EARNINGS OF INDIVIDUALS COVERED

BY THE PARTICIPATION REQUIREMENT BY DEMOGRAPHIC GROUP

Demographic Group and Program      Net Cost of Training Per       Real Rate of Return If Mean Effect Lasts
(Num. of Studies)                  Enrollee (Num. of Studies)     3 Years      10 Years

Adult Welfare Men (a)
  WIN-JS/WE (1)                    $1,120 (1)                     <0           <0
  WIN-MIXED (1)                    $1,150 (1)                     8%           8% (d)
  JOBS (2)                         $2,149 (2)                     <0           <0

Adult Welfare Women (b)
  WIN-JS/WE (7)                    $412 (7)                       91%          106%
  WIN-MIXED (2)                    $1,344 (2)                     29%          53%
  JOBS (4)                         $1,936 (4)                     <0           19%

Food Stamp Recipients
  FSETP (1) (d)                    $173 (1)                       <0           <0

(d) The observed time pattern of effects indicates a rapid decline well before 10 years, making the 10-year rate of return similar to the 3-year rate of return.
Notes: See Table 2. Key: See Table 3.

program benefit-cost analysis, as the last row of Table 5 suggests, is to determine whether the program being evaluated has a positive or negative payoff from each of the three perspectives represented by the three columns. The societal perspective is usually viewed by economists as the appropriate one for assessing the efficiency of a training program. Policy makers, however, often focus on persons not enrolled in the program because the net effect on this group also determines whether the program increases or decreases government budgetary requirements.
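The accounting identity in Table 5 can be made concrete with a small numerical sketch. The dollar amounts below are purely illustrative assumptions, not estimates from any evaluation; the point is only that transfers (T and W) cancel in the societal column, leaving E - C.

# Illustrative per-enrollee amounts (assumed, not taken from any study).
E, T, W, C = 900.0, 150.0, 400.0, 600.0   # earnings gain, taxes, welfare reduction, net cost

enrollees = +E - T - W                    # earnings gain less taxes paid and welfare lost
nonenrollees = +T + W - C                 # tax and welfare savings less program operating cost
society = enrollees + nonenrollees        # transfers cancel: equals E - C

print("enrollees:   ", enrollees)         #  350.0
print("nonenrollees:", nonenrollees)      #  -50.0
print("society:     ", society, "= E - C =", E - C)   # 300.0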

The framework shown in Table 5 embodies several assumptions typically made in conducting benefit-cost analyses of training programs. A few of the more important of these assumptions are briefly discussed below.

1. Distributional Issues. In benefit-cost analyses of training programs, dollars gained or lost by program enrollees are usually valued the same as dollars gained or lost by nonenrollees. Enrollees, however, have much lower incomes, on average, than do nonenrollees. The marginal utility of income, therefore, may differ between the two groups. This issue is relevant whenever a training program makes enrollees better off and nonenrollees worse off, or vice versa. For example, benefit-cost analyses of mandatory programs for welfare recipients indicate that some (but not all) of these programs reduce the incomes of enrollees but result in net gains for nonenrollee taxpayers (see Anthony Boardman et al. 1996, Table 14.1). A considerable literature exists concerning the possibility of treating this issue by giving each dollar of the gains and losses of relatively low-income


TABLE 5 STYLIZED ACCOUNTING FRAMEWORK FOR TRAINING PROGRAM BENEFIT-COST ANALYSIS

Variable                               Persons Enrolled     Persons Not Enrolled     Society
                                       in the Program       in the Program           (row sum)
Program effect on
  Earnings                             +E                   0                        +E
  Tax payments                         -T                   +T                       0
  Welfare payments                     -W                   +W                       0
Net program operating costs            0                    -C                       -C
Net effect of program (column sum)     ?                    ?                        ?

persons greater weight in benefit-cost analyses than each dollar of the gains and losses of higher-income persons (see Boardman et al. 1996, ch. 14, and the references contained therein). Because the weights needed to do this are unknown, training program benefit-cost analyses do not explicitly treat the gains and losses of lower- and higher-income persons differently. Instead, as indicated above, benefits and costs are reported separately for program enrollees and nonenrollees so that policy makers can examine the effects on each group and apply their own subjective weights.
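A short sketch illustrates how explicit distributional weights would alter the societal calculation. The net effects and the weights below are assumptions chosen for illustration only, since, as noted, the appropriate weights are unknown.

# Hypothetical per-enrollee net effects, as might arise from a mandatory program
# that lowers enrollee income while producing savings for taxpayers (values assumed).
enrollee_net, nonenrollee_net = -100.0, 250.0

unweighted = enrollee_net + nonenrollee_net            # standard practice: equal weights
w_low, w_high = 2.0, 1.0                               # assumed marginal-utility weights
weighted = w_low * enrollee_net + w_high * nonenrollee_net

print("unweighted societal net:", unweighted)          # +150: program looks efficient
print("weighted societal net:  ", weighted)            #  +50: smaller; any weight above 2.5
                                                       #  on enrollee losses flips the sign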

2. The Extrapolation Problem. Assessing whether government training programs are socially efficient depends critically on how long earnings increases last for program enrollees.41 Program benefits will be smaller if earnings increases quickly fade (or "decay") and much larger if they last for the remainder of the enrollees' working lives. Unfortunately, data on earnings effects are usually limited to three years or less. Benefit-cost analysts must then project earnings effects into the future without any firm empirical basis. Often a "sensitivity analysis" is conducted by making several different assumptions about the long-run time path of earnings effects. Such an analysis can illustrate the magnitude of uncertainty but does not diminish it.

3. Intangible Effects. Intangible effects include the value of leisure forgone by program enrollees and the value of satisfaction gained by both enrollees and nonenrollees from the substitution of earnings for welfare payments. Because they are difficult to measure, intangible effects are rarely assigned a value in evaluations of government training programs. This practice may be conceptually unsound. For example, if a training program causes a program participant to work more, the individual's real net gain is not his or her financial gain (as implied by the usual training program benefit-cost framework) but rather his or her increase in utility. The few attempts that have been made to impute the real net gain rather than the financial gain (Bell and Orr 1994; Greenberg 1997) suggest that the difference between the financial gain and the real net gain can be appreciable.

The issue of unobserved increases in utility arises in measuring the benefits

41 Because the benefits of training programs typically occur after the costs, discounting is necessary. There is, of course, some controversy among economists concerning the appropriate social discount rate.


of government training programs for nonenrollees. No attempt has ever been made to elicit taxpayers' willingness to pay for the substitution of work for transfer payments induced by training programs. One approach that could be used for this purpose is contingent valuation, which utilizes surveys to attempt to measure willingness to pay for changes in the quantity and quality of goods not exchanged in markets (see Richard Bishop and Thomas Heberlein 1990 for an overview). Considerable controversy surrounds the validity of contingent valuation, however (Jerry Hausman 1993).

B. General Equilibrium Effects

Government training programs may have important effects on the behavior and well-being of some persons not enrolled in a program. These effects are almost never taken into account in training program benefit-cost analyses. Two such effects are entry effects and displacement effects. Empirical evidence about the magnitude of both of these effects is quite limited. Our assessment of the theoretical arguments is that the importance of entry effects is somewhat speculative, whereas displacement could substantially undercut the social benefits of government training programs, reducing them well below the benefits measured in a typical benefit-cost analysis.

1. Entry and Deterrent Effects. If training program services are perceived as beneficial, some persons who are initially ineligible to participate may leave their jobs in order to qualify (an "entry" effect). On the other hand, in the case of mandatory programs for welfare recipients, some individuals who might otherwise have entered the welfare rolls may decide not to do so to avoid the "hassle" of participating (a "deterrent" effect).42 Manski and Garfinkel (1992a) and Moffitt (1992, 1996), among others, argue that program entry effects or deterrent effects could be substantial.

Findings on entry effects are available from five aggregate-level time series studies that examine how training programs affect applications for welfare. However, the value of the empirical results is reduced by their sensitivity to model specification changes.43 The results, in our view, are inconclusive. All five studies compare welfare application rates in sites that have a training program for welfare recipients with application rates in sites that do not have training. Three of the five studies are consistent with expectations that voluntary programs for welfare recipients encourage entry onto the welfare rolls and mandatory programs discourage entry. Of the two studies of voluntary programs, T. Johnson, Daniel Klepinger, and Fred Dong (1990) find that a voluntary program in Oregon had a positive entry effect, but Wissoker and Harold Watts (1994) do not find a positive entry effect for a voluntary program in the state of Washington. Of the three studies of mandatory programs, two indicate, as anticipated, that entry effects were negative (Fisher Chang 1996; Elizabeth Phillips 1993), while the remaining study finds no evidence of a negative entry effect along with a much larger increase

42 This is conceptually similar to welfare recipients leaving the rolls when they are informed that they will be subject to newly established mandatory work or training requirements. Such "exit effects" occurred (but were not separately identified) under the experimental evaluations of mandatory programs discussed earlier.

43 Program effects on overall caseload levels may be due not only to changes in the number of applications, but also to changes in the fraction of applications accepted onto the welfare rolls and to changes in the number of exits from the rolls. Only changes in the number of applicants represent an entry effect.


in exits (Bradley Schiller and C. Nielsen Brasher 1993).44

2. Displacement. Training program graduates may end up in jobs that otherwise would have been held by individuals not in the program (G. Johnson 1979). If these displaced individuals become unemployed or accept lower-wage jobs, their earnings will fall. The social effect of training programs on employment and earnings will therefore be less than the effects for program graduates. Despite these potential adverse effects, there is virtually no research quantifying the magnitude of displacement caused by training programs for the economically disadvantaged.45

Several arguments have been put forward to suggest that displacement may not seriously undermine training program effectiveness. First, macroeconomic policy may be able to expand employment enough to absorb new training program graduates and thereby prevent displacement. Second, as Cohen (1969) and G. Johnson (1979) point out, if training program participants are less likely to seek employment while they are in training than they otherwise would have been, then more jobs will be open to nonparticipants, at least temporarily. Third, as emphasized by G. Johnson (1979) and Katz (1994), if training programs can impart skills that allow trainees to leave slack occupational labor markets for tight ones, then they can decrease the competition for job vacancies in the slack markets, making it easier for those who remain in these markets to find jobs. Such a possibility could produce a result that is the exact opposite of a displacement effect: total employment could increase by more than the number of persons who are trained.
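A back-of-the-envelope sketch (with an assumed displacement rate, since, as noted above, no empirical estimates exist for these programs) shows how displacement would drive a wedge between the earnings effect measured for program graduates and the effect on society as a whole.

# Hypothetical numbers only: per-graduate earnings gain and an assumed displacement rate.
graduates = 10_000
gain_per_graduate = 500.0        # measured annual earnings effect per graduate
displacement_rate = 0.30         # assumed share of new jobs taken from other workers
loss_per_displaced = 500.0       # assumed earnings loss of each displaced worker

measured_total = graduates * gain_per_graduate
displacement_loss = displacement_rate * graduates * loss_per_displaced
social_total = measured_total - displacement_loss

print("measured effect for graduates:     ", measured_total)   # 5,000,000
print("social effect net of displacement: ", social_total)     # 3,500,000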

VIII. Conclusions and Agenda for Future Research

Evaluations of government training programs for the economically disadvantaged have yielded important information about the effectiveness of such programs. Nonetheless, some uncertainty remains about the returns to social expenditures on existing programs, and large open questions persist about strategies for improving effectiveness in the future.

A. What We Know

Most of what we know about training programs concerns their costs and their short-term financial effects on persons enrolled in them. The most optimistic findings are for adult women. Nearly every evaluation of training programs for this group has found positive earnings gains, and most of the estimates have been statistically significant. Although these gains may appear modest in absolute terms, the public investment in these programs is also modest. The implied social rate of return on the resources expended on these programs is, in fact, sometimes quite high, and continued public funding seems warranted on this basis. These favorable results for women hold not only for voluntary programs but also for mandatory

44 In addition to the studies cited in the text, Moffitt (1996) illustrated entry effects for voluntary and mandatory programs using a microsimulation model. Moffitt's analysis suggests that a mandatory program for AFDC recipients with a heavy participation time requirement would reduce entry into welfare, but a voluntary program would increase entry. Much of the latter effect results from an assumed reduction in the stigma attached to welfare receipt.

45 During the 1970s, there were a number of empirical studies of displacement. These studies focused almost exclusively on the extent to which unemployed workers absorbed by public sector job-creation programs displaced regular government workers and provide little insight into displacement in the private sector that would result from training programs.


programs, in which adult women on welfare are the principal target population. The earnings increases to individuals required to participate are, however, often partially offset by decreased welfare benefits, reducing the net effect on income.

The full set of evaluation findings for adult men leaves more uncertainty. Nevertheless, the estimated effects for adult men in the experimental evaluation of the current national voluntary training system, JTPA, are encouraging. The few results for adult men receiving welfare in mandatory programs show less positive earnings effects.

Evaluation findings for youth are of special interest, because youth experience a relatively high degree of labor market difficulty and because permanent increases in earning power for youth could potentially provide returns to social investments over a relatively long post-program working life. With the possible exception of the Job Corps, however, no training program has been found effective for youth. The evidence for the Job Corps, which has by far the greatest per-participant cost of any currently operating government training program, is mixed. Findings from a nationally based experimental evaluation of the Job Corps, currently under way, will be crucial in resolving uncertainties about the program's effectiveness. Negative results from this evaluation would reinforce the serious questions about the efficacy of continued expenditures on traditional kinds of training programs for young persons, at least insofar as more favorable labor market outcomes are the program objective. Even positive results will not diminish the need to develop new and less expensive strategies that will work effectively for large numbers of youth, because the high cost of Job Corps limits the

size of the population that can be served.46

Even if some training programs have substantial positive effects on the earnings of participants, their aggregate effects would appear to be quite modest. It seems clear that the aggregate effects of JTPA are minimal, both on the legally defined target population and on the labor force as a whole. Program budgets are simply too small to permit JTPA to reach much of its target population (cf. Heckman, Roselius, and Smith 1994). The mandatory programs operated under JOBS may have produced some reductions in aggregate welfare receipt in localities that have moved aggressively to expand program coverage, but contributions to reduced poverty almost certainly have been slight. At the same time, evaluation research has failed to provide a solid empirical basis for expecting that increased funding would not run up against sharply decreasing returns. One program feature determined by funding is scale, the number of participants engaging in program activities. For voluntary programs, discussions about the effects of increasing total participation must be viewed as fundamentally speculative, even for programs and target groups for which effectiveness at current spending and participation levels has been demonstrated. For mandatory programs, there is some empirical evidence indicating that recently legislated increases in the share of the welfare population covered by participation requirements have increased the total program effect on earnings and welfare receipt without dramatically increasing per-enrollee costs. Evidence has been inconsistent, however, regarding additional earnings effects obtained from

46 Possible strategies for improving the economic circumstances of youth are discussed in Orr et al. (1996, pp. 216-31 passim).


working with greater numbers of more disadvantaged individuals within a defined mandatory population.

The second program feature determined by funding is the intensity and duration of skills-building activities in which participants engage. It is these activities (classroom and on-the-job training, rather than the lower-cost personal counseling, job-search assistance, and direct placement) that are looked to for lasting improvements in individual productivity and earning power. The most heartening empirical findings with respect to skills development are the long-term earnings effects found recently in analyses of extended follow-up data for a few skills-oriented programs. There remains, however, a lack of compelling evidence that skills-building activities have actually enhanced skills that are of value to employers or have accounted for a dominant share of program earnings effects relative to the lower-cost activities bundled with them. Indeed, evaluation results for both voluntary and mandatory programs heighten rather than allay concerns about the cost-effectiveness of more expensive program components and suggest that the fine details of program organization, in ways as yet poorly understood, may be critical in determining the success of skills-building efforts.

B. What We Need to Know

Several questions need to be addressed in future evaluations of government training programs for the economically disadvantaged. Overarching all is this question: How and by how much can the aggregate effects of government training programs be increased in a cost-effective manner? The question has three priority subtopics: youth, scale, and skills. Clearly, one top priority is to find cost-effective techniques for working with disadvantaged out-of-school youth. A second high priority is to answer questions pertaining to the cost-effectiveness of increasing program scale: Can total participation in voluntary programs be increased at a reasonable cost? Can targeting strategies be developed to limit inefficient substitution of program for nonprogram training? If so, can the effectiveness of existing service mixes be maintained over a broader, probably more disadvantaged population? Third, can additional intensive, skills-building activities be organized in a fashion that will produce lasting earnings effects large enough to make their use cost-effective at large scale, especially for the more disadvantaged? At a somewhat lower priority, we ask: To what extent does displacement offset the measured employment and earnings effects? This issue, although recognized for nearly three decades to be potentially important, has yet to be tackled in any meaningful way. Finally, entry effects deserve some research attention as a potential threat to program effectiveness.

C. Agenda for Future Evaluations

The prospects for original and useful work by economists in addressing these questions, we believe, are significant. Some of this work must occur outside the context of specific evaluations. At the most basic level, economists need to do additional work toward developing a fully fledged economic theory of evaluation, a theory that would provide better guidance about the most valuable kinds of information that could be generated from evaluations. Theoretical work also needs to be done to define better the optimal role of government training programs for the economically disadvantaged in relation to academic and vocational schooling and to training provided by employers on the job. In addition, we suspect that some immediate


progress may be made on the issue of displacement by reviewing existing labor market literature on substitution across grades of labor, both to provide empirical information and to formulate hypotheses that could be tested in future evaluations. Beyond that, only the evaluation of programs implemented at maximum scale within localities appears to us to be capable of yielding estimates of market-wide displacement effects. Finally, entry effects resulting from training programs targeted at welfare recipients can continue to be studied by collecting time-series data on welfare application rates for sites that adopt large-scale intensive programs. Although studies based on aggregate data are often problematic owing to the difficulty of controlling for other factors that influence application rates, they are inexpensive to conduct. Only a few such studies presently exist, and more should be undertaken. In addition, some progress is being made in designing field experiments capable of measuring entry effects using individual-level data (Card, Robins, and Winston Lin 1997).

In planning future program evaluations, the emphasis should be on addressing the above-mentioned issues of youth, scale, and skills. Toward that end, we believe it important to continue the trend away from traditional "black box" evaluations that yield only a summary estimate of program effectiveness. Instead, evaluation resources should focus on improving training technique. Closer study of training program technique means looking at those aspects of training associated with success or failure, given the level of funding. Ways of working more effectively with youth, ways of increasing program participation without decreasing average earnings gains, and ways of organizing skills-building activities to get the most out of them: these should all be the subjects of studies of program technique. Future evaluation designs should make greater use of direct comparisons of competing candidates for best program practice. One method for testing the effectiveness of alternative "service strategies" was used in the JTPA evaluation, which estimated experimental effects separately for three clusters of activities by randomizing sample members after program intake staff had recommended them for specific services. Another method, employed in some JOBS evaluation sites, was to randomly assign all program enrollees, regardless of program staff preferences, either to a rapid employment program approach or to an approach that aimed for long-term skills development. Finally, some theoretical work has been undertaken to develop feasible designs for comparing program approaches across randomly assigned local offices (Greenberg, Meyer, and Wiseman 1993). These research designs do not come free of conceptual difficulties, however, and they also present serious practical challenges, not the least of which is simply maintaining the distinctiveness of the competing service strategies and assuring high participation rates in the activities of interest.

Under any of these designs, training techniques demonstrated to be highly effective in the study sites can be replicated and evaluated at additional sites to determine whether the original favorable results are generalizable. Some efforts along these lines are currently taking place, such as the replication and evaluation in Los Angeles of a successful welfare-to-work program tested earlier in Riverside, California (Riccio, Friedlander, and Freedman 1994) and a multi-site replication and evaluation for youth of the San Jose CET program that produced earnings gains in MFSP and JOBSTART. There is, however, a need for further work in integrating


experimental and nonexperimental methods. Nonexperimental methods could be an important adjunct to random assignment in determining the path through which training programs influence earnings, such as through changes in educational attainment. They could also be important for increasing our understanding of the determinants and consequences of population participation rates and of substitution of activities across government and nongovernment providers, across periods of an individual's life cycle, and across episodes of nonemployment and employment.

Studying training technique will require additional and more detailed data than have been collected in most previous evaluations. Clearly, to assess intensive skills-development activities, the focus must shift from short- to long-term earnings effects. As a partial substitute for long-term results, however, greater attention should be paid to measuring program effects on the skills demands of jobs entered and on hourly wage rates and other terms of employment, including prospects for future on-the-job training and wage growth, which, in theory, should be improved by skills upgrading. More detail will be required in describing the nature of the training activities and the behavior of participants who engage in them, both to document that the prescribed training was received and to serve as a basis for replicating approaches that prove successful. Special consideration will have to be paid to measuring substitution of services provided by the program being evaluated for training services available elsewhere in the same locality. Efforts will have to be directed toward determining whether skills have actually been acquired, to what extent, and whether the acquired skills are valued by employers. To achieve these measurement objectives, increasing research outlays on in-classroom observation and on relatively expensive surveys and pre- and post-program skills tests for study participants would appear unavoidable. Without detailed knowledge about the nature of the training, how it was administered, who received it, what activities it replaced, and whether it actually increased the skill level and productivity of the trainee, it will be difficult to draw firm conclusions about the relative effects of different program activities.

An important issue in studying training technique involves the degree of control that evaluators should have over the program being evaluated. In a typical training program evaluation, state or local administrators usually choose the array of services to be offered and, perhaps with input from participants, decide who receives each service. The evaluator then attempts to measure the effect of the program as implemented. Such an approach can hinder the ability to study new program techniques. Innovations in technique are, by their nature, often difficult to find in practice and must sometimes be set up specifically for the purpose of study. In addition, when evaluators control the services for which each program participant is eligible, they are better able to maintain the distinctiveness of alternative service streams and to determine the differential effect of several service combinations. Greater control over the programs to be tested, however, can often be gained only when research budgets include substantial resources to compensate local agencies for changes in their program operations.

Were additional research funding available, it could be devoted to investigating various hypotheses advanced about group dynamics in training. These hypotheses stem from the


nascent literature on potential "community effects" and concern influences on motivation coming from classroom peers and from the participant's social and community context. Economists have been responsible for most major evaluations of training programs, but they do not have special expertise in these motivational issues. In this area, then, economists might find it fruitful to work more closely with measurement experts in sociology, psychology, education, and urban anthropology.

REFERENCES

ANDERSON, KATHRYN H.; BURKHAUSER, RICHARD V. AND RAYMOND, JENNIE E. "The Effect of Creaming on Placement Rates under the Job Training Partnership Act," Ind. Lab. Relat. Rev., July 1993, 46(4), pp. 613-24.

ANGRIST, JOSHUA D. AND IMBENS, GUIDO W. "Sources of Identifying Information in Evalu- ation Models." NBER Technical Working Paper 117, Dec. 1991.

ANGRIST, JOSHUA D.; IMBENS, GUIDO W. AND RUBIN, DONALD B. "Identification of Causal Effects Using Instrumental Variables," J. Amer. Statist. Assoc., June 1996, 91(434), pp. 444-55.

ASHENFELTER, ORLEY C. "Estimating the Effect of Training Programs on Earnings," Rev. Econ. Statist., Feb. 1978, 60(1), pp. 47-57.

ASHENFELTER, ORLEY AND CARD, DAVID. "Using the Longitudinal Structure of Earnings to Esti- mate the Effect of Training Programs," Rev. Econ. Statist., Nov. 1985, 67(4), pp. 648-60.

AUSPOS, PATRICIA; CAVE, GEORGE AND LONG, DAVID. Maine: Final report on the training op- portunities in the private sector program. New York. Manpower Demonstration Research Cor- poration, 1988.

BARNOW, BURT S. "The Impact of CETA Pro- grams on Earnings: A Review of the Litera- ture," J. Human Res., Spring 1987, 22(2), pp. 157-93.

. "Government Training as a Means of Re- ducing Unemployment," in Rethinking employ- ment policy. Eds.: D. LEE BAWDEN AND FELICITY SKIDMORE. Washington, DC: Urban Institute Press, 1989, pp. 109-35.

BARNOW, BURT S.; CAIN, GLEN G. AND GOLD- BERGER, ARTHUR S. "Issues in the Analysis of Selectivity Bias," Evaluation studies review annual. Vol. 5. Edited by ERNST W. STROMSDORFER AND GEORGE FARKAS. Bev- erly Hills, CA and London: Sage Pub., 1980, pp. 43-59.

BASSI, LAURIE J. "The Effect of CETA on the Postprogram Earnings of Participants," J. Hu- man Res., Fall 1983, 18(4), pp. 539-56.

. "Estimating the Effect of Training Pro- grams with Non-Random Selection," Rev. Econ. Statist. Feb. 1984, 66(1), p. 36-43.

- . "Estimating the Effect of Job Training Programs, Using Longitudinal Data: Ashenfel- ter's Findings Reconsidered: A Comment," J. Human Res., Spring 1987, 22(2), pp. 300-03.

BASSI, LAURIE J. ET AL. Measuring the effect of CETA on youth and the economically disadvan- taged. Final Report prepared for the U.S. Department of Labor under Contract No. 20- 11-82-19. Washington, DC: Urban Institute, 1984.

BELL, STEPHEN H. AND ORR, LARRY L. "Is Subsidized Employment Cost Effective for Welfare Recipients? Experimental Evidence from Seven State Demonstrations," J. Human Res., Winter 1994, 29(1), pp. 42-61.

BELL, STEPHEN H. ET AL. Program applicants as a comparison group in evaluating training programs. Kalamazoo, MI: W.E. Upjohn Institute for Employment Research, 1995.

BETSEY, CHARLES L.; HOLLISTER, ROBINSON G., JR., AND PAPAGEORGIOU, MARY R., eds. Youth employment and training programs: the YEDPA years. Washington, DC: National Academy Press, 1985.

BISHOP, RICHARD C. AND HEBERLEIN, THOMAS A. "The Contingent Valuation Method," in Economic valuation of natural resources: Issues, theory, and applications. Eds.: REBECCA L. JOHNSON AND GARY V. JOHNSON. Boulder, CO: Westview Press, 1990, pp. 81-104.

BJORKLUND, ANDERS AND MOFFITT, ROBERT. "The Estimation of Wage Gains and Welfare Gains in Self-selection Models," Rev. Econ. Sta- tist., Feb. 1987, 69(1), pp. 42-49.

BLOMQUIST, JOHN D. The Ohio Transitions to Independence Demonstration: Report on program costs and benefits. Bethesda, MD: Abt Assoc., Inc., 1994.

BLOOM, HOWARD S. "Accounting for No-Shows in Experimental Evaluation Designs," Evaluation Review, Apr. 1984a, 8(2), pp. 225-46.

. "Estimating the Effect of Job-Training Programs, Using Longitudinal Data: Ashenfel- ter's Findings Reconsidered," J. Human Res., Fall, 1984b, 19(4), pp 544-56.

. "What Works For Whom?: CETA Impacts for Adult Participants," Evaluation Review, Aug. 1987, 11(4), pp. 510-27.

BLOOM, HOWARD S. ET AL. "The Benefits and Costs of JTPA Title II-A Programs: Key Find- ings From the National JTPA Study," J. Human Res., Summer 1997, 32(3), pp. 549-76.

BOARDMAN, ANTHONY E. ET AL. Cost-benefit analysis: Concepts and practice. Upper Saddle River, NJ: Prentice Hall, 1996.

BORUS, MICHAEL E. "A Benefit-Cost Analysis of the Economic Effectiveness of Retraining the Unemployed," Yale Econ. Essays, Fall 1964, 4(2), pp. 371-429.

BRYANT, EDWARD C. AND RUPP, KALMAN. "Evaluating the Impact of CETA on Participant


Earnings," Evaluation Review, Aug. 1987, 11(4), pp. 473-92.

BURGHARDT, JOHN ET AL. Evaluation of the mi- nority female single parent demonstration. Vol. I, Summary Report. New York: The Rockefeller Foundation, Oct. 1992.

BURTLESS, GARY. "The Case for Randomized Field Trials in Economic and Policy Research," J. Econ. Perspectives, Spring 1995, 9(2), pp. 63- 84.

BURTLESS, GARY AND ORR, LARRY L. "Are Classi- cal Experiments Needed for Manpower Pol- icy?" J. Human Res., Fall 1986, 21(4), pp. 606- 39.

CAIN, GLEN G. "Benefit-Cost Estimates for Job Corps." Discussion Paper No. 9-68, Institute for Research on Poverty, U. of Wisconsin, Madison, Sept. 1968.

-. "Regression and Selection Models to Im- prove Nonexperimental Comparisons," in Eval- uation and experiment. Eds.: CARL A. BEN- NETT AND ARTHUR A. LUMSDAINE. New York: Academic Press, 1975, pp. 297-317.

CARD, DAVID E.; ROBINS, PHILIP K. AND LIN, WINSTON. How important are 'entry effects' in financial incentive programs for welfare recipients? Ottawa, Ont.: Social Research and Demonstration Corporation, Aug. 1997.

CARD, DAVID AND SULLIVAN, DANIEL G. "Meas- uring the Effect of Subsidized Training Pro- grams on Movements in and out of Employ- ment," Econometrica, May 1988, 56(3), pp. 497-530.

CAVE, GEORGE ET AL. JOBSTART: Final report on a program for school dropouts. New York: Manpower Demonstration Research Corporation, Oct. 1993.

CHANG, FISHER. "Evaluating the Impact of Man- datory Work Programs on Two-Parent Welfare Caseloads." Unpublished doctoral dissertation. Baltimore: U. of Maryland Baltimore County, 1996.

CLEMENTS, NANCY; HECKMAN, JAMES AND SMITH, JEFFREY. "Making the Most Out of So- cial Experiments: Reducing the Intrinsic Un- certainty in Evidence from Randomized Trials with an Application to the National JTPA Ex- periment." National Bureau of Economic Re- search Technical Paper 149, Jan. 1994.

COHEN, MALCOLM S. "The Direct Effects of Fed- eral Manpower Programs in Reducing Unem- ployment," J. Human Res., Fall 1969, 4(1), pp. 491-507.

CONLISK, JOHN. "Choice of Sample Size in Evaluating Manpower Programs: Comment on Pitcher and Stafford," in Research in Labor Economics. Ed.: FARRELL E. BLOCH. Supplement 1, 1979, pp. 79-96.

COOLEY, THOMAS M.; MCGUIRE, TIMOTHY W. AND PRESCOTT, EDWARD C. "Earnings and Employment Dynamics of Manpower Trainees: An Econometric Analysis," in Research in Labor Economics. Ed.: FARRELL E. BLOCH. Supplement 1, 1979, pp. 119-47.

COUCH, KENNETH A. "New Evidence on the Long-Term Effects of Employment Training Programs," J. Lab. Econ., Oct. 1992, 10(4), pp. 380-88.

DEHEJIA, RAJEEV H. AND WAHBA, SADEK. "Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Pro- grams." Unpublished paper. Nov. 1995.

DICKINSON, KATHERINE P.; JOHNSON, TERRY R. AND WEST, RICHARD W. An analysis of the impact of CETA programs on components of earnings and other outcomes. Final Report prepared for the U.S. Department of Labor, Employment and Training Administration. Menlo Park, CA: SRI International, Nov. 1984.

. "An Analysis of the Impact of CETA Pro- grams on Participants' Earnings," J. Human Res., Winter 1986, 21(1), pp. 64-91.

". The Impact of CETA Programs on Com- ponents of Participants' Earnings," Ind. Lab. Relat. Rev., Apr. 1987a, 40(3), pp. 430-41.

. "An Analysis of the Sensitivity of Quasi- Experimental Net Impact Estimates of CETA Programs," Evaluation Review, Aug. 1987b, 11(4), pp. 452-72.

FEIN, DAVID J.; BEECROFT, ERIK AND BLOMQUIST, JOHN D. The Ohio Transitions to Independence Demonstration: Final impacts for JOBS and Work Choice. Bethesda, MD: Abt Associates, Inc., 1994.

FINIFTER, DAVID H. "An Approach to Estimating Net Earnings Impact of Federally Subsidized Employment and Training Programs," Evaluation Review, Aug. 1987, 11(4), pp. 528-47.

FRAKER, THOMAS AND MAYNARD, REBECCA. "The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Pro- grams," J. Human Res., Spring 1987, 22(2), pp. 194-227.

FREEDMAN, STEPHEN; BRYANT, JAN AND CAVE, GEORGE. New Jersey: Final report on the grant diversion project. New York: Manpower Dem- onstration Research Corporation, 1988.

FREEDMAN, STEPHEN ET AL. The GAIN evalu- ation: Five-year impacts on employment, earn- ings, and AFDC receipt. New York: Manpower Demonstration Research Corporation, July 1996.

FRIEDLANDER, DANIEL. Maryland: Supplemental report on the Baltimore Options Program. New York: Manpower Demonstration Research Cor- poration, 1987.

. "Subgroup Impacts of Large-Scale Wel- fare Employment Programs," Rev. Econ. Sta- tist., Feb. 1993, 75(1), pp. 138-43.

FRIEDLANDER, DANIEL AND BURTLESS, GARY. Five years after: The long-term effects of wel- fare-to-work programs. New York: Russell Sage Foundation, 1995.

FRIEDLANDER, DANIEL AND GUERON, JUDITH M. "Are High-Cost Services More Effective Than Low-Cost Services?" in CHARLES F. MAN- SKI AND IRWIN GARFINKEL, eds. 1992, pp. 143-98.


FRIEDLANDER, DANIEL AND HAMILTON, GAYLE. The Saturation Work Initiative Model in San Diego: A five-year follow-up study. New York: Manpower Demonstration Research Corporation, 1993.

"The Impact of a Continuous Participation Obligation in a Welfare Employment Program," J. Human Res., Fall 1996, 31(4), pp. 734-56.

FRIEDLANDER, DANIEL AND MARTINSON, KARIN. "Effects of Mandatory Basic Education for Adult AFDC Recipients," Educational Evaluation and Policy Analysis, Winter 1996, 18(4), pp. 327-37.

FRIEDLANDER, DANIEL AND ROBINS, PHILIP K. "Evaluating Program Evaluations: New Evi- dence on Commonly Used Nonexperimental Methods," Amer. Econ. Rev., Sept. 1995, 85(4), pp. 923-37.

. "The Distributional Impacts of Social Pro- grams," Evaluation Review, Oct. 1997, 21(5), pp. 531-53.

FRIEDLANDER, DANIEL ET AL. Arkansas: Final report on the WORK program in two counties. New York: Manpower Demonstration Research Corporation, Sept. 1985.

. West Virginia: Final report on the Community Work Experience Demonstrations. New York: Manpower Demonstration Research Corporation, 1986.

. Illinois: Final report on job search and work experience in Cook County. New York: Manpower Demonstration Research Corpora- tion, 1987.

GARFINKEL, IRWIN; MANSKI, CHARLES F. AND MICHALOPOULOS, CHARLES. "Micro Experi- ments and Macro Effects," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 253-73.

GAY, ROBERT S. AND BORUS, MICHAEL E. "Validating Performance Indicators for Employment and Training Programs," J. Human Res., Winter 1980, 15(1), pp. 29-48.

GOLDBERGER, ARTHUR S. "Selection Bias in Evaluating Treatment Effects." Discussion Pa- per 123-72, Institute for Research on Poverty, U. of Wisconsin, Madison, 1972.

GOLDMAN, BARBARA S. Impacts of the immediate job search assistance experiment. New York: Manpower Demonstration Research Corpora- tion, 1981.

GOLDMAN, BARBARA; FRIEDLANDER, DANIEL AND LONG, DAVID. California: Final report on the San Diego job search and work experience demonstration. New York: Manpower Demon- stration Research Corporation, 1986.

GREENBERG, DAVID. "The Leisure Bias in Cost- Benefit Analyses of Employment and Training Programs," J. Human Res., Spring 1997, 32(2), pp. 413-39.

GREENBERG, DAVID; MEYER, ROBERT H. AND WISEMAN, MICHAEL. "Prying the Lid from the Black Box: Plotting Evaluation Strategy for Welfare Employment and Training Programs." Discussion Paper 989-93. Madison, WI: U. of Wisconsin Institute for Research on Poverty, 1993.

. "Multisite Employment and Training Evaluations: A Tale of Three Studies," Ind. Lab. Relat. Rev., July 1994, 47(4), pp. 679-91.

GUERON, JUDITH M. AND PAULY, EDWARD. From welfare to work. New York: Russell Sage Foundation, 1991.

HAM, JOHN C. AND LALONDE, ROBERT J. "Using Social Experiments to Estimate the Effect of Training on Transition Rates," in Panel data and labor market studies. Eds.: Joop HARTOG, GEERT RIDDER, AND JULES THEEUWES. North Holland: Elsevier Science Pub., 1990, pp. 157- 72

. "The Effect of Sample Selection and Ini- tial Conditions in Duration Models: Evidence from Experimental Data on Training," Econo- metrica, Jan. 1996, 64(1), pp. 175-205.

HAMERMESH, DANIEL S. "The Secondary Effects of Manpower Programs," Econ. Bus. Bull., Spring-Summer 1972, 24(3), pp. 18-26.

HAMILTON, GAYLE. Interim report on the Satura- tion Work Initiative Model in San Diego. New York: Manpower Demonstration Research Cor- poration, Aug. 1988.

HAMILTON, GAYLE AND FRIEDLANDER, DANIEL. Final report on the Saturation Work Initiative Model in San Diego. New York: Manpower Demonstration Research Corporation, Nov. 1989.

HAMILTON, GAYLE ET AL. Evaluating two wel- fare-to-work approaches: Two-year findings on the labor force attachment and human capital development programs in three sites. New York: Manpower Demonstration Research Corpora- tion, forthcoming.

HARDIN, EINAR AND BORUS, MICHAEL E. Eco- nomic benefits and costs of retraining courses in Michigan. East Lansing, MI: School of Labor and Industrial Relations, Michigan State U., Dec. 1969.

HAUSMAN, JERRY A., ed. Contingent valuation: A critical assessment. New York: North-Holland, 1993.

HECKMAN, JAMES J. "Dummy Endogenous Vari- ables in a Simultaneous Equation System," Econometrica, July 1978, 46(3), pp. 931-59.

- . "Randomization and Social Policy Evalu- ation," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 201-30.

. "Instrumental Variables: A Study of Im- plicit Behavioral Assumptions in One Widely- Used Estimator." Unpublished manuscript, Jan. 18, 1996.

HECKMAN, JAMES J. AND HOTZ, V. JOSEPH. "Choosing among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs," J. Amer. Statist. Assoc., Dec. 1989, 84(408), pp. 862-74.

HECKMAN, JAMES J. AND ROBB, RICHARD, JR. "Alternative Methods for Evaluating the Impact of Interventions: An Overview," J. Econo- metrics, Oct./Nov. 1985, 30(1-2), pp. 239-67.


HECKMAN, JAMES J.; ROSELIUS, REBECCA L. AND SMITH, JEFFREY A. "U.S. Education and Training Policy: A Re-Evaluation of the Underlying Assumptions Behind the 'New Consensus'," in Labor markets, employment policy, and job creation. Eds.: LEWIS C. SOLMON AND ALEC R. LEVENSON. Boulder, CO: Westview Press, 1994, pp. 83-121.

HECKMAN, JAMES J. AND SMITH, JEFFREY A. "As- sessing the Case for Social Experiments," J. Econ. Perspectives, Spring 1995, 9(2), pp. 85- 110.

HECKMAN, JAMES; SMITH, JEFFREY AND TABER, CHRISTOPHER. "Accounting for Dropouts in Evaluations of Social Experiments." Economic Research Center, NORC Discussion Paper 94/3, May 1994.

HECKMAN, JAMES J. ET AL. "Characterizing Selection Bias Using Experimental Data," Econometrica, forthcoming.

HOLLISTER, ROBINSON G., JR. AND HILL, JENNIFER. "Problems in the Evaluation of Community-Wide Initiatives," in New approaches to evaluating community initiatives: Concepts, methods, and contexts. Eds.: JAMES P. CONNELL ET AL. Washington, DC: Aspen Institute, 1995, pp. 127-72.

HOLLISTER, ROBINSON G., JR.; KEMPER, PETER AND MAYNARD, REBECCA, eds. The National Supported Work Demonstration. Madison: U. of Wisconsin Press, 1984.

HOTZ, V. JOSEPH. "Designing an Evaluation of the Job Training Partnership Act," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, PP. 76-114.

HOTZ, V. JOSEPH AND SANDERS, SETH G. "Bounding Treatment Effects in Controlled and Natural Experiments Subject to Post-Randomization Treatment Choice." Population Research Center, NORC Discussion Paper 94/2, Mar. 1994.

JACOBSON, LOUIS S. ET AL. "The Returns to Classroom Training for Displaced Workers." Chicago: Federal Reserve Bank of Chicago Working Paper, Macroeconomic Issues WP 94-27, Oct. 1994.

JOHNSON, GEORGE. "The Labor Market Displacement Effect in the Analysis of the Net Impact of Manpower Training Programs," in Research in Labor Economics, Supplement 1, 1979, pp. 227-54.

JOHNSON, TERRY R.; KLEPINGER, DANIEL H. AND DONG, FRED B. "Preliminary Evidence from the Oregon Welfare Reform Demonstration." Unpublished paper, June 1990.

KATZ, LAWRENCE F. "Active Labor Market Policies to Expand Employment Opportunity," in Reducing unemployment: Current issues and policy options. Proceedings from a Symposium sponsored by the Federal Reserve Bank of Kansas City, Jackson Hole, Wyoming, Aug. 1994, pp. 239-90.

KEMPER, PETER; LONG, DAVID A. AND THORNTON, CRAIG. The supported work evaluation: Final benefit-cost analysis. New York: Manpower Demonstration Research Corporation, 1981.

KEMPLE, JAMES J.; FRIEDLANDER, DANIEL AND FELLERATH, VERONICA. Florida's Project Independence: Benefits, costs, and two-year impacts of Florida's JOBS program. New York: Manpower Demonstration Research Corporation, Apr. 1995.

KIEFER, NICHOLAS M. "Federally Subsidized Occupational Training and the Employment and Earnings of Male Trainees," J. Econometrics, Aug. 1978, 8(1), pp. 111-25.

.."The Economic Benefits from Four Gov- ernment Training Programs," in Research in Labor Economics, Supplement 1. Ed.: FARRELL E. BLOCH. 1979, pp. 159-86.

LALONDE, ROBERT J. "Evaluating the Econo- metric Evaluations of Training Programs with Experimental Data," Amer. Econ. Rev., Sept. 1986, 76(4), pp. 604-20.

- . "The Promise of Public Sector-Sponsored Training Programs. J. Econ. Perspectives, Spring 1995, 9(2), pp. 149-68.

LALONDE, ROBERT J. AND MAYNARD, REBECCA. "How Precise Are Evaluations of Employment and Training Programs? Evidence from a Field Experiment," Evaluation Review, Aug. 1987, 11(4), pp. 428-51.

LONG, SHARON K. AND WISSOKER, DOUGLAS. "Welfare Reform at Three Years: The Case of Washington State's Family Independence Program," J. Human Res., Fall 1995, 30(4), pp. 766-90.

MALLAR, CHARLES D. "Alternative Econometric Procedures for Program Evaluations: Illustrations from an Evaluation of Job Corps," American Statistical Association, Proceedings of the Business and Economics Statistics Section, 1979, pp. 317-21.

MALLAR, CHARLES ET AL. Evaluation of the economic impact of the Job Corps program: Third follow-up report. Princeton: Mathematica Policy Research, Sept. 1982.

MANSKI, CHARLES F. "What Do Controlled Ex- periments Reveal About Outcomes When Treatments Vary?" Institute for Research on Poverty Discussion Paper # 1005-93. U. of Wis- consin-Madison, June 1993.

- . "Learning About Social Programs From Experiments With Random Assignment of Treatments," Institute for Research on Poverty Discussion Paper # 1061-95. U. of Wisconsin, Madison, Mar. 1995.

MANSKI, CHARLES F. AND GARFINKEL, IRWIN, eds. Evaluating welfare and training programs. Cambridge and London: Harvard U. Press, 1992a.

———. "Introduction," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds., 1992b, pp. 1-22.

MOFFITT, ROBERT A. "Program Evaluation with Nonexperimental Data," Evaluation Review, June 1991, 15(3), pp. 291-314.

- . "Evaluation Methods for Program Entry

This content downloaded from 129.171.178.62 on Tue, 11 Nov 2014 11:06:35 AMAll use subject to JSTOR Terms and Conditions

Friedlander, Greenberg, and Robins: Evaluating Training Programs 1855

Effects," in CHARLES F. MANSKI AND IRWIN GARFINKEL, eds. 1992, pp. 231-52.

. "The Effect of Employment and Training Programs on Entry and Exit From the Welfare Caseload," J. Pol. Anal. Manage., Winter 1996, 15(1), pp. 32-50.

NIGHTINGALE, DEMETRA SMITH ET AL. Evaluation of the Massachusetts Employment and Training (ET) Program. Urban Institute Report 91-1. Washington, DC: Urban Institute Press, 1991.

ORR, LARRY L. ET AL. Does training for the disadvantaged work? Evidence from the National JTPA Study. Washington, DC: Urban Institute Press, 1996.

PERRY, CHARLES R. ET AL. The impact of government manpower programs in general, and on minorities and women. Philadelphia, PA: Industrial Research Unit, Wharton School, U. of Pennsylvania, 1975.

PHILLIPS, ELIZABETH H. "The Effect of Mandatory Work and Training Programs on Welfare Entry: The Case of GAIN in California." Unpublished Ph.D. dissertation. Madison: U. of Wisconsin, 1993.

PUMA, MICHAEL J. AND BURSTEIN, NANCY R. "The National Evaluation of the Food Stamp Employment and Training Program," J. Pol. Anal. Manage., Spring 1994, 13(2), pp. 311-30.

QUINT, JANICE C. ET AL. New Chance: Interim findings on a comprehensive program for disadvantaged young mothers and their children. New York: Manpower Demonstration Research Corporation, Sept. 1994.

RICCIO, JAMES; FRIEDLANDER, DANIEL AND FREEDMAN, STEPHEN. GAIN: Benefits, costs, and three-year impacts of a welfare-to-work program. New York: Manpower Demonstration Research Corporation, 1994.

RICCIO, JAMES ET AL. Final report on the Virginia Employment Services Program. New York: Manpower Demonstration Research Corporation, Aug. 1986.

ROSENBAUM, PAUL AND RUBIN, DONALD B. "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, Apr. 1983, 70(1), pp. 41-55.

RUBIN, DONALD B. "Matching to Remove Bias in Observational Studies," Biometrics, Mar. 1973, 29(1), pp. 159-83.

. "Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in Observation Studies," J. Amer. Statist. Assoc., June 1979, 74(366), pp. 318-28.

SANDELL, STEVEN H. AND RUPP, KALMAN. Who is served in JTPA programs: Patterns of participation and intergroup equity. Washington, DC: National Commission for Employment Policy, Feb. 1988.

SCHILLER, BRADLEY R. AND BRASHER, C. NIELSEN. "Effects of Workfare Saturation on AFDC Caseloads," Contemporary Policy Issues, Apr. 1993, 11(1), pp. 39-49.

STAFFORD, FRANK P. "A Decision Theoretic Approach to the Evaluation of Training Programs," in Research in Labor Economics, Supplement 1. Ed.: FARRELL E. BLOCH. Greenwich, CT, 1979, pp. 9-35.

STROMSDORFER, ERNST W. "Determinants of Economic Success in Retraining the Unemployed: The West Virginia Experience," J. Human Res., Spring 1968, 3(2), pp. 139-58.

STROMSDORFER, ERNST ET AL. Recommendations of the Job Training Longitudinal Survey Research Advisory Panel. Report prepared for the Office of Strategic Planning and Policy Development, Employment and Training Administration. Washington, DC: U.S. Department of Labor, 1985.

THISTLETHWAITE, DONALD L. AND CAMPBELL, DONALD T. "Regression-Discontinuity Analysis: An Alternative to the ex post facto Experiment," J. Educational Psychology, 1960, 51, pp. 309-17.

U.S. GENERAL ACCOUNTING OFFICE. Multiple employment training programs: Information crosswalk on 163 employment training programs. Washington, DC: U.S. GPO, Feb. 14, 1995.

———. Job Training Partnership Act: Long-term earnings and employment outcomes. Washington, DC: U.S. GPO, Mar. 1996.

WESTAT, INC. Continuous longitudinal manpower survey: Summary of net impact results. Report MEL 84-02 prepared for the U.S. Department of Labor under contract No. 23-24-75-07. Rockville, MD: Westat, Inc., Apr. 1984.

WISSOKER, DOUGLAS A. AND WATTS, HAROLD W. The impact of FIP on AFDC caseloads. Washington, DC: Urban Institute, June 1994.

WOLFHAGEN, CARL F. Job search strategies: Lessons from the Louisville WIN laboratory. New York: Manpower Demonstration Research Corporation, 1983.

ZAMBROWSKI, AMY AND GORDON, ANNE. Evaluation of the Minority Female Single Parent Demonstration: Fifth-year impacts at CET. Princeton: Mathematica Policy Research, Inc., Dec. 1993.
